mara / mara-pipelines

A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
MIT License
2.07k stars, 100 forks

Optimistic pipeline execution behavior #96

Open leo-schick opened 1 year ago

leo-schick commented 1 year ago

When a node in a pipeline fails, the whole pipeline fails. It would be great to have a more optimistic execution: when a node fails, skip only its downstream nodes instead of all the open nodes, and then mark the pipeline as “failed”. This would match the execution logic of dbt and give the data engineer the option to fix just the missing nodes after a failure. Currently, most of the time I have to restart the whole pipeline, even when only some small tasks at the start fail that are not connected to other tasks.
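To make the requested behavior concrete, here is a minimal sketch in plain Python. It does not use the mara-pipelines API: the function name `run_optimistically`, the `tasks` and `upstreams` dictionaries, and the status strings are all hypothetical names chosen for illustration. It assumes every upstream name also appears in `tasks`, and it needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_optimistically(
    tasks: Dict[str, Callable[[], bool]],
    upstreams: Dict[str, Set[str]],
) -> Dict[str, str]:
    """Run every node whose upstreams all succeeded; skip the rest.

    `tasks` maps a node name to a callable returning True on success.
    `upstreams` maps a node name to the names it depends on.
    (Both structures are illustrative, not part of mara-pipelines.)
    """
    graph = {name: set(upstreams.get(name, ())) for name in tasks}
    status: Dict[str, str] = {}

    # static_order() yields each node after all of its upstreams.
    for name in TopologicalSorter(graph).static_order():
        # A failed or skipped upstream means this node cannot run.
        if any(status[up] != "succeeded" for up in graph[name]):
            status[name] = "skipped"
            continue
        try:
            ok = tasks[name]()
        except Exception:
            ok = False
        status[name] = "succeeded" if ok else "failed"
    return status


def pipeline_failed(status: Dict[str, str]) -> bool:
    """The pipeline as a whole still counts as failed if any node failed."""
    return any(state == "failed" for state in status.values())
```

With a failing node `a`, a node `b` that depends on `a`, and an unrelated chain `c` then `d`, this sketch would skip only `b` and still run `c` and `d`, while the pipeline as a whole is reported as failed.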

jankatins commented 1 year ago

I have missed such behavior sometimes, but mostly because I didn't want to rerun everything after fixing a bug in a dependency node. That would imply that optimistic runs are actually not so important, and that what is needed is more some kind of "run unrun nodes (from last pipeline run) in pipeline" or even "run unrun nodes (from last pipeline run) up to this node". And then you also need something that can rerun a node that already ran successfully, because you changed its code, through to that other node, but only the nodes that have not run or that depend on the node you changed. That would probably be quite an interesting problem... :-(

In the end I solved such cases mostly manually, at least when it was easily possible (just two directly connected nodes).
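The node selection described above could be sketched roughly as follows. This is only an illustration in plain Python, not mara-pipelines code: `nodes_to_rerun`, `last_status`, `changed` and `target` are hypothetical names, and the sketch assumes the status of the last run is available as a simple dictionary.

```python
from typing import Dict, Iterable, Optional, Set


def _reachable(start: Iterable[str], edges: Dict[str, Set[str]]) -> Set[str]:
    """Return `start` plus everything reachable from it along `edges`."""
    seen: Set[str] = set()
    stack = list(start)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, ()))
    return seen


def nodes_to_rerun(
    upstreams: Dict[str, Set[str]],
    last_status: Dict[str, str],
    changed: Iterable[str] = (),
    target: Optional[str] = None,
) -> Set[str]:
    """Select the nodes that did not succeed last time or were changed,
    plus everything downstream of them; optionally limit the selection
    to `target` and its upstream nodes ("up to this node")."""
    # Invert the upstream map so we can walk in the downstream direction.
    downstreams: Dict[str, Set[str]] = {}
    for node, ups in upstreams.items():
        for up in ups:
            downstreams.setdefault(up, set()).add(node)

    # Seeds: nodes without a successful last run, plus changed nodes.
    seeds = {n for n in upstreams if last_status.get(n) != "succeeded"}
    seeds |= set(changed)

    selected = _reachable(seeds, downstreams)
    if target is not None:
        selected &= _reachable([target], upstreams)
    return selected
```

The selected nodes would still have to be executed in dependency order, and the sketch says nothing about how the status of the last run is stored.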

leo-schick commented 1 year ago

100% agree. I made a separate ticket to add a re-run pipeline function: #97