umpirsky / Extraload

Powerful ETL library.
MIT License

Pipeline is insufficient for ETL workflows #1

Open jeremygiberson opened 8 years ago

jeremygiberson commented 8 years ago

ETL processes often have more complex workflows than a simple linear pipeline: conditional branches, split/merge branches, success/error branches, and so on. You will probably need a node/graph-based data structure to represent the ETL workflow.
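
To make the node/graph idea concrete, here is a minimal sketch in Python, for illustration only. Every name in it (`Node`, `run`, `route`, `on_error`, the `parse`/`large`/`small`/`reject` steps) is hypothetical and none of it is Extraload's API. It covers only conditional and success/error branches; split/merge is left out.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = dict


class Node:
    """One step in a hypothetical ETL workflow graph (not part of Extraload)."""

    def __init__(
        self,
        transform: Callable[[Row], Row],
        route: Optional[Callable[[Row], Optional[str]]] = None,
        on_error: Optional[str] = None,
    ) -> None:
        self.transform = transform
        # route() names the next node based on the transformed row;
        # returning None ends the path (the default for leaf nodes).
        self.route = route or (lambda row: None)
        # Name of the node to jump to if transform() raises.
        self.on_error = on_error


def run(nodes: Dict[str, Node], start: str, row: Row) -> Tuple[Row, List[str]]:
    """Walk one row through the graph; return the final row and the path taken."""
    path: List[str] = []
    current: Optional[str] = start
    while current is not None:
        node = nodes[current]
        path.append(current)
        try:
            row = node.transform(row)
            current = node.route(row)      # success / conditional branch
        except Exception:
            if node.on_error is None:
                raise                      # no error branch defined for this node
            current = node.on_error        # error branch; row is left untransformed
    return row, path


nodes = {
    "parse": Node(
        transform=lambda r: {**r, "amount": float(r["amount"])},
        route=lambda r: "large" if r["amount"] >= 100 else "small",
        on_error="reject",
    ),
    "large": Node(transform=lambda r: {**r, "tier": "large"}),
    "small": Node(transform=lambda r: {**r, "tier": "small"}),
    "reject": Node(transform=lambda r: {**r, "rejected": True}),
}

ok_row, ok_path = run(nodes, "parse", {"amount": "250"})
bad_row, bad_path = run(nodes, "parse", {"amount": "abc"})
```

A row that parses follows `parse` then `large` or `small`; a row that fails to parse is diverted to `reject`. A linear pipeline is the special case where every `route` returns one fixed successor.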

jeremygiberson commented 8 years ago

Additionally, you often want to traverse the ETL workflow one row at a time. Other times, you want to wait at a specific step until all the rows from the previous steps have been processed before you proceed to the next step.
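
The two traversal modes can be sketched with Python generators. This is an illustrative assumption, not how Extraload works; the step names `extract`, `double_amount`, and `rank_by_amount` are made up. A streaming step yields each row as soon as it arrives, while a blocking step buffers everything from upstream before emitting anything.

```python
from typing import Iterable, Iterator

Row = dict


def extract() -> Iterator[Row]:
    """Hypothetical source: yields rows one at a time."""
    for row_id, amount in enumerate([30, 10, 20]):
        yield {"id": row_id, "amount": amount}


def double_amount(rows: Iterable[Row]) -> Iterator[Row]:
    """Streaming step: each row passes through as soon as it arrives."""
    for row in rows:
        yield {**row, "amount": row["amount"] * 2}


def rank_by_amount(rows: Iterable[Row]) -> Iterator[Row]:
    """Blocking step: ranking needs every upstream row before it can emit one."""
    buffered = sorted(rows, key=lambda r: r["amount"], reverse=True)  # drains upstream
    for rank, row in enumerate(buffered, start=1):
        yield {**row, "rank": rank}


# Steps compose the same way regardless of whether they stream or block.
result = list(rank_by_amount(double_amount(extract())))
```

Because both kinds of step take and return an iterator of rows, the caller does not need to know which steps buffer; the waiting happens inside the step that needs it.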

umpirsky commented 8 years ago

@jeremygiberson Sure, I just wanted to implement a default pipeline, or a few defaults, and leave end users the possibility of implementing their own. It also serves as an example of how to do it, without limiting users in any way.

Thanks for sharing, really appreciate your feedback. :+1: