lisad / phaser

The missing layer for complex data batch integration pipelines
MIT License
9 stars 1 forks source link

Sort out how Pipeline gets instantiated in run/diff, with or without phases passed in #159

Open lisad opened 2 months ago

lisad commented 2 months ago

The way we instantiate the Pipeline inside 'run' naturally doesn't pass in a bunch of phases - we expect the Pipeline-sub-class that we're pulling out of a file to have phases.

However, there's also a way to pass the phases into a Pipeline during instantiation. This seemed like a good feature initially, for a couple reasons. It's useful for testing - it lets us run tests on the BASE class without extra steps in the tests to define a subclass then instantiate it. (This is certainly not the only way to make our tests easy to write, however, we could always use factories. ) It's also useful for driving pipelines and phases in a shell - e.g. a developer is trying to debug a given Phase and they can run it in an ad-hoc Pipeline without modifying the full pipeline.

If we leave things as-is, it seems like a foot-gun: a developer will add phases to a pipeline during instantiation and run it to test it, forget to add the phases to the class, then when running it via the CLI it won't work the same way and it will be non-obvious why. I feel like we could fix this in a couple ways. (1) Remove the phases passed into instantiation and make tests easy to write another way. (2) Continue doing this, but also have some good error messages to explain what's missing when running in the CLI?