lisad / phaser

library for batch-oriented complex data integration pipelines
MIT License

Add some kind of command line control over running just one phase? #121

Open lisad opened 1 month ago

lisad commented 1 month ago

I started working on this but then wasn't sure how to do it.

I'm thinking of syntax something like `python3 -m phaser run boston output --phase aggregate-counts sources/bike_ped_counts.csv`

or

`python3 -m phaser run boston output --phases [1,2] sources/bike_ped_counts.csv`

When working on just one phase, it's powerful to be able to run only that phase and ignore the output from the others. But I'm not sure whether naming a phase should run everything up to that phase, or run only that phase and expect its input from the prior phase to already be there. I also think it might be easier to accept numbers, so that "--phases [1,2]" means to run the first and second phases of the pipeline. Numbers are definitely easier to type and remember than the phase name, which is different from the phase class name.
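To make the two candidate syntaxes concrete, here is a minimal `argparse` sketch that accepts either `--phase NAME` or `--phases [1,2]`. It is not phaser's actual CLI code: the positional names `pipeline`, `working_dir` and `source` are guesses at what `boston`, `output` and the CSV path stand for, and `parse_phase_numbers` and `build_parser` are hypothetical helpers.

```python
import argparse


def parse_phase_numbers(text):
    """Turn '1,2' or '[1,2]' into [1, 2] (1-based positions in the pipeline)."""
    cleaned = text.strip().strip("[]")
    try:
        numbers = [int(part) for part in cleaned.split(",")]
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"expected comma-separated integers, got {text!r}"
        ) from None
    if any(n < 1 for n in numbers):
        raise argparse.ArgumentTypeError("phase numbers start at 1")
    return numbers


def build_parser():
    # Hypothetical layout: the argument names are assumptions, not phaser's real CLI.
    parser = argparse.ArgumentParser(prog="phaser")
    subcommands = parser.add_subparsers(dest="command", required=True)
    run = subcommands.add_parser("run")
    run.add_argument("pipeline")
    run.add_argument("working_dir")
    run.add_argument("source")
    # Only one of the two selection styles may be given per invocation.
    selection = run.add_mutually_exclusive_group()
    selection.add_argument("--phase", help="name of a single phase to run")
    selection.add_argument(
        "--phases",
        type=parse_phase_numbers,
        help="1-based phase positions, e.g. 1,2 or [1,2]",
    )
    return parser
```

One practical note on the bracket form: square brackets are glob characters in most shells, so `--phases [1,2]` may need quoting (zsh, for example, rejects an unmatched pattern by default). Accepting the bare `1,2` form as well sidesteps that.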

jeffkole commented 1 month ago

MVP: run a single phase, expecting all of its necessary sources to already be in the working directory.

If it is easy to extend to run any phases, then do that.
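A rough sketch of how that selection step might look, covering both the single-phase MVP and the "any phases" extension. It assumes the pipeline exposes an ordered list of phase objects with a `name` attribute; `StubPhase` and `select_phases` are hypothetical stand-ins, not phaser classes, and nothing here checks that the prior phase's output exists in the working directory.

```python
from dataclasses import dataclass


@dataclass
class StubPhase:
    """Stand-in for a real phase; only the name matters for selection."""
    name: str


def select_phases(phases, name=None, numbers=None):
    """Return the subset of phases to run.

    name    -- run only the phase with this name (the MVP case)
    numbers -- run the phases at these 1-based positions, in the order given
    With neither argument, every phase is returned.
    """
    if name is not None and numbers is not None:
        raise ValueError("give either a phase name or phase numbers, not both")
    if name is not None:
        matches = [phase for phase in phases if phase.name == name]
        if not matches:
            known = ", ".join(phase.name for phase in phases)
            raise ValueError(f"no phase named {name!r}; known phases: {known}")
        return matches
    if numbers is not None:
        for n in numbers:
            if not 1 <= n <= len(phases):
                raise ValueError(f"phase number {n} is out of range 1..{len(phases)}")
        return [phases[n - 1] for n in numbers]
    return list(phases)
```

With this shape, the MVP is just the `name` branch, and running several phases falls out of the `numbers` branch without changing the caller.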