
lisad / phaser

The missing layer for complex data batch integration pipelines

MIT License


We should create run-id subdirectories inside the working directory #97

Closed: lisad closed this issue 6 months ago

lisad commented 7 months ago

This is something we built before that proved useful, and I had forgotten about it until I tried to actually run phaser from the command line and look at the output: we should create subdirectories with run data.

So if the command is

python -m phaser run employees ~/employee_data fixture_files/employees.csv

then within '~/employee_data' we should create a new subdirectory each time the pipeline runs, so that each run doesn't overwrite the previous one.
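A minimal sketch of how a per-run subdirectory could be created, assuming a timestamp-based run id. The function name `create_run_dir`, the `YYYYMMDD-HHMMSS` naming scheme, and the numeric collision suffix are all hypothetical choices for illustration; the implementation that closed this issue may work differently.

```python
from datetime import datetime
from pathlib import Path


def create_run_dir(working_dir, now=None):
    """Create and return a fresh run-id subdirectory inside working_dir.

    Hypothetical scheme: the run id is a timestamp. If a directory with that
    name already exists (e.g. two runs in the same second), a numeric suffix
    is appended, so an earlier run's directory is never reused or overwritten.
    """
    working_dir = Path(working_dir).expanduser()  # handles '~/employee_data'
    working_dir.mkdir(parents=True, exist_ok=True)
    stamp = now if now is not None else datetime.now()
    run_id = stamp.strftime("%Y%m%d-%H%M%S")

    candidate = working_dir / run_id
    suffix = 1
    while True:
        try:
            candidate.mkdir()  # raises FileExistsError if the name is taken
            return candidate
        except FileExistsError:
            candidate = working_dir / f"{run_id}-{suffix}"
            suffix += 1
```

For the command above, the pipeline would call something like `create_run_dir("~/employee_data")` once at startup and write everything for that run under the returned path. Using `mkdir()` without `exist_ok` makes the "don't overwrite" guarantee explicit: the call fails rather than silently reusing an existing directory.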

lisad commented 7 months ago

In the working directory itself, we'd expect to see the latest output. In each run directory, we'd expect to see the output of that run, its error log, and its checkpoints.
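One way to get the layout described above, sketched under assumptions: each run writes its output, error log, and checkpoints into its own run directory, and when the run finishes its output is copied up into the working directory as the "latest" copy. The helper names `run_paths` and `publish_latest`, and the file names `errors.log` and `checkpoints`, are hypothetical.

```python
import shutil
from pathlib import Path


def run_paths(run_dir, output_name):
    """Hypothetical per-run layout: output, error log, and checkpoint folder."""
    run_dir = Path(run_dir)
    return {
        "output": run_dir / output_name,
        "errors": run_dir / "errors.log",
        "checkpoints": run_dir / "checkpoints",
    }


def publish_latest(working_dir, run_dir, output_name):
    """Copy this run's output into working_dir as the latest output.

    The original stays in run_dir, so every run's history is preserved while
    the working directory always holds the most recent result.
    """
    src = Path(run_dir) / output_name
    dest = Path(working_dir) / output_name
    shutil.copy2(src, dest)  # replaces the previous run's "latest" copy
    return dest
```

Copying rather than moving keeps each run directory complete on its own, at the cost of storing the latest output twice.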

If we extend phaser with phases or other capabilities that fetch source data, we'd also put a copy of the source data in the run directory.
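If source data were fetched as part of a run, snapshotting it could be as simple as the sketch below. The helper `snapshot_source` and the `source` subdirectory name are hypothetical; the point is only that the run directory ends up holding the exact input the run consumed.

```python
import shutil
from pathlib import Path


def snapshot_source(source_file, run_dir):
    """Copy a source file into run_dir/source/ and return the copy's path.

    The 'source' subdirectory name is a hypothetical choice. The original
    file is left untouched.
    """
    dest_dir = Path(run_dir) / "source"
    dest_dir.mkdir(parents=True, exist_ok=True)
    # copy2 preserves file metadata such as modification time
    return Path(shutil.copy2(source_file, dest_dir))
```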

jeffkole commented 6 months ago

Closed by #117