simonsobs / BBPipe

B-modes pipeline constructor

More reproducibility #30

Open · msyriac opened this issue 3 years ago

msyriac commented 3 years ago

Say that I run:

bbpipe test.yml

which has stages stage1 and stage2 and uses the config file config.yml. Currently, it is not obvious to me that the information in test.yml and config.yml is fully saved for future reference. Also, by default the outputs of both stages end up in the same directory, which makes it difficult to, e.g., sync only the outputs of some expensive stages to a different compute system. Here's a proposal for a slight restructuring.

output_dir will be the root directory. Pipeline products will be written to sub-directories of output_dir named after the stages. log_dir no longer needs to be specified: the log is always written to $output_dir/run_{TIME}/log.txt, where TIME identifies when bbpipe was run. Similarly, test.yml and config.yml are copied into $output_dir/run_{TIME}/, so more information about each submission is saved. Let me know what you think and I can submit a PR.
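A minimal sketch of the proposed layout, for discussion only. `prepare_run_dirs` is a hypothetical helper, not part of BBPipe, and the `%Y%m%d_%H%M%S` format for TIME is an assumption; the proposal only says TIME identifies when bbpipe was run.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def prepare_run_dirs(output_dir, stage_names, yaml_files, now=None):
    """Create the proposed output layout (hypothetical helper, not BBPipe API).

    $output_dir/<stage_name>/        one sub-directory per stage
    $output_dir/run_{TIME}/log.txt   log for this invocation
    $output_dir/run_{TIME}/*.yml     copies of the pipeline and config files
    """
    root = Path(output_dir)
    if now is None:
        now = datetime.now()
    # TIME identifier: a sortable timestamp (the format is an assumption).
    run_dir = root / f"run_{now:%Y%m%d_%H%M%S}"
    # exist_ok=False: refuse to overwrite a previous run with the same TIME.
    run_dir.mkdir(parents=True, exist_ok=False)

    # Each stage writes into its own directory, so expensive stages
    # can be synced to another system independently.
    stage_dirs = {}
    for name in stage_names:
        stage_dirs[name] = root / name
        stage_dirs[name].mkdir(parents=True, exist_ok=True)

    # Keep a verbatim copy of every YAML that defined this run.
    for f in yaml_files:
        shutil.copy2(f, run_dir / Path(f).name)

    log_path = run_dir / "log.txt"
    log_path.touch()
    return run_dir, stage_dirs, log_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        for fname in ("test.yml", "config.yml"):
            (tmp / fname).write_text("# placeholder\n")
        run_dir, stage_dirs, log_path = prepare_run_dirs(
            tmp / "out", ["stage1", "stage2"],
            [tmp / "test.yml", tmp / "config.yml"],
        )
        print(sorted(p.name for p in run_dir.iterdir()))
```

Running it prints `['config.yml', 'log.txt', 'test.yml']`. Stage directories are reused across runs (`exist_ok=True`), while each run directory is new, so two invocations started within the same second would collide; a finer-grained TIME would avoid that.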

msyriac commented 3 years ago

A related request: can we eliminate the need for SCRATCH (e.g. on cori) and have the parsl logs go into the $output_dir/run_{TIME} directory? It's nice to have all related logs in the same place. I know that on some systems this requires output_dir to be writable by compute nodes, but the corresponding site module can take care of that.
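One possible way to do this, sketched under assumptions: parsl's `Config` accepts a `run_dir` argument (default `runinfo`) for its own logs, so bbpipe could point it inside the run directory instead of SCRATCH. The `parsl` sub-directory name and both helpers are hypothetical, and executors are left at parsl's defaults here; a real site module would supply its own.

```python
import os
import tempfile
from pathlib import Path


def parsl_log_dir(run_dir):
    """Directory for parsl's logs inside the bbpipe run directory.

    The 'parsl' sub-directory name is a hypothetical choice.
    """
    d = Path(run_dir) / "parsl"
    d.mkdir(parents=True, exist_ok=True)
    # On clusters this must also be writable from compute nodes;
    # os.access only checks the current (submitting) host.
    if not os.access(d, os.W_OK):
        raise PermissionError(f"{d} is not writable")
    return str(d)


def make_parsl_config(run_dir):
    """Build a parsl Config whose logs land in run_dir instead of SCRATCH."""
    # Imported lazily so parsl_log_dir works without parsl installed.
    from parsl.config import Config
    return Config(run_dir=parsl_log_dir(run_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(parsl_log_dir(Path(tmp) / "run_20210101_000000"))
```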

damonge commented 3 years ago

Sounds good to me! Sorry for the delay. Once the PR is there it'd be good to run one of our existing pipelines (e.g. BBPower) to check there are no unforeseen effects.

damonge commented 3 years ago

I might even suggest writing a mini pipeline to test this (which we could use to create a few unit tests, which we need anyway). I'll try to give this a go tomorrow.

msyriac commented 3 years ago

Great! I'm inferring from your response that you are ok with all the suggestions I've made, so I'll update PR #31 to do:

  1. separate run directories and copy yamls (done)
  2. stage outputs in sub-directories (done)
  3. ~make SCRATCH unnecessary~ (won't do in PR #31 )

I'd also like to understand why BBPIPE_SETUP is needed. I might see if I can make that redundant too.

Mini-pipeline test sounds great!

msyriac commented 3 years ago

Right, ok, so why exactly are things like BBPIPE_SETUP and BBPIPE_SCRIPT_DIR used? Why can't we assume bbpipe is on the PATH?
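A sketch of how the env var could become optional rather than required. This assumes, without confirmation from the thread, that BBPIPE_SCRIPT_DIR names a directory containing the `bbpipe` entry point; if it is set and valid it wins, otherwise the executable is looked up on PATH with `shutil.which`, which is where it typically ends up when installed as a console script. `find_bbpipe` is a hypothetical helper.

```python
import os
import shutil
import tempfile


def find_bbpipe(name="bbpipe", env_var="BBPIPE_SCRIPT_DIR"):
    """Locate the bbpipe executable (hypothetical helper).

    Assumption: env_var, if set, names a directory containing `name`.
    Otherwise fall back to searching PATH.
    """
    script_dir = os.environ.get(env_var)
    if script_dir:
        candidate = os.path.join(script_dir, name)
        if os.path.isfile(candidate):
            return candidate
    found = shutil.which(name)
    if found is None:
        raise FileNotFoundError(f"{name!r} not found via ${env_var} or on PATH")
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        exe = os.path.join(d, "bbpipe")
        with open(exe, "w") as fh:
            fh.write("#!/bin/sh\n")
        os.chmod(exe, 0o755)
        os.environ.pop("BBPIPE_SCRIPT_DIR", None)
        os.environ["PATH"] = d + os.pathsep + os.environ.get("PATH", "")
        print(find_bbpipe() == exe)
```

With this fallback, existing setups that export the variable keep working, and fresh installs need nothing beyond bbpipe being on the PATH.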