When multiple datasets run concurrently in the pipeline, they all checkpoint to the same tmp location in cpg-seqr-main-tmp due to this use of output_path.
This is the only use of output_path I can find (other than in interpretation_runner.py, which is a completely different entrypoint, unused by prod-pipes).
Proposed Changes
Pass the specific dataset name into the Hail labelling stage (optional; defaults to workflow.dataset)
Write temp files and checkpoints to a dataset-specific path
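
For reviewers, a minimal sketch of the idea behind these two changes, not the code in this PR: the dataset name becomes a path component, so concurrent runs no longer share a checkpoint location. The helper names (`resolve_dataset`, `dataset_checkpoint_path`), the bucket URL, and the dataset names are all hypothetical and chosen only for illustration.

```python
import posixpath
from typing import Optional


def resolve_dataset(dataset: Optional[str], workflow_dataset: str) -> str:
    """Use the explicitly passed dataset, else fall back to the workflow default."""
    return dataset or workflow_dataset


def dataset_checkpoint_path(tmp_root: str, dataset: str, filename: str) -> str:
    """Build a checkpoint path that is namespaced by dataset.

    tmp_root is assumed to be a bucket or directory URL; the layout
    <tmp_root>/<dataset>/<filename> is an illustrative assumption.
    """
    if not dataset:
        raise ValueError('dataset must be a non-empty string')
    return posixpath.join(tmp_root.rstrip('/'), dataset, filename)


# Hypothetical bucket and dataset names: two concurrent runs get distinct paths.
tmp_root = 'gs://example-tmp-bucket'
path_a = dataset_checkpoint_path(tmp_root, resolve_dataset('dataset-a', 'main'), 'checkpoint.mt')
path_b = dataset_checkpoint_path(tmp_root, resolve_dataset(None, 'main'), 'checkpoint.mt')
```

With this layout, a run that passes no dataset falls back to the workflow-level default, which matches the optional argument described above.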
Fixes