populationgenomics / automated-interpretation-pipeline

Rare Disease variant prioritisation MVP

MIT License


Clarify dataset in in-module use of output_path #326

Closed: MattWellie closed this 6 months ago

MattWellie commented 6 months ago

Fixes

When multiple datasets are run concurrently in the pipeline, they all checkpoint to the same temporary location in cpg-seqr-main-tmp because of this use of output_path, which does not include the dataset in the path.
This is the only use of output_path I can find, other than in interpretation_runner.py, which is a completely separate entrypoint and is not used by prod-pipes.

Proposed Changes

Pass the specific dataset name into the Hail labelling stage (optional; defaults to workflow.dataset)
Write temp files and checkpoints to a dataset-specific path
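The sketch below is a minimal illustration of the proposed behaviour, not the code in this PR. It uses an optional dataset argument that falls back to workflow.dataset, with the dataset name built into the temp path so concurrent runs cannot collide. The helper name dataset_tmp_path, the config stand-in WORKFLOW_CONFIG, the per-dataset sub-directory layout, and the example dataset and file names are all hypothetical. The real pipeline builds its paths through output_path and the run config, and may lay them out differently.

```python
from typing import Optional

# Hypothetical stand-in for the run config: in the pipeline the default
# dataset comes from workflow.dataset.
WORKFLOW_CONFIG = {"dataset": "seqr"}

# Shared temp bucket named in the PR; nesting by dataset below is an
# assumed layout for illustration only.
TMP_ROOT = "gs://cpg-seqr-main-tmp"


def dataset_tmp_path(name: str, dataset: Optional[str] = None) -> str:
    """Return a temp/checkpoint path that is unique per dataset.

    If dataset is not given, fall back to workflow.dataset, mirroring the
    optional argument described in the PR.
    """
    if dataset is None:
        dataset = WORKFLOW_CONFIG["dataset"]
    return f"{TMP_ROOT}/{dataset}/{name}"


if __name__ == "__main__":
    # Two datasets labelled concurrently now get distinct checkpoint paths.
    print(dataset_tmp_path("labelled.mt", dataset="dataset-a"))
    print(dataset_tmp_path("labelled.mt", dataset="dataset-b"))
    # Omitting the argument falls back to the workflow.dataset default.
    print(dataset_tmp_path("labelled.mt"))
```

In a Hail stage, the returned string would be passed as the output location to MatrixTable.checkpoint (for example, mt = mt.checkpoint(dataset_tmp_path("labelled.mt", dataset))). Keeping the argument optional leaves existing single-dataset callers unchanged.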