naobservatory / mgs-pipeline

MIT License

Pipeline: Allow parallelizing at the sample level #54

Closed: jeffkaufman closed this 9 months ago

jeffkaufman commented 10 months ago

Add a --sample-level option to ./reprocess-bioprojects.py, so we can run commands like:

./reprocess-bioprojects.py --bioprojects PRJNA729801 --max-jobs 3 --log-prefix al --sample-level -- --stages alignments

Without this, we'd only run one job at a time: reprocess-bioprojects.py parallelizes at the bioproject level, and this command passes a single bioproject. With --sample-level, we'll instead run up to three jobs at a time, each processing one sample.
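A minimal sketch of the difference in job granularity. It assumes the script builds a list of work units and runs them on a worker pool bounded by --max-jobs. `plan_jobs`, `run_all`, `run_one`, and the sample IDs are hypothetical illustrations, not the actual code in reprocess-bioprojects.py:

```python
from concurrent.futures import ThreadPoolExecutor


def plan_jobs(bioprojects, samples_by_bioproject, sample_level):
    """Return work units as (bioproject, sample) pairs; sample is None at bioproject level."""
    if not sample_level:
        # One job per bioproject: a single bioproject yields a single job,
        # so extra --max-jobs slots sit idle.
        return [(bp, None) for bp in bioprojects]
    # One job per sample, so even a single bioproject can fill every slot.
    return [(bp, s) for bp in bioprojects for s in samples_by_bioproject[bp]]


def run_all(jobs, max_jobs, run_one):
    """Run each job with at most max_jobs running concurrently; results keep job order."""
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(lambda job: run_one(*job), jobs))


if __name__ == "__main__":
    samples = {"PRJNA729801": ["S1", "S2", "S3"]}  # hypothetical sample IDs
    print(plan_jobs(["PRJNA729801"], samples, sample_level=False))
    print(plan_jobs(["PRJNA729801"], samples, sample_level=True))
```

In the real script, `run_one` would presumably launch the per-bioproject pipeline with the arguments after `--` (here, `--stages alignments`) passed through unchanged, restricted to one sample when sample-level mode is on.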

This is an improvement for now, but long term, if we stick with this system, I'd prefer that the operator not need to think about this sort of thing. I added a comment to reprocess-bioprojects.py with more detail.

jeffkaufman commented 9 months ago

ping @lennijusten

lennijusten commented 9 months ago

I will try to get to this by tomorrow.