mskcc / tempo

CCS research pipeline to process WES and WGS TN pairs
https://cmotempo.netlify.com/

Aggregate from path fails in some aggregation processes when samples have incomplete subworkflows #1002

Open gongyixiao opened 8 months ago

gongyixiao commented 8 months ago

The lines of code linked below fail when aggregating from path if some samples have not finished the corresponding subworkflows. The goal of aggregate from path is to build a cohort, as best it can, from whatever samples and subworkflows have already completed. It was designed to tolerate missing samples and subworkflows and to aggregate whatever is there. Because the scripts below name specific files instead of using wildcards, as the examples at the end of this comment do, the process fails: aggregate creates an invalid link based on the path, which a wildcard ignores but which raises an error when the file name is given explicitly.

https://github.com/mskcc/tempo/blob/e29f5bed924bb055f33bc0dc10a8a7319df18096/modules/process/Aggregate/GermlineAggregateSv.nf#L14
https://github.com/mskcc/tempo/blob/e29f5bed924bb055f33bc0dc10a8a7319df18096/modules/process/Aggregate/SomaticAggregateHRDetect.nf#L13
https://github.com/mskcc/tempo/blob/e29f5bed924bb055f33bc0dc10a8a7319df18096/modules/process/Aggregate/SomaticAggregateSv.nf#L14

The same applies to the for loops in the script block of this process: https://github.com/mskcc/tempo/blob/develop/modules/process/Aggregate/CohortRunMultiQC.nf

Successful examples:
https://github.com/mskcc/tempo/blob/e29f5bed924bb055f33bc0dc10a8a7319df18096/modules/process/Aggregate/SomaticAggregateMaf.nf#L21
https://github.com/mskcc/tempo/blob/e29f5bed924bb055f33bc0dc10a8a7319df18096/modules/process/Aggregate/SomaticAggregateLOHHLA.nf#L20
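
To illustrate the difference between naming files and using a wildcard, here is a minimal Python sketch of the general mechanism. It is not the tempo code. It assumes that staging by name creates a link for every expected path whether or not the target exists, and that staging by wildcard links only files that are actually on disk. The directory layout, `result.tsv`, and all function names are hypothetical.

```python
import glob
import os
import tempfile


def stage_by_name(results, stage, samples, filename):
    """Link one explicitly named file per sample, even if the target is missing."""
    for sample in samples:
        src = os.path.join(results, sample, filename)
        os.symlink(src, os.path.join(stage, f"{sample}.{filename}"))


def stage_by_glob(results, stage, pattern):
    """Link only the files that match the wildcard on disk."""
    for src in sorted(glob.glob(os.path.join(results, "*", pattern))):
        sample = os.path.basename(os.path.dirname(src))
        os.symlink(src, os.path.join(stage, f"{sample}.{os.path.basename(src)}"))


def concatenate(stage):
    """Read every staged link in order; a dangling link raises FileNotFoundError."""
    chunks = []
    for name in sorted(os.listdir(stage)):
        with open(os.path.join(stage, name)) as handle:
            chunks.append(handle.read())
    return "".join(chunks)


def demo():
    with tempfile.TemporaryDirectory() as root:
        results = os.path.join(root, "results")
        named_stage = os.path.join(root, "named_stage")
        glob_stage = os.path.join(root, "glob_stage")
        for directory in (named_stage, glob_stage):
            os.makedirs(directory)

        # Hypothetical cohort: sample_A finished its subworkflow, sample_B did not.
        for sample in ("sample_A", "sample_B"):
            os.makedirs(os.path.join(results, sample))
        with open(os.path.join(results, "sample_A", "result.tsv"), "w") as handle:
            handle.write("sample_A\tok\n")

        # Wildcard staging skips sample_B because nothing matches there.
        stage_by_glob(results, glob_stage, "*.tsv")
        globbed = concatenate(glob_stage)

        # Named staging leaves a dangling link for sample_B, so reading fails.
        stage_by_name(results, named_stage, ["sample_A", "sample_B"], "result.tsv")
        try:
            concatenate(named_stage)
            named_error = None
        except FileNotFoundError as err:
            named_error = type(err).__name__
        return globbed, named_error
```

In this sketch the wildcard variant aggregates only `sample_A`, while the named variant stops on the dangling link for `sample_B`. That matches the behavior described above, assuming the links in the real processes are created in a comparable way.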