change input bam names of RunLOHHLA/QcQualimap/CollectHsMetrics

anoronh4 commented 3 years ago

We recently ran into a case where the input BAMs were not generated by Tempo: the sample ID was s_C_000184_T002_d, but the basename of the BAM was s_C_000184_T002_d___bqsr.bam. This caused either outright failures or recognition of an incorrect sample name (s_C_000184_T002_d___bqsr). RunLOHHLA failed, and the MultiQC processes showed incorrect sample names because MultiQC parses the BAM path recorded in the input Picard and Qualimap files.

LOHHLA does not have an option to enforce the IDs we want. Instead, we can change the following line https://github.com/mskcc/tempo/blob/master/pipeline.nf#L1691 to:

set idNormal, target, idTumor, file("${idTumor}.bam"), file("${idTumor}.bam.bai"), file("${idNormal}.bam"), file("${idNormal}.bam.bai"), file(purityOut), placeHolder, file(winnersHla) from mergedChannelLOHHLA

MultiQC uses a regex to find the BAM input recorded in the hs_metrics file and extract the sample name from it: https://github.com/ewels/MultiQC/blob/master/multiqc/modules/picard/HsMetrics.py#L78-L82. The same applies to Qualimap: https://github.com/ewels/MultiQC/blob/master/multiqc/modules/qualimap/QM_BamQC.py#L109
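To illustrate why the BAM basename leaks into the report, here is a rough Python sketch of this kind of parsing. The regex, the function name `sample_name_from_header`, and the example header line are my own simplified assumptions for illustration, not MultiQC's actual code or real Picard output; see the links above for the real implementation.

```python
import os
import re
from typing import Optional

# Simplified stand-in pattern (an assumption, NOT copied from MultiQC):
# capture the value that follows INPUT in a Picard-style header line.
INPUT_PATTERN = re.compile(r"INPUT(?:=|\s+)(\S+)")


def sample_name_from_header(header_line: str) -> Optional[str]:
    """Return the BAM basename without its extension, or None if no INPUT is found."""
    match = INPUT_PATTERN.search(header_line)
    if match is None:
        return None
    bam_name = os.path.basename(match.group(1))
    return os.path.splitext(bam_name)[0]


# Hypothetical header line; the directory and output name are made up.
header = "# CollectHsMetrics INPUT=/data/s_C_000184_T002_d___bqsr.bam OUTPUT=out.hs_metrics.txt"
print(sample_name_from_header(header))  # s_C_000184_T002_d___bqsr
```

With parsing like this, the reported name can only match the sample ID if the BAM passed to the upstream tool is already named after the ID.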

I think it makes sense to run each of the upstream modules with corrected BAM names. An alternative would be to generate a clean temporary copy of each input file in the MultiQC process, which would work but would not be as simple to implement.

gongyixiao commented 2 years ago

I don't think changing the line as you indicated will fix the problem. My idea would be to add a rename command before the lohhla script runs in the RunLOHHLA process, renaming the BAM files from whatever they are called to ${idTumor}.bam etc. Happy to discuss.
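A rough sketch of that rename step, written in Python only to show the logic. In the pipeline it would more likely be a couple of shell commands at the top of the process script. The helper name `stage_bam_as` and the use of symlinks rather than a real rename are assumptions of this sketch, not something Tempo does today.

```python
import os
import tempfile


def stage_bam_as(sample_id, bam_path, bai_path, workdir):
    """Symlink a BAM and its index into workdir as <sample_id>.bam and <sample_id>.bam.bai."""
    staged_bam = os.path.join(workdir, f"{sample_id}.bam")
    staged_bai = staged_bam + ".bai"
    for source, target in ((bam_path, staged_bam), (bai_path, staged_bai)):
        source = os.path.abspath(source)
        # Skip files that already carry the ID-based name in workdir.
        if source == os.path.abspath(target):
            continue
        # Replace a stale link left over from a previous attempt.
        if os.path.lexists(target):
            os.remove(target)
        os.symlink(source, target)
    return staged_bam, staged_bai


# Demo with empty placeholder files standing in for a real BAM and index.
src_dir = tempfile.mkdtemp()
work_dir = tempfile.mkdtemp()
bam = os.path.join(src_dir, "s_C_000184_T002_d___bqsr.bam")
bai = bam + ".bai"
for path in (bam, bai):
    open(path, "w").close()

staged_bam, staged_bai = stage_bam_as("s_C_000184_T002_d", bam, bai, work_dir)
print(os.path.basename(staged_bam))  # s_C_000184_T002_d.bam
```

Symlinking avoids copying large BAMs and leaves the original inputs untouched. Whether LOHHLA and the QC tools report the link name or the resolved path would still need to be checked.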

gongyixiao commented 1 year ago

This might be a solution? https://github.com/mskcc/tempo/issues/773


change input bam names of RunLOHHLA/QcQualimap/CollectHsMetrics #892