Open anoronh4 opened 3 years ago
I don't think changing the lines as you indicated will fix the problem. My idea would be to add a rename command to the RunLOHHLA process, before the lohhla script runs, that renames the bam files from whatever they are to ${idTumor}.bam, etc. Happy to discuss.
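A minimal sketch of the rename idea, assuming a symlink is acceptable in place of an actual rename. The function name `stage_bam_as_id`, the `staged` directory, and the `<bam>.bai` index naming are assumptions for illustration, not anything that exists in Tempo or LOHHLA:

```shell
# Hypothetical helper (not part of Tempo or LOHHLA): expose a BAM under
# <id>.bam via a symlink so basename-derived sample names match the ID.
stage_bam_as_id() {
    local bam="$1" id="$2" outdir="$3"
    local src_dir
    # Resolve the BAM's directory to an absolute path so the link is valid
    # no matter where the staged directory lives.
    src_dir="$(cd "$(dirname "$bam")" && pwd)" || return 1
    local src="$src_dir/$(basename "$bam")"
    mkdir -p "$outdir" || return 1
    ln -sf "$src" "$outdir/${id}.bam"
    # Link the index too if one sits next to the BAM (assumed <bam>.bai naming).
    if [ -e "${src}.bai" ]; then
        ln -sf "${src}.bai" "$outdir/${id}.bam.bai"
    fi
}

# Example call, using the names from this issue:
# stage_bam_as_id /data/s_C_000184_T002_d___bqsr.bam s_C_000184_T002_d staged
```

A symlink avoids copying large files and leaves the original input untouched; whether LOHHLA follows symlinks the same way as regular files would still need to be checked.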
This might be a solution? https://github.com/mskcc/tempo/issues/773
Recently we ran into a situation where the input bams were not generated by Tempo: the ID was s_C_000184_T002_d but the basename of the bam was s_C_000184_T002_d___bqsr.bam, causing failures or recognition of an incorrect sample name (s_C_000184_T002_d___bqsr). We had failures in RunLOHHLA. Multiqc processes showed incorrect sample names because multiqc would parse the path of the bam from the input picard and qualimap files.

LOHHLA does not have an option to enforce the ids we want. Instead we can change the following line https://github.com/mskcc/tempo/blob/master/pipeline.nf#L1691 to:
multiqc uses a regex to find and extract the sample name from the bam input of the hs_metrics file: https://github.com/ewels/MultiQC/blob/master/multiqc/modules/picard/HsMetrics.py#L78-L82. Same thing with qualimap: https://github.com/ewels/MultiQC/blob/master/multiqc/modules/qualimap/QM_BamQC.py#L109
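To illustrate why a mismatched basename leaks into the reported sample name, here is a simplified stand-in for basename-based naming. It is not the regex MultiQC actually uses (see the linked lines for that), and both function names are hypothetical:

```shell
# Simplified stand-in for basename-based naming; NOT MultiQC's or LOHHLA's code.
sample_from_bam() {
    basename "$1" .bam
}

# Hypothetical pre-flight check: succeeds only if the name derived from the
# BAM path equals the intended sample ID.
bam_matches_id() {
    local bam="$1" id="$2"
    [ "$(sample_from_bam "$bam")" = "$id" ]
}

# Example, using the names from this issue:
# sample_from_bam /data/s_C_000184_T002_d___bqsr.bam   # prints s_C_000184_T002_d___bqsr
# bam_matches_id /data/s_C_000184_T002_d___bqsr.bam s_C_000184_T002_d || echo "basename does not match ID" >&2
```

A check like this could run before the upstream modules to flag inputs whose basename differs from the ID, rather than discovering the mismatch in the final report.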
I think it makes sense to run each of the upstream modules with corrected bam names. An alternative would be to generate a clean temporary version of each input file at the multiqc process, which would work but would not be as simple to implement.