Open allanbolipata opened 4 years ago
Do you know if `bwa_output` is the same for both of these pair objects?
@nikhil `bwa_output` is set here: https://github.com/mskcc/beagle/blob/develop/runner/operator/roslin_operator/construct_roslin_pair.py#L22-L38
`bwa_output` is set by taking the sample ID/SM and appending `.bam`. The sample SM is set from the "cmo sample name": https://github.com/mskcc/beagle/blob/develop/runner/operator/roslin_operator/bin/make_sample.py#L117
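To make the naming rule above concrete, here is a minimal sketch of it. The helper name `bwa_output_name` and the sample IDs are made up for illustration; this is not the code from `construct_roslin_pair.py`, only the "SM plus `.bam`" rule as described.

```python
def bwa_output_name(sample_id: str) -> str:
    """Hypothetical helper: derive the BAM file name from the sample ID (SM)."""
    return sample_id + ".bam"


# Made-up sample IDs standing in for a tumor/normal pair.
pair = {"tumor": "sample_T", "normal": "sample_N"}

# Each member of the pair gets its own output name as long as the SMs differ.
outputs = {role: bwa_output_name(sm) for role, sm in pair.items()}
print(outputs)
```

Under this rule, two samples only end up with the same `bwa_output` if they share the same SM.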
It shouldn't be the same for a pair of samples (i.e., a tumor and a normal). That said, a sample could have multiple FASTQs, and I think their trimmed outputs will overwrite each other because the trimming step doesn't take the chunking into account.
I will confirm with @Tim whether we still need clstats delivered. If not, removing clstats from the pipeline output will solve the issue.
/path/to/normal/1/file_r1.fq
-->file_r1.trimgalore.ext
/path/to/normal/2/file_r1.fq
-->file_r1.trimgalore.ext
The quick fix is to rename these files and enforce a strict policy of unique file names, perhaps including lane information. The longer fix is to handle this in the pipeline somehow.
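As a rough sketch of how such a uniqueness policy could be checked before a run, the snippet below groups FASTQ paths by basename and reports any name that appears in more than one directory. The function name `find_basename_collisions` and the tumor path are assumptions for illustration; nothing like this is confirmed to exist in the pipeline.

```python
import posixpath
from collections import defaultdict


def find_basename_collisions(fastq_paths):
    """Return {basename: [paths]} for basenames shared by more than one path."""
    by_name = defaultdict(list)
    for path in fastq_paths:
        by_name[posixpath.basename(path)].append(path)
    # Keep only the basenames that would map to the same trimmed output name.
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


fastqs = [
    "/path/to/normal/1/file_r1.fq",
    "/path/to/normal/2/file_r1.fq",
    "/path/to/tumor/1/other_r1.fq",  # hypothetical path with a unique name
]

collisions = find_basename_collisions(fastqs)
for name, paths in collisions.items():
    print(f"{name} appears {len(paths)} times: {paths}")
```

A check like this only flags the problem; the files would still need to be renamed, or the output name made lane-aware, for the overwrite to go away.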