mskcc / roslin-cwl

CWLs for the Roslin Variant Pipeline

In the trimming step of sample-workflow.cwl, output files are named only by the input file's basename; if two inputs share a basename across different paths, their outputs overwrite each other #20

Open · allanbolipata opened this issue 4 years ago

allanbolipata commented 4 years ago

/path/to/normal/1/file_r1.fq --> file_r1.trimgalore.ext
/path/to/normal/2/file_r1.fq --> file_r1.trimgalore.ext

The quick fix is to rename these files and enforce a strict policy of unique file names, perhaps including lane information. The longer-term fix is to handle this in the pipeline itself.
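A minimal sketch of both ideas, assuming the trimming step names its output as `<basename without extension>.trimgalore.ext` (the ".ext" placeholder is taken from the example above). `trimmed_name`, `find_collisions`, and `unique_trimmed_name` are hypothetical helpers, not part of roslin-cwl: the first two flag inputs that would overwrite each other, and the third folds the parent directory (for example a lane folder) into the output name.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def trimmed_name(fastq_path: str) -> str:
    # Hypothetical basename-only rule: file_r1.fq -> file_r1.trimgalore.ext
    return f"{PurePosixPath(fastq_path).stem}.trimgalore.ext"


def find_collisions(fastq_paths):
    """Group inputs by the output name they would produce; keep only shared names."""
    groups = defaultdict(list)
    for path in fastq_paths:
        groups[trimmed_name(path)].append(path)
    return {name: paths for name, paths in groups.items() if len(paths) > 1}


def unique_trimmed_name(fastq_path: str) -> str:
    # Hypothetical fix: prefix the parent directory name (e.g. lane "1" or "2").
    p = PurePosixPath(fastq_path)
    return f"{p.parent.name}_{p.stem}.trimgalore.ext"


if __name__ == "__main__":
    paths = ["/path/to/normal/1/file_r1.fq", "/path/to/normal/2/file_r1.fq"]
    print(find_collisions(paths))
    print([unique_trimmed_name(p) for p in paths])
```

For the two paths above, `find_collisions` reports both inputs under `file_r1.trimgalore.ext`, while `unique_trimmed_name` yields `1_file_r1.trimgalore.ext` and `2_file_r1.trimgalore.ext`. Using only the parent directory can still collide if two different grandparent directories contain same-named subfolders, so a real fix would more likely carry explicit lane or run metadata.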

nikhil commented 4 years ago

Do you know if bwa_output is the same for both of these pair objects?

allanbolipata commented 4 years ago

@nikhil bwa_output is set here: https://github.com/mskcc/beagle/blob/develop/runner/operator/roslin_operator/construct_roslin_pair.py#L22-L38

bwa_output is set by taking the sample ID (SM) and appending .bam. The sample SM is set from the "cmo sample name": https://github.com/mskcc/beagle/blob/develop/runner/operator/roslin_operator/bin/make_sample.py#L117

It shouldn't be the same for a pair of samples (i.e., a tumor and a normal). That said, a single sample can have multiple fastqs, and I think their trimmed outputs will overwrite each other because the trimming step doesn't take the chunking into account.
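To illustrate the distinction, here is a rough sketch assuming each sample maps to a list of fastq chunks. `bwa_output` follows the rule described above (sample SM plus ".bam"), so it differs between tumor and normal; the basename-only trimmed names can still repeat within one sample. The chunk-aware naming scheme, the helper names, and the sample names "NORMAL_1"/"TUMOR_1" are all hypothetical.

```python
from pathlib import PurePosixPath


def bwa_output(sample_name: str) -> str:
    # Rule described in the thread: sample ID/SM with ".bam" appended.
    return f"{sample_name}.bam"


def naive_trimmed(fastq_path: str) -> str:
    # Assumed basename-only naming, which ignores chunking.
    return f"{PurePosixPath(fastq_path).stem}.trimgalore.ext"


def chunk_aware_trimmed(sample_name: str, chunk_index: int, fastq_path: str) -> str:
    # Hypothetical scheme: sample name and chunk index make each chunk distinct.
    stem = PurePosixPath(fastq_path).stem
    return f"{sample_name}_chunk{chunk_index}_{stem}.trimgalore.ext"


def plan_outputs(samples):
    """samples: dict of sample name -> list of fastq paths, one entry per chunk."""
    plan = {}
    for sample, fastqs in samples.items():
        naive = [naive_trimmed(f) for f in fastqs]
        plan[sample] = {
            "bwa_output": bwa_output(sample),
            # True when two chunks of this sample would write the same trimmed file.
            "naive_collides": len(set(naive)) < len(naive),
            "trimmed": [chunk_aware_trimmed(sample, i, f) for i, f in enumerate(fastqs)],
        }
    return plan


if __name__ == "__main__":
    samples = {
        "NORMAL_1": ["/path/to/normal/1/file_r1.fq", "/path/to/normal/2/file_r1.fq"],
        "TUMOR_1": ["/path/to/tumor/1/file_r1.fq"],
    }
    for name, info in plan_outputs(samples).items():
        print(name, info)
```

Here the normal sample's two chunks collide under the basename-only rule even though its bwa_output ("NORMAL_1.bam") is distinct from the tumor's.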

nikhil commented 4 years ago

I will confirm with @Tim whether we still need clstats delivered. If not, removing clstats from the pipeline output will solve the issue.