Closed by ojziff 2 years ago
This needs to be discussed internally within the nf-core team. With the default logic, the last part of the sample name is stripped off. I believe this was intended to combine technical replicates for RNA-seq, but it can fail for use cases like yours. Let me see what the core team's response is.
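To make the behaviour under discussion concrete, here is a minimal sketch in Python (not the pipeline's own Groovy). It assumes the logic amounts to "split the sample ID on underscores, drop the last part, then group by what remains". The function names strip_last_part and group_by_stripped_id are hypothetical, and the handling of IDs without an underscore is an assumption of this sketch, not something taken from the pipeline.

```python
from collections import defaultdict


def strip_last_part(sample_id: str) -> str:
    """Drop everything after the final underscore (assumed behaviour)."""
    parts = sample_id.split("_")
    # Assumption for this sketch: an ID with no underscore is left unchanged.
    return "_".join(parts[:-1]) if len(parts) > 1 else sample_id


def group_by_stripped_id(sample_ids):
    """Group sample IDs by their stripped ID, mimicking a merge step."""
    groups = defaultdict(list)
    for sample_id in sample_ids:
        groups[strip_last_part(sample_id)].append(sample_id)
    return dict(groups)


# Sample names taken from the bug report.
samples = ["CONTROL_1", "CONTROL_2", "TREATMENT_1"]
merged = group_by_stripped_id(samples)
print(merged)
# {'CONTROL': ['CONTROL_1', 'CONTROL_2'], 'TREATMENT': ['TREATMENT_1']}
```

Under these assumptions, CONTROL_1 and CONTROL_2 collapse into one group while TREATMENT_1 stays separate, which matches the reported symptom. Grouping on the unmodified sample ID would keep all three distinct.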
Fixed in PR - https://github.com/nf-core/rnavar/pull/53
I can confirm this is now fixed in the updated dev branch! Thanks very much @praveenraj2018
Description of the bug
The CAT_FASTQ process is incorrectly merging different samples that share the same prefix but have different suffixes in the sample column of samplesheet.csv. For example, CONTROL_1 and CONTROL_2 are incorrectly merged, whereas TREATMENT_1 and CONTROL_1 are not. This doesn't happen when I run nf-core/rnaseq with the same samplesheet.csv. My samplesheet contains 18 unique samples which should not be merged, but CAT_FASTQ is merging them into 4 samples: c9orf72, ctrl, fus and iso.

I think this is caused by the split on _ in meta.id: https://github.com/nf-core/rnavar/blob/3924aac34ce715414fad953f41d98e98d0981fb8/workflows/rnavar.nf#L142

Presumably meta.id.split was copied over from an old rnaseq pipeline, where it has since been removed. This is the equivalent in the latest rnaseq pipeline, for comparison: https://github.com/nf-core/rnaseq/blob/89bf536ce4faa98b4d50a8ec0a0343780bc62e0a/workflows/rnaseq.nf#L192

Command used and terminal output
output
Relevant files
samplesheet.csv
rnavar.config:
System information
N E X T F L O W ~ version 21.10.3
HPC at Crick
Executor slurm
Container singularity
OS Linux
version dev