Pipeline unable to recognise samples processed across multiple lanes

jma1991 commented 10 months ago

Description of the bug

I've identified a potential issue in the recent pipeline release (v2.5.0). It seems the groupTuple operator is executed twice during the input channel creation and branching of FASTQ files. The second call adds an extra layer of file nesting, so the pipeline is unable to recognise samples processed across multiple lanes. See here: https://github.com/nf-core/methylseq/blob/66c6138322ec5cc87738219e24d65240299dcc10/workflows/methylseq.nf#L98-L105
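
To illustrate the nesting problem described above, here is a rough Python sketch. It is not the pipeline's Nextflow code: `group_tuple` is a hypothetical stand-in that only loosely mimics what groupTuple does, and the file names are made up.

```python
def group_tuple(items):
    """Collect the values of (key, value) pairs into one list per key.

    Hypothetical helper that loosely mimics Nextflow's groupTuple.
    """
    grouped = {}
    for key, value in items:
        grouped.setdefault(key, []).append(value)
    return list(grouped.items())


def depth(obj):
    """Return how many levels of list nesting wrap the file names."""
    if not isinstance(obj, list):
        return 0
    return 1 + max((depth(x) for x in obj), default=0)


# One sample sequenced on two lanes (file names are assumptions).
lanes = [
    ("sample1", ["s1_L001_R1.fastq.gz", "s1_L001_R2.fastq.gz"]),
    ("sample1", ["s1_L002_R1.fastq.gz", "s1_L002_R2.fastq.gz"]),
]

once = group_tuple(lanes)   # one entry per sample, holding a list of lane pairs
twice = group_tuple(once)   # grouping again wraps that list in another list

print(depth(once[0][1]), depth(twice[0][1]))
```

After the second grouping the sample holds a single element that itself contains the lanes, so a step counting the lanes would see one item instead of two.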

Command used and terminal output

No response

Relevant files

No response

System information

No response

mz448 commented 10 months ago

I believe this is an issue with the samplesheet.csv info

Discussion reference

Follow the discussion between @FelixKrueger and @bioinfoMMS in the Slack channel -> conversation

How to "solve" it:

I was able to run the pipeline using bismark by adding an underscore "_" inside the name of the sample (column 1) in the samplesheet.csv, e.g. use sample1_rep1 instead of sample1. Make sure you use 4 header columns instead of 3, the last one being genome. (This isn't very clear because the current documentation at https://nf-co.re/methylseq does not mention it, but Felix says it in the conversation.)

e.g.:

# use this:
sample, fastq_1, fastq_2, genome
sample1_rep1, bla/sample1_R1.fastq.gz, bla/sample1_R2.fastq.gz,
sample2_rep1, bla/sample2_R1.fastq.gz, bla/sample2_R2.fastq.gz,

# instead of:
sample,fastq_1,fastq_2
sample1, bla/sample1_R1.fastq.gz, bla/sample1_R2.fastq.gz
sample2, bla/sample2_R1.fastq.gz, bla/sample2_R2.fastq.gz

To add multiple lanes of the same sample, repeat the name of the sample, and they will merge during the processing.
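
As a quick check before launching, something like the following Python sketch lists which rows share a sample name and would therefore be treated as lanes of the same sample. It does not reproduce the pipeline's own merging; the lane file names (`_L001`, `_L002`) and the `lanes_per_sample` helper are assumptions for illustration.

```python
import csv
import io
from collections import defaultdict

# Example samplesheet with sample1_rep1 repeated for two lanes (paths are made up).
SAMPLESHEET = """\
sample,fastq_1,fastq_2,genome
sample1_rep1,bla/sample1_L001_R1.fastq.gz,bla/sample1_L001_R2.fastq.gz,
sample1_rep1,bla/sample1_L002_R1.fastq.gz,bla/sample1_L002_R2.fastq.gz,
sample2_rep1,bla/sample2_R1.fastq.gz,bla/sample2_R2.fastq.gz,
"""


def lanes_per_sample(text):
    """Group the (fastq_1, fastq_2) pairs of a samplesheet by sample name."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        pair = (row["fastq_1"].strip(), row["fastq_2"].strip())
        groups[row["sample"].strip()].append(pair)
    return dict(groups)


merged = lanes_per_sample(SAMPLESHEET)
for sample, lanes in merged.items():
    print(sample, len(lanes))
```

A sample that reports more than one pair here has several rows under the same name.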

wkang0 commented 8 months ago

This is a bug in the instructions rather than in the code. To handle one sample in different lanes, the sample sheet should look like this:

sample1_REP1,fq1.gz,fq2.gz
sample1_REP2,fq11.gz,fq12.gz
