Open jma1991 opened 10 months ago
I believe this is an issue with the samplesheet.csv info.
Following the discussion between @FelixKrueger and @bioinfoMMS in the Slack channel (-> conversation),
I was able to run the pipeline using bismark by adding an underscore "_" inside the sample name (column 1) in samplesheet.csv,
e.g. use sample1_rep1 instead of sample1.
Make sure you use 4 header columns instead of 3, the last one being genome. (This isn't very clear because the current documentation at https://nf-co.re/methylseq does not mention it, but Felix says so in the conversation.)
e.g.:
# use this:
sample, fastq_1, fastq_2, genome
sample1_rep1, bla/sample1_R1.fastq.gz, bla/sample1_R2.fastq.gz,
sample2_rep1, bla/sample2_R1.fastq.gz, bla/sample2_R2.fastq.gz,
# instead of:
sample,fastq_1,fastq_2
sample1, bla/sample1_R1.fastq.gz, bla/sample1_R2.fastq.gz
sample2, bla/sample2_R1.fastq.gz, bla/sample2_R2.fastq.gz
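For anyone who wants to check a sample sheet against this workaround before launching a run, here is a minimal Python sketch. The function name `check_samplesheet_text` is hypothetical, and the two rules it applies (a fourth `genome` header column and an underscore in every sample name) are taken only from this thread. It is not the pipeline's own validation logic.

```python
import csv
import io

# Header layout suggested in this thread (assumption: not taken from the pipeline code).
EXPECTED_HEADER = ["sample", "fastq_1", "fastq_2", "genome"]


def check_samplesheet_text(text):
    """Return a list of problems found in a sample sheet passed as a string."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    # Drop blank lines and strip stray whitespace around every cell.
    rows = [[cell.strip() for cell in row] for row in reader if row]
    if not rows:
        return ["sample sheet is empty"]

    problems = []
    header, body = rows[0], rows[1:]
    if header != EXPECTED_HEADER:
        problems.append(f"header is {header}, expected {EXPECTED_HEADER}")

    for row_no, row in enumerate(body, start=1):
        # A trailing comma counts as an (empty) fourth column.
        if len(row) != len(EXPECTED_HEADER):
            problems.append(f"row {row_no}: {len(row)} columns, expected 4")
        if "_" not in row[0]:
            problems.append(f"row {row_no}: sample name '{row[0]}' has no underscore")
    return problems


if __name__ == "__main__":
    sheet = (
        "sample,fastq_1,fastq_2\n"
        "sample1, bla/sample1_R1.fastq.gz, bla/sample1_R2.fastq.gz\n"
    )
    for problem in check_samplesheet_text(sheet):
        print(problem)
```

An empty list means the sheet matches the layout described above. It says nothing about whether the FASTQ paths exist.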
To add multiple lanes of the same sample, repeat the sample name on each row; the lanes will be merged during processing.
This is a bug in the instructions rather than in the code. To run one sample across different lanes, the sample sheet should look like this:
sample1_REP1,fq1.gz,fq2.gz
sample1_REP2,fq11.gz,fq12.gz
Description of the bug
I've identified a potential issue in the recent pipeline release (v2.5.0). It seems the groupTuple command is executed twice during the input channel creation and branching of FASTQ files. As a result, the pipeline is unable to recognise samples processed across multiple lanes, due to an additional layer of file nesting. See here: https://github.com/nf-core/methylseq/blob/66c6138322ec5cc87738219e24d65240299dcc10/workflows/methylseq.nf#L98-L105
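To make the "additional layer of file nesting" concrete, here is a small Python analogy. It is not the pipeline's Nextflow code: `group_tuple` is a hypothetical stand-in that only mimics grouping (key, value) pairs by key, and the file names are made up. It shows how applying a grouping step a second time to already grouped output wraps the file lists one level deeper.

```python
from collections import defaultdict


def group_tuple(pairs):
    """Group (key, value) pairs by key, keeping first-seen key order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return list(grouped.items())


def nesting_depth(value):
    """Count how many list levels wrap the innermost items."""
    if not isinstance(value, list):
        return 0
    return 1 + max((nesting_depth(item) for item in value), default=0)


# One sample sequenced on two lanes (hypothetical file names).
reads = [
    ("sample1", ["L1_R1.fastq.gz", "L1_R2.fastq.gz"]),
    ("sample1", ["L2_R1.fastq.gz", "L2_R2.fastq.gz"]),
]

once = group_tuple(reads)   # one entry per sample, holding a list of lane pairs
twice = group_tuple(once)   # grouping again wraps that list in another list

if __name__ == "__main__":
    print(nesting_depth(once[0][1]))
    print(nesting_depth(twice[0][1]))
```

In this sketch, a downstream step written for the shape of `once` would receive a single nested element from `twice` instead of two lane pairs. That is consistent with the symptom described above, though it does not confirm the cause.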
Command used and terminal output
No response
Relevant files
No response
System information
No response