nf-core / demultiplex

Demultiplexing pipeline for sequencing data
https://nf-co.re/demultiplex
MIT License

Disable automated trimming by demultiplexing tools. #207

Closed: grst closed this issue 4 months ago

grst commented 4 months ago

Description of feature

When an Illumina samplesheet specifies adapter sequences in its [Settings] section, bcl2fastq, cellranger mkfastq and BCL Convert automatically trim those adapters from the reads, for example:

[Settings],,,,,,,,,,,
Adapter,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA,,,,,,,,,,
AdapterRead2,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT,,,,,,,,,,

I do not think we want to use this feature, for the reasons shown in the attached screenshot.

I therefore suggest adding a step that blanks out these adapter entries in the input samplesheets, unless this is explicitly disabled via a config flag. In Groovy, this can be done with a snippet such as:

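    // samplesheet_in and samplesheet_out are java.io.File objects.
    // Empty the adapter sequence but keep the key, so the column count is unchanged.
    // bcl2fastq (and cellranger mkfastq, which wraps it) use Adapter and AdapterRead2;
    // BCL Convert uses AdapterRead1 and AdapterRead2.
    samplesheet_out.withWriter { writer ->
        samplesheet_in.eachLine { line ->
            writer << line.replaceAll(/^(Adapter|AdapterRead1|AdapterRead2),[^,]*/, '$1,') << '\n'
        }
    }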
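To make the config-flag behaviour easy to test, the same substitution can be wrapped in a function that operates on the samplesheet text. This is a minimal sketch only: the function name `stripAdapters` and the `keepAdapters` switch are hypothetical and do not correspond to an actual nf-core/demultiplex parameter.

```groovy
// Hypothetical helper: blanks the adapter sequences in an Illumina samplesheet
// so that bcl2fastq, cellranger mkfastq or BCL Convert do not trim automatically.
// keepAdapters stands in for a (hypothetical) pipeline config flag.
String stripAdapters(String samplesheet, boolean keepAdapters = false) {
    if (keepAdapters) {
        return samplesheet  // flag set: leave the samplesheet untouched
    }
    def cleaned = samplesheet.readLines().collect { line ->
        // Keep the key and the trailing columns; only the sequence field is emptied.
        line.replaceAll(/^(Adapter|AdapterRead1|AdapterRead2),[^,]*/, '$1,')
    }
    return cleaned.join('\n') + '\n'
}

def sheet = """[Settings],,,
Adapter,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA,,
AdapterRead2,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT,,
"""
print stripAdapters(sheet)
```

Because only the sequence field is emptied, each adapter line keeps the same number of columns as the `[Settings]` header line. Other settings that merely start with "Adapter" (such as `AdapterStringency`) are left unchanged, because the key must be followed directly by a comma.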

A dedicated section on trimming should also be added to the documentation.

CC @apeltzer

nschcolnicov commented 4 months ago

Putting this one on hold until https://github.com/nf-core/demultiplex/issues/209 is closed.

nschcolnicov commented 4 months ago

The PR for #209 was merged, so I'm closing this ticket.