nf-core / smrnaseq

A small-RNA sequencing analysis pipeline
https://nf-co.re/smrnaseq
MIT License
73 stars 125 forks source link

`protocol` option not interpreted ? #351

Closed nservant closed 2 months ago

nservant commented 5 months ago

Description of the bug

I'm wondering if the protocol option is well interpreted for the adapter trimming.
If I use --protocol illumina or --protocol qiaseq, there is no difference in the fastp command which is always run with ;

fastp \
    --in1 XXX.fastq.gz \
    --out1  XXX.fastp.fastq.gz \
    --thread 6 \
    --json XXX.fastp.json \
    --html XXX.fastp.html \
    --adapter_fasta known_adapters.fa \
     \
    -l 17 --max_len1 100 --adapter_sequence AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
    2> >(tee XXXX.fastp.log >&2)

I do not think that it has a strong impact on the results, because all adapter sequences are available in the known_adapters.fa file, but based on the code ;

code: subworkflows/local/utils_nfcore_smrnaseq_pipeline/main.nf

def formatProtocol(params,log) {

    switch(params.protocol){
        case 'illumina':
            params.putIfAbsent("clip_r1", 0);
            params.putIfAbsent("three_prime_clip_r1",0);
            params.putIfAbsent("three_prime_adapter", "TGGAATTCTCGGGTGCCAAGG");
            break
        case 'nextflex':
            params.putIfAbsent("clip_r1", 4);
            params.putIfAbsent("three_prime_clip_r1", 4);
            params.putIfAbsent("three_prime_adapter", "TGGAATTCTCGGGTGCCAAGG");
            break
    case 'qiaseq':
            params.putIfAbsent("clip_r1",0);
            params.putIfAbsent("three_prime_clip_r1",0);
            params.putIfAbsent("three_prime_adapter","AACTGTAGGCACCATCAAT");
            break
        case 'cats':
            params.putIfAbsent("clip_r1",3);
            params.putIfAbsent("three_prime_clip_r1", 0);
            params.putIfAbsent("three_prime_adapter", "AAAAAAAA");
            break
        case 'custom':
            params.putIfAbsent("clip_r1", params.clip_r1)
            params.putIfAbsent("three_prime_clip_r1", params.three_prime_clip_r1)
        default:
            log.warn "Please make sure to specify all required clipping and trimming parameters, otherwise only adapter detection will be performed."
        }

        log.warn "Running with Protocol ${params.protocol}"
        log.warn "Therefore using Adapter: ${params.three_prime_adapter}"
        log.warn "Clipping ${params.clip_r1} bases from R1"
        log.warn "And clipping ${params.three_prime_clip_r1} bases from 3' end"
    }

and conf/modules.config

    withName: '.*:FASTQ_FASTQC_UMITOOLS_FASTP:FASTP' {
        ext.args = [ "",
            params.trim_fastq                           ? "" : "--disable_adapter_trimming",
            params.clip_r1 > 0                          ? "--trim_front1 ${params.clip_r1}" : "", // Remove bp from the 5' end of read 1.                                                                                                                              
            params.three_prime_clip_r1 > 0              ? "--trim_tail1 ${params.three_prime_clip_r1}" : "", // Remove bp from the 3' end of read 1 AFTER adapter/quality trimming has been performed.                                                                 
            params.fastp_min_length > 0                 ? "-l ${params.fastp_min_length}" : "",
            params.fastp_max_length > 0                 ? "--max_len1 ${params.fastp_max_length}" : "",
            params.three_prime_adapter == "auto-detect" ?  "" : "--adapter_sequence ${params.three_prime_adapter}"
        ].join(" ").trim()
}

I would expect the parameter --adapter_sequence to be updated according to the ${params.three_prime_adapter} which is itself defined based on the protocol ?

Command used and terminal output

Apps/nextflow-23.10.1/nextflow run smrnaseq-2.3.1 \
  --input ${PROJ_DIR}/data/samplesheet.csv \
  --protocol 'illumina' \
  --mirtrace_species 'rno' \
  --genome 'Rnor_6.0' \
  --outdir ${PROJ_DIR}/results \
  -w ${PROJ_DIR}/results/work \
  -profile singularity \
  -c ${PROJ_DIR}/curie.config \
  -resume "

Relevant files

No response

System information

nextflow-23.10.1 smrnaseq-2.3.1

christopher-mohr commented 5 months ago

Hi @nservant, thanks for reporting. You're definietly right, that the protocol option currently does not seem to have the expected effect. As you also mentioned, usually there should be no strong impact since the miRNA adapters are provided in the fasta file. However, if clipping is needed this might have some impact.

We are also running some additional tests internally to confirm this.

lpantano commented 3 months ago

@christopher-mohr , let me know if you have some insight, I would be happy to work on this code if you figure out something else.

lpantano commented 3 months ago

as well protocol is use specifically for mirtrace, independently of the other tools.

apeltzer commented 2 months ago

Fixed in #364, please give dev a try 👍🏻