nf-core / rnaseq

RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.
https://nf-co.re/rnaseq
MIT License

bbsplit behavior with ambiguous reads : --extra_bbsplit_args possible to add ? #1408

Closed ZheFrench closed 1 month ago

ZheFrench commented 1 month ago

I'm working with PDX (patient-derived xenograft) samples.
I want to remove ambiguous reads that also map to the mouse genome.

From my understanding, bbsplit in the pipeline keeps ambiguous reads (ambiguous2=all), i.e. reads that map to both reference genomes, and passes them on for mapping.

It doesn't remove reads that map to the other references from the bbsplit_fasta_list.

From here https://github.com/BioInfoTools/BBMap/blob/master/sh/bbsplit.sh

ambiguous2=<best>   Set behavior only for reads that map ambiguously to multiple different references.
                    Normal 'ambiguous=' controls behavior on all ambiguous reads;
                    Ambiguous2 excludes reads that map ambiguously within a single reference.
                       best   (use the first best site)
                       toss   (consider unmapped)
                       all   (write a copy to the output for each reference to which it maps)

I would like to remove these reads using ambiguous2=toss. Would it be useful to add an extra_bbsplit_args parameter? Or is there another way to change this behavior, given that I'm using a singularity profile?

MatthiasZepper commented 1 month ago

The extra_tool_args parameters were never meant to cater for all tools, but to be a convenience solution for the most commonly requested tools. The primary way of customizing a tool's arguments is via a custom configuration.

ZheFrench commented 1 month ago

Sorry but it's not clear to me where to apply the change.

For Nextflow DSL2 nf-core pipelines - parameters defined in the parameter block in custom.config files WILL NOT override defaults in nextflow.config!

So how do you override something?
From here I can see exactly the 'ambiguous2' argument I want to change in module.config.

if (!params.skip_bbsplit) {
    process {
        withName: 'BBMAP_BBSPLIT' {
            ext.args   = 'build=1 ambiguous2=all maxindel=150000'
            publishDir = [
                [
                    path: { "${params.outdir}/bbsplit" },
                    mode: params.publish_dir_mode,
                    pattern: '*.txt'
                ],
                [
                    path: { "${params.outdir}/bbsplit" },
                    mode: params.publish_dir_mode,
                    pattern: '*.fastq.gz',
                    enabled: params.save_bbsplit_reads
                ]
            ]
        }
    }
}

So should I use the -params-file option pointing to a JSON file containing `{ "ext.args": "ambiguous2=toss" }`? I doubt that is the right syntax...

UPDATE: So the idea would be to edit this file directly, perhaps in the user's home directory: .nextflow/assets/nf-core/rnaseq/modules/nf-core/bbmap/bbsplit/nextflow.config

MatthiasZepper commented 1 month ago

Could be useful to add an extra_bbsplit_args parameter ?

Well, the convention in nf-core is that variables in the context of a whole pipeline are referred to as parameters, and those in the context of a single tool as arguments. This is why your proposed new pipeline parameter would be called extra_bbsplit_args and not extra_bbsplit_params.

So what you need to do to override the arguments is create a custom config file that looks approximately like this:

process {
    withName: 'BBMAP_BBSPLIT' {
        ext.args = 'build=1 ambiguous2=toss maxindel=150000'
    }
}

You can provide this file with nextflow run -c your_config ...; thanks to Nextflow's config priority, its settings take precedence over the pipeline's defaults. The params block is irrelevant here, since you are only reconfiguring a single process.
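
For reference, here is a commented sketch of such a file. The file name custom.config and the invocation in the trailing comment are only illustrative assumptions. The selector and the argument string are the ones quoted earlier in this thread. Since ext.args is a single string, the override replaces the default wholesale, so the defaults you want to keep (build=1, maxindel=150000) are restated. Only ext.args is set here, so the other settings for the process, such as publishDir, should stay as the pipeline defines them.

```groovy
// custom.config (hypothetical file name)
// No params { } block is needed: pipeline-level parameters belong on the
// command line or in a -params-file, while tool arguments are set per process.
process {
    withName: 'BBMAP_BBSPLIT' {
        // Replaces the pipeline's default ext.args string for this process.
        // ambiguous2=toss treats reads that map ambiguously to multiple
        // different references as unmapped (see the bbsplit.sh help text above).
        ext.args = 'build=1 ambiguous2=toss maxindel=150000'
    }
}

// Illustrative invocation (other options elided):
//   nextflow run nf-core/rnaseq -profile singularity -c custom.config ...
```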