nf-core / viralrecon

Assembly and intrahost/low-frequency variant calling for viral samples
https://nf-co.re/viralrecon
MIT License
121 stars 108 forks source link

ARTIC Guppyplex default settings result in removal of valid reads for 1200 bp amplicon protocol SARS-CoV-2 Nanopore sequence data #229

Closed peterk87 closed 3 years ago

peterk87 commented 3 years ago

The current config seems to be set for the ARTIC V1-4 primer schemes: https://github.com/nf-core/viralrecon/blob/f0171324a60d1759ca7f5d02351372d97d22cc12/conf/modules.config#L54-L55

This results in most reads from the 1200 bp amplicon protocol to be filtered out during the Guppyplex stage.

The authors of the protocol used the following settings in the paper describing the protocol:

We used 250 bp as the minimum read length cut-off and 1500 bp as the maximum length cut-off, which resulted in between 20% and 30% of the reads being filtered.

https://academic.oup.com/biomethods/article/5/1/bpaa014/5873518#207982774

Workaround

Providing a custom modules config file with the following text resolved the issue:

params {
    modules {
        'nanopore_artic_guppyplex' {
            args            = '--min-length 250 --max-length 1500'
            publish_files   = false
            publish_dir     = 'guppyplex'
        }
    }
}
drpatelh commented 3 years ago

Awesome! Thanks for reporting @peterk87. How does the fix here look to you https://github.com/drpatelh/nf-core-viralrecon/commit/f6a9fdf00d6f54253d3a826bc187b4d86e3b09bf ? This obviously assumes that --primer_set_version 1200 is specified on the command-line because I would prefer not to go down the rabbit hole of parsing the BED file, checking amplicon sizes and re-setting these options.

peterk87 commented 3 years ago

Looks good to me! Thanks for addressing this issue so quickly @drpatelh!

prefer not to go down the rabbit hole of parsing the BED file, checking amplicon sizes and re-setting these options

That sounds like it'd be a real headache. As an alternative, it would be nice to have CLI options to specify min/max lengths for Guppyplex (or to skip Guppyplex) when using a custom primer scheme (or trying something different) so a custom config file wouldn't need to be specified.

drpatelh commented 3 years ago

Sweet. Thanks!

I have intentionally avoided adding separate params for the min/max lengths because we have quite a few pipeline parameters already. As you know, with the way I have implemented passing options around using a custom.config this is now a possibility with DSL2. I guess it's a toss up between a minimalist pipeline and usability. If you have strong feelings to add these then it should be relatively straightforward 🙂

I did think about adding a --skip_artic_guppyplex parameter but realised that is performing the function of concatenating all the FastQ files to be passed to downstream processes so probably not a good idea. I think we need to completely re-write the input options for the Nanopore arm of the pipeline at some point:

Will close this for now but feel free to re-open if I have missed anything.