nf-core / viralrecon

Assembly and intrahost/low-frequency variant calling for viral samples
https://nf-co.re/viralrecon
MIT License

Add support for artic primer set version V5.3.2 #373

Open mbdabrowska1 opened 1 year ago

mbdabrowska1 commented 1 year ago

Description of the bug

Hi, a new primer set version has been uploaded to https://github.com/artic-network/artic-ncov2019/tree/master/primer_schemes/nCoV-2019/V5.3.2. Would it be possible to add it to the genomes.config for viralrecon?

drpatelh commented 1 year ago

Done! https://github.com/nf-core/configs/commit/f84bc302a1b22e6b4654714f280b31ef7b0bf3b6

Are you able to test it out with the latest version of the pipeline and let us know if everything is working as expected please?

mbdabrowska1 commented 1 year ago

I tested it with the following configuration:

nextflow run nf-core/viralrecon \
  -profile sbc_sharc \
  -resume \
  -with-report \
  -with-trace \
  -with-timeline \
  --input '/fastdata/md1mbdx/Covid_and_ITS2_PA_PZ_260123/SARScov2/viralrecon/Sars_cov_2.csv' \
  --outdir results \
  --platform nanopore \
  --artic_minion_caller medaka \
  --artic_minion_medaka_model r941_min_hac_g507 \
  --genome 'MN908947.3' \
  --primer_set_version 5.3.2 \
  --fastq_dir '/fastdata/md1mbdx/Covid_and_ITS2_PA_PZ_260123/SARScov2/input/' \
  -c viralrecon_resources.conf

This is the same configuration I always use; the only difference is the primer set version (5.3.2 instead of 4.1). However, I get this error:

ERROR: Validation of pipeline parameters failed!

* --primer_set_version: expected type: Number, found: String (5.3.2)

As I mentioned, this used to work with 4.1, which was never flagged as a string. I assume the two dots (instead of one) make the pipeline interpret 5.3.2 as a string? Perhaps this could be fixed simply by changing the primer_set_version type in nextflow_schema.json from number to string, but I'm not sure whether this would have any further consequences.
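
To illustrate the suspected behaviour, here is a minimal Python sketch of a loosely typed command-line parser followed by a strict type check. It is not Nextflow's or nf-core's actual code; the helper names `coerce_cli_value` and `check_number` are hypothetical, and the coercion rules are an assumption made only to reproduce the error message above.

```python
import re

# Labels chosen to mirror the wording of the error message in this thread.
TYPE_LABELS = {int: "Integer", float: "Number", str: "String"}


def coerce_cli_value(raw):
    """Guess a type for a raw command-line string (assumed behaviour, for illustration)."""
    if re.fullmatch(r"-?\d+", raw):
        return int(raw)          # "1" becomes an integer
    if re.fullmatch(r"-?\d+\.\d+", raw):
        return float(raw)        # "4.1" becomes a decimal number
    return raw                   # "5.3.2" has two dots, so it stays a string


def check_number(name, value):
    """Return a list of error messages if `value` is not numeric."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return []
    label = TYPE_LABELS.get(type(value), type(value).__name__)
    return [f"--{name}: expected type: Number, found: {label} ({value})"]


if __name__ == "__main__":
    for raw in ["1", "4.1", "5.3.2"]:
        value = coerce_cli_value(raw)
        print(raw, "->", type(value).__name__, check_number("primer_set_version", value))
```

Under these assumptions, "1" and "4.1" pass a Number check while "5.3.2" does not, which would explain why only the new version triggers the error.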

drpatelh commented 1 year ago

Yep, this is a more complex issue with the way parameters are coerced by Nextflow and then mismatch the validation.

Changing the line below to string does fix the error above: https://github.com/nf-core/viralrecon/blob/3731dd3a32a67a2648ea22c2bd980c224abdaee2/nextflow_schema.json#L112

However, the pipeline fails when you provide older primer sets that are integers:

ERROR: Validation of pipeline parameters failed!

* --primer_set_version: expected type: String, found: Integer (1)

Any suggestions @ewels ?

For now, I think the best solution is to provide the parameters via a -params-file params.yml explicitly:

fasta: 'https://github.com/artic-network/artic-ncov2019/raw/master/primer_schemes/nCoV-2019/V5.3.2/SARS-CoV-2.reference.fasta'
gff: 'https://github.com/nf-core/test-datasets/raw/viralrecon/genome/MN908947.3/GCA_009858895.3_ASM985889v3_genomic.200409.gff.gz'
primer_bed: 'https://github.com/artic-network/artic-ncov2019/raw/master/primer_schemes/nCoV-2019/V5.3.2/SARS-CoV-2.scheme.bed'
scheme: 'SARS-CoV-2'

Haven't tested this so please let me know if the pipeline complains elsewhere!

drpatelh commented 1 year ago

In general though, if you plan on using this primer set often, it's worth downloading the files locally and pointing the -params-file at them so you don't download them every time you run the pipeline, e.g.

fasta: '/my/local/path/nCoV-2019/V5.3.2/SARS-CoV-2.reference.fasta'
gff: '/my/local/path/MN908947.3/GCA_009858895.3_ASM985889v3_genomic.200409.gff.gz'
primer_bed: '/my/local/path/nCoV-2019/V5.3.2/SARS-CoV-2.scheme.bed'
scheme: 'SARS-CoV-2'
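
If the files are kept under one local directory, a params file like the one above could be generated rather than typed by hand. The following is a small sketch, assuming the directory layout shown in the example; `build_params` and `render_yaml` are hypothetical helpers, not part of viralrecon, and the script only writes text (it does not download anything).

```python
from pathlib import PurePosixPath


def build_params(local_root):
    """Map viralrecon parameter names to files under an assumed local layout."""
    root = PurePosixPath(local_root)
    scheme_dir = root / "nCoV-2019" / "V5.3.2"
    return {
        "fasta": str(scheme_dir / "SARS-CoV-2.reference.fasta"),
        "gff": str(root / "MN908947.3" / "GCA_009858895.3_ASM985889v3_genomic.200409.gff.gz"),
        "primer_bed": str(scheme_dir / "SARS-CoV-2.scheme.bed"),
        "scheme": "SARS-CoV-2",
    }


def render_yaml(params):
    """Render a flat mapping as YAML, single-quoting every value so it stays a string."""
    lines = []
    for key, value in params.items():
        escaped = str(value).replace("'", "''")  # YAML escapes ' inside single quotes as ''
        lines.append(f"{key}: '{escaped}'")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Redirect the output to params.yml, then pass it with -params-file params.yml
    print(render_yaml(build_params("/my/local/path")), end="")
```

Quoting every value is a deliberate choice here: it keeps the values as strings instead of leaving their type to the YAML parser.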

mbdabrowska1 commented 1 year ago

When using the params-file, do I just need to specify the genome and leave the primer set etc. as defaults? I've actually never used it before, but this could be useful for testing new primer sets that aren't commercially available yet.

maxulysse commented 1 year ago

I would add that in the ignore_params thingy

drpatelh commented 1 year ago

Ok. Ignore everything I said about using a params file. Can you try again with your original command from the comment above and add this parameter too, please?

--schema_ignore_params 'genomes,primer_set_version'
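
As a rough picture of what ignoring a parameter during validation means, here is a toy Python validator that skips any name listed in a comma-separated ignore string. This is only a sketch of the general idea, not nf-core's implementation; the `validate` function and the two-entry schema are made up for illustration.

```python
def validate(params, schema_types, ignore=""):
    """Type-check `params` against `schema_types`, skipping names listed in `ignore`."""
    ignored = {name.strip() for name in ignore.split(",") if name.strip()}
    errors = []
    for name, expected in schema_types.items():
        if name in ignored or name not in params:
            continue  # ignored or absent parameters are not checked at all
        if not isinstance(params[name], expected):
            errors.append(f"--{name}: unexpected type {type(params[name]).__name__} ({params[name]})")
    return errors


if __name__ == "__main__":
    # Hypothetical schema: a numeric primer set version and a string platform.
    schema = {"primer_set_version": (int, float), "platform": str}
    params = {"primer_set_version": "5.3.2", "platform": "nanopore"}

    print(validate(params, schema))                                # one error for primer_set_version
    print(validate(params, schema, "genomes,primer_set_version"))  # no errors
```

The trade-off in this toy version is that an ignored parameter gets no type checking at all, so a typo in its value would only surface later in the run.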

Thanks @maxulysse !

Codes1985 commented 1 year ago

Hello @drpatelh

I'd like to hijack this thread, if I may.

My lab has generated tiled-PCR amplification assays using PrimalScheme for a couple of other viruses. I was wondering if there is a workaround to allow Viralrecon to take in completely custom schemes? They should be functionally the same as those being generated for SARS-CoV-2.

I wasn't sure if the thoughts in this thread might be applicable to my scenario.

Thanks in advance.

-Cody

drpatelh commented 1 year ago

Hi @Codes1985. You might be able to use parameters similar to the ones suggested for the SWIFT panel, with customisations for your genome reference and primer sets. You might need to confirm that the pipeline is using the primer sets as expected by inspecting some of the heatmaps / plots etc.: https://nf-co.re/viralrecon/2.6.0/docs/usage#swift-primer-sets

Might be good to have something like this as generic docs in that section too.

kneubehl commented 1 year ago

Sorry to further the hijacking but did @Codes1985 have any luck with custom schemes? I am looking to do the same kind of analysis with custom primer sets for other viruses as well.

Codes1985 commented 1 year ago

Hello @kneubehl

I'm so sorry - I haven't had a chance to try it out yet! I will reply here as soon as I can. However, if you beat me to it, I'd love to hear.

Take care!