nf-core / viralrecon

Assembly and intrahost/low-frequency variant calling for viral samples
https://nf-co.re/viralrecon
MIT License

Non-SCV2 amplicon run returns consensus genomes with no low-coverage masking #420

Open ddomman opened 4 months ago

ddomman commented 4 months ago

Description of the bug

Viralrecon has worked perfectly for our SCV2 runs and for some hybrid capture protocols (using the metagenomic side of the pipeline). However, when I ran the pipeline with a custom BED file and FASTA reference for RSV, the default settings produced consensus genomes with no low-coverage masking, i.e. no Ns. The bcftools consensus step IS substituting the variants, but in all low-coverage regions it emits the reference base rather than N.

After switching to the iVar consensus option (--variant_caller ivar, --consensus_caller ivar), the pipeline produces correct consensus genomes with low-coverage regions masked with Ns.
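
A quick way to confirm whether a given consensus was masked at all is to count its N bases. The following is a minimal sketch, not part of viralrecon; the function name `count_n` and the file name in the usage comment are hypothetical.

```shell
# count_n FILE: print the number of N/n bases in a FASTA file.
# Hypothetical helper for illustration; not part of viralrecon.
count_n() {
    # Drop header lines, keep only N/n characters, then count them.
    grep -v '^>' "$1" | tr -cd 'Nn' | wc -c | tr -d ' '
}

# Example (file name is an assumption):
#   count_n sample1.consensus.fa
```

A count of 0 for a sample with known low-coverage regions would match the behaviour described above.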

Command used and terminal output

No response

Relevant files

No response

System information

No response

svarona commented 1 month ago

Hi @ddomman! We are also using viralrecon with amplicon RSV data, and it masks the consensus correctly with ivar as the variant caller and bcftools as the consensus caller. We would need to replicate your specific analysis. Would you mind sending us the files you used and the command you ran viralrecon with?

chocogangsta commented 3 weeks ago

Hello @svarona

Could you please share the specific commands you use to run the RSV analysis? I've been attempting it myself without success. If possible, could you provide the commands for both the Illumina and Nanopore platforms?

svarona commented 3 weeks ago

Hi @chocogangsta. Viralrecon can't process RSV Nanopore data, because its Nanopore workflow uses the ARTIC protocol, which works only for SARS-CoV-2. For Illumina sequencing, this is the command we're using:

nextflow run nf-core-viralrecon-2.6.0/workflow/main.nf \
    --input samplesheet.csv \
    --outdir EPI_ISL_1653999_viralrecon_mapping \
    --fasta RSV/EPI_ISL_1653999.fasta \
    --gff RSV/EPI_ISL_1653999.gff \
    --primer_bed merged_1653999_scheme.bed \
    --primer_fasta merged_primers.fasta \
    --nextclade_dataset_name 'rsv_b' \
    --nextclade_dataset false \
    --nextclade_dataset_tag '2023-10-02T12:00:00Z' \
    --platform illumina \
    --protocol amplicon \
    --variant_caller ivar \
    --consensus_caller bcftools \
    --skip_pangolin \
    -resume

svarona commented 3 weeks ago

@ddomman We've seen cases where bcftools neither substitutes variants nor masks the consensus genome, producing a consensus identical to the reference genome. This can be caused by non-Linux newline characters in the reference .fasta (usually introduced by editing it in Microsoft Word). I assume this is not your case, since you say variants are being replaced properly.
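
For anyone who wants to rule out the newline problem, here is a minimal sketch that checks a reference FASTA for carriage returns and writes a cleaned copy. The function names `has_cr` and `strip_cr` and the file names in the usage comment are hypothetical, and the sketch assumes Windows-style CRLF endings; a file with CR-only line endings would need its CRs converted to LF rather than deleted.

```shell
# has_cr FILE: succeed (exit 0) if FILE contains any carriage return.
# Hypothetical helper for illustration; not part of viralrecon.
has_cr() {
    [ "$(tr -cd '\r' < "$1" | wc -c | tr -d ' ')" -gt 0 ]
}

# strip_cr IN OUT: write a copy of IN with every carriage return removed.
# Assumes CRLF line endings, so removing CR leaves plain LF newlines.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# Example (file names are assumptions):
#   if has_cr reference.fasta; then strip_cr reference.fasta reference.unix.fasta; fi
```

Writing to a new file rather than editing in place keeps the original reference available for comparison.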