Closed Rohit-Satyam closed 6 months ago
most people might submit whatever comes out of wf-artic pipeline to GISAID and if such submissions are part of routine sequencing,
Unfortunately, we do exactly this :( It's something that has bugged me. @Rohit-Satyam Could you describe how you reconstruct the consensus? Do you use the BAM file output by the wf-artic pipeline, and what do you use for variant calling?
Something to note for anyone not familiar with wf-artic: the pipeline will mask (N) columns that (from memory) are ambiguous, e.g. >1 possible SNP.
Question to the nextclade folks: what is "reference" in this context?
Reversions: Private mutations that go back to the reference sequence, i.e. a mutation with respect to reference is present on the attachment node but not on the query sequence.
Thanks for chiming in, @ammaraziz.
My take is that whether a diverse site or a reversion to reference is valid depends on a number of parameters, and there is no simple criterion that will always give you the right answer. The coverage and diversity thresholds you mention above are useful guides.
The reason for flagging these reversions is that it used to be quite common that when a new variant pops up, many people submitted sequences that confidently called reference alleles in drop-out regions (either because their pipeline equated low-coverage with reference, or because of contamination). This tends to be less of an issue nowadays.
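To illustrate the failure mode described above, here is a minimal sketch of consensus building that masks low-coverage positions with N instead of silently calling the reference allele. The function name, the SNP-only input format and the toy data are assumptions made for illustration; this is not how wf-artic or any specific pipeline is implemented. The default of 20 mirrors the 20X per-base minimum mentioned in this thread.

```python
def apply_variants_with_masking(reference, variants, depth, min_depth=20):
    """Build a consensus from SNPs, masking low-coverage positions with N.

    reference: reference sequence as a string
    variants:  dict mapping 0-based position -> alternate base (SNPs only)
    depth:     per-position read depths, one value per reference base
    min_depth: positions below this depth are masked (hypothetical default)
    """
    if len(depth) != len(reference):
        raise ValueError("depth must have one value per reference position")
    consensus = []
    for pos, ref_base in enumerate(reference):
        if depth[pos] < min_depth:
            # Not enough evidence either way: emit N, never the reference base.
            consensus.append("N")
        else:
            consensus.append(variants.get(pos, ref_base))
    return "".join(consensus)


# Toy example: an amplicon drop-out covers positions 4-6.
reference = "ACGTACGT"
variants = {2: "T", 6: "A"}
depth = [150, 150, 150, 150, 5, 3, 4, 150]
print(apply_variants_with_masking(reference, variants, depth))  # ACTTNNNT
```

A pipeline that returned the reference base in the low-depth branch would output ACTTACGT here, which is exactly the kind of confident reference call in a drop-out region that shows up as a reversion.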
To Ammar's question: these are mutations that map to the terminal branch of a query sequence and make that sequence closer to the reference. The reference used to always be the root of the tree, but we now also allow non-root sequences to serve as reference. "Reference" in this context refers to the sequence we align to initially.
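The definition of a reversion can be sketched roughly as follows. This is an illustrative toy, not Nextclade's actual implementation: the function name and data layout are hypothetical, it handles substitutions only, and it assumes that masked (N) positions in the query are not counted as reversions.

```python
def find_reversions(reference, node_mutations, query):
    """Find positions where the attachment node is mutated but the query is not.

    reference:      the sequence everything is aligned to
    node_mutations: dict of 0-based position -> base on the attachment node
    query:          aligned query sequence, same length as reference
    Returns a list of (position, node_base, query_base) tuples.
    """
    reversions = []
    for pos, node_base in sorted(node_mutations.items()):
        query_base = query[pos]
        if query_base == "N":
            continue  # assumption: a masked site is "no call", not a reversion
        if node_base != reference[pos] and query_base == reference[pos]:
            reversions.append((pos, node_base, query_base))
    return reversions


reference = "ACGTACGT"
node_mutations = {1: "T", 4: "G", 6: "A"}  # mutations relative to reference
query = "ATGTACNT"  # shares pos 1, is reference at pos 4, masked at pos 6
print(find_reversions(reference, node_mutations, query))  # [(4, 'G', 'A')]
```

In this toy case, masking position 6 avoids a reversion call there, while the reference base at position 4 is reported as a reversion.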
Hi
I have a small query. When I first process my samples using the wf-artic pipeline and then upload the consensus FASTA to nextclade, I do not obtain any private mutations. However, when I filter some of these VCFs and rebuild the consensus sequence based on variant allele fraction (vafator_af), variant allele count (vafator_ac) and variant depth (vafator_dp), some variants, as listed below, are filtered out:
"INFO/vafator_af < 0.5 || INFO/vafator_dp < 100 || INFO/vafator_ac < 50". We use vafator_dp < 100 because allele fraction, being a ratio, could still be 0.5 even when merely 10 out of 20 reads support the variant.
Now, since there is no guideline other than a minimum of 20X coverage per base (even when we get more than 100X coverage in amplicon data), most people might submit whatever comes out of the wf-artic pipeline to GISAID, and if such submissions are part of routine sequencing, nextclade might pick up these sequences to build its set of private mutations. Such variants, if absent from assemblies generated with the above filtering criteria, are then flagged as reversionSubstitutions. So how do I decide whether this is an actual reversionSubstitution, and whether to keep it? What would you do if you had this additional information about AF, DP and AC?
Thresholds are based on recommendations in this best practices paper
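For readers who want to see how the three thresholds interact, here is a small sketch of the same exclusion logic in plain Python. The function name and the example calls are made up for illustration, and AF is simply computed as AC/DP here, which may differ from how vafator derives its annotations.

```python
def is_filtered(af, dp, ac, min_af=0.5, min_dp=100, min_ac=50):
    """Return True if a variant would be excluded by the expression
    af < 0.5 || dp < 100 || ac < 50 (thresholds quoted in the post above)."""
    return af < min_af or dp < min_dp or ac < min_ac


# Hypothetical variant calls: alternate-allele read count and total depth.
calls = [
    {"pos": 100, "ac": 10, "dp": 20},    # AF 0.5, but only 20 reads in total
    {"pos": 200, "ac": 480, "dp": 500},  # well supported
    {"pos": 300, "ac": 60, "dp": 200},   # deep enough, but minority allele
]

for call in calls:
    af = call["ac"] / call["dp"]  # assumption: AF taken as AC / DP
    status = "filtered" if is_filtered(af, call["dp"], call["ac"]) else "kept"
    print(call["pos"], round(af, 2), status)
```

The first call shows why the depth and count thresholds matter: an allele fraction of 0.5 passes the AF test on its own, but with 10 of 20 reads it fails both the DP and the AC criteria.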