Closed mbhall88 closed 1 year ago
Suggest you turn a catalogue into three catalogues
Part A: nucleotide matches. Use the VCF definition of the catalogue, and match the drprg VCF against the catalogue VCF. There is some trickiness with indel normalisation.
Part B: amino acid variants. First translate the pandora-inferred sequence into an amino acid sequence. Then,
Finally notice if there is
Part C
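For the indel normalisation mentioned under Part A, here is a rough sketch of one piece of it: trimming bases shared by REF and ALT so that two spellings of the same indel compare equal. The function name `trim_alleles` is made up for illustration and is not part of drprg. It only trims; it does not left-align against the reference, which full normalisation would also need.

```python
from __future__ import annotations


def trim_alleles(pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Trim bases shared by REF and ALT, keeping at least one base in each.

    Hypothetical helper: `pos` is the 1-based VCF POS of the record.
    """
    # Trim the shared suffix first; this does not move POS.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim the shared prefix, shifting POS right for each base removed.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Two spellings of the same one-base deletion reduce to the same record.
print(trim_alleles(5, "ATT", "AT"))
print(trim_alleles(5, "AT", "A"))
```

Once both VCFs are reduced like this, records could be compared as plain `(pos, ref, alt)` tuples.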
The `argmatch` function currently does what it is supposed to; it returns whether a panel variant matches a variant in the pandora VCF. But there can be false positives, which lead to FP resistance calls. For example (I've removed some VARIDs for brevity)
This variant simplifies to
The two variants `gid_A205E` and `gid_A205*` do overlap this VCF record, but they're not technically a match.

This position, 715, maps to codon 205, which in the reference is `GCA`. So this VCF position is the last base in the codon. Changing the last base to `G`, as this record calls, would make the codon `GCA->GCG`, which in amino acid space is `A->A`, i.e., synonymous. However, what `argmatch` does (by design) is look in the panel VCF and see that a `G` at this position matches those other two variants, which it does, but it's complicated...

Here are the panel VCF records for those two variants
Extracting just position 715 from these variants does indeed provide `A->G` as an option. But it ignores the fact that you would also need to change positions 713 and/or 714, which our running example at the top does not.

I need to think about the cleanest way to handle this... Any thoughts are most welcome though.
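One possible direction, as a sketch rather than a proposal for the actual fix: after `argmatch` reports a match, rebuild the whole codon from the call and check that the resulting amino acid change is the one the panel variant names. All function names below are hypothetical. The 100 bp padding is an assumption inferred from position 715 mapping to codon 205, and the sketch only covers a single-base call within one codon.

```python
from __future__ import annotations

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_for_position(pos: int, padding: int) -> tuple[int, int]:
    """Map a 1-based VCF POS to (1-based codon number, 0-based offset in codon).

    `padding` is the number of bases before the gene start (assumed, not
    taken from drprg).
    """
    gene_idx = pos - padding - 1  # 0-based index within the gene
    return gene_idx // 3 + 1, gene_idx % 3


def called_amino_change(ref_codon: str, offset: int, alt_base: str) -> tuple[str, str]:
    """Apply a single-base call to the reference codon and translate both."""
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    return CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]


def supports_panel_variant(variant: str, ref_codon: str, offset: int, alt_base: str) -> bool:
    """Check a call against a panel variant written like 'A205E' or 'A205*'."""
    ref_aa, alt_aa = called_amino_change(ref_codon, offset, alt_base)
    return variant[0] == ref_aa and variant[-1] == alt_aa


codon, offset = codon_for_position(715, padding=100)
print(codon, offset)                                       # 205 2
print(called_amino_change("GCA", offset, "G"))             # ('A', 'A')
print(supports_panel_variant("A205E", "GCA", offset, "G"))  # False
print(supports_panel_variant("A205*", "GCA", offset, "G"))  # False
```

With this check, the `A->G` call at 715 is rejected for both `gid_A205E` and `gid_A205*`, because `GCG` still translates to `A`. A call that changed position 714 to `A` (giving `GAA`) would pass for `A205E`.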