Closed mbhall88 closed 3 years ago
After thinking through a number of options for going from a Pandora VCF to a prediction, the method I have decided to pursue (at least initially - it may turn out to not work) is outlined below.
`(gene, interval)`, where `interval` is the start/end range of the variant. This key will map to the VCF entry. A side note: we will also load filtered variants and make predictions on them, but those predictions will be kept only for debugging/investigative purposes. This will likely be useful for a multitude of reasons.
One advantage of this approach is that it removes the need to add another external dependency (`bcftools`) and also removes the need to deal with ORFs when trying to translate the Pandora consensus.
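A minimal sketch of the `(gene, interval)` keying described above, in Python. The `PanelRecord` type and the `interval_of` and `index_panel` helpers are hypothetical names used only for illustration, and the sketch assumes the interval is the half-open range covered by the REF allele and that no two panel records share the same key.

```python
from typing import NamedTuple


class PanelRecord(NamedTuple):
    """One panel VCF entry (hypothetical container for this sketch)."""
    chrom: str   # gene name, taken from the CHROM column
    pos: int     # POS exactly as written in the VCF
    var_id: str  # e.g. "gid_M2I"
    ref: str
    alts: tuple
    info: str


def interval_of(pos, ref):
    """Half-open [start, end) range covered by the REF allele."""
    return (pos, pos + len(ref))


def index_panel(records):
    """Map (gene, interval) -> panel record.

    Assumes keys are unique; a later record with the same key would
    overwrite an earlier one.
    """
    return {(r.chrom, interval_of(r.pos, r.ref)): r for r in records}


panel = index_panel([
    PanelRecord("gid", 4, "gid_M2I", "ATG", ("ATA", "ATC", "ATT"), "DRUGS=drug1,drug2"),
    PanelRecord("gid", 7, "gid_L3F", "TTA", ("TTC", "TTT"), "DRUGS=drug3"),
    PanelRecord("gid", 20, "gid_A20C", "A", ("C",), "DRUGS=drug4"),
])

for key, record in panel.items():
    print(key, "->", record.var_id)
```

Keying on the gene as well as the interval keeps variants from different genes apart even when their coordinates coincide.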
Here is an example of how the comparison of the intersecting intervals will work.
We have panel variants:
#CHROM POS ID REF ALT INFO
gid 4 gid_M2I ATG ATA,ATC,ATT DRUGS=drug1,drug2
gid 7 gid_L3F TTA TTC,TTT DRUGS=drug3
gid 20 gid_A20C A C DRUGS=drug4
and a Pandora variant
#CHROM POS ID REF ALT ... FORMAT sample
gid 3 . CATGTTAT CATCTTAT,CATGTTCT . GT 1
The interval for the Pandora variant is `[3, 11)`. So, for the first panel variant, we get an intersection interval of `[4, 7)`. This interval on the called Pandora allele is `ATC`, which matches one of the panel variant `ALT` alleles, so we say this panel variant, `gid_M2I`, is present. For the next panel variant, the intersection interval is `[7, 10)`. This interval on the called Pandora allele is `TTA`, which matches the panel variant `REF`, so we say it is not present. The third panel variant's interval does not intersect with the Pandora variant, so we skip it. In the end, we say this sample is resistant to drug1 and drug2, due to variant `gid_M2I`.
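The worked example above can be sketched in Python as follows. This is only an illustration under a few assumptions: the function names (`intersect`, `classify`) and the `"unresolved"` outcome are hypothetical, the called allele is assumed to have the same length as the Pandora `REF` (no indels), and a panel variant is only evaluated when the Pandora record covers its whole interval.

```python
def intersect(a, b):
    """Intersection of two half-open intervals, or None if they don't overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def classify(panel_pos, panel_ref, panel_alts, call_pos, call_ref, called_allele):
    """Return 'present', 'absent', 'skip' or 'unresolved' for one panel variant."""
    panel_iv = (panel_pos, panel_pos + len(panel_ref))
    call_iv = (call_pos, call_pos + len(call_ref))
    iv = intersect(panel_iv, call_iv)
    if iv is None:
        return "skip"
    if iv != panel_iv:
        # Assumption: partial overlaps are left undecided in this sketch.
        return "unresolved"
    # Assumption: no indels, so reference offsets index the called allele directly.
    observed = called_allele[iv[0] - call_pos:iv[1] - call_pos]
    if observed in panel_alts:
        return "present"
    if observed == panel_ref:
        return "absent"
    return "unresolved"


# (ID, POS, REF, ALTs, drugs) for the three panel variants in the example.
panel = [
    ("gid_M2I", 4, "ATG", ("ATA", "ATC", "ATT"), ("drug1", "drug2")),
    ("gid_L3F", 7, "TTA", ("TTC", "TTT"), ("drug3",)),
    ("gid_A20C", 20, "A", ("C",), ("drug4",)),
]

# The Pandora record from the example: POS 3, GT 1 selects the first ALT.
call_pos, call_ref = 3, "CATGTTAT"
call_alts, gt = ("CATCTTAT", "CATGTTCT"), 1
called = call_ref if gt == 0 else call_alts[gt - 1]

results = {}
resistant_to = set()
for var_id, pos, ref, alts, drugs in panel:
    results[var_id] = classify(pos, ref, alts, call_pos, call_ref, called)
    if results[var_id] == "present":
        resistant_to.update(drugs)

print(results)
print(sorted(resistant_to))
```

Handling indels would need a mapping from reference coordinates to allele coordinates rather than plain slicing, which this sketch leaves out.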
This is v slick. One minor issue (no pun intended), is AMR sites are enriched for having minor alleles present, which Pandora won't genotype as being present, as it does haploid calling. Conceivably you could in future keep a secondary hashmap for minor alleles?
Not sure if we can ever reliably detect minors with nanopore unless with huge depth. Mykrobe specifically does not call minors with nanopore
Slick...that's a very nice compliment :blush:
Interestingly, if you look at the Pandora VCF above for `R28581`, the variant I said would be filtered out actually makes a call for panel variant `katG_L141F`. Looking at the mykrobe info for this variant, there is no evidence of coverage on the alt for either Nanopore or Illumina, but Pandora clearly thinks there's support for coverage on both ref and alt.
This doesn't directly answer your minor issue, but I guess it shows we should have enough info to try and do minors. I guess it is just a matter of having good data to test with
Well, we could test by in silico mixing fastq from two isolates at different ratios I guess, choosing the isolates based on Compass VCFs so we get a mix of R and S alleles at a site. This definitely feels like a secondary thing though
Very true. Maybe something for me to pursue in my postdoc :laughing:
The first main step in producing resistance predictions is running `pandora map` on the AMR PRG (#66) with the reads.
For the first attempt at this, I (randomly) picked sample `R28581`. The mykrobe call for this sample on both Illumina and Nanopore was resistant to Isoniazid, with variant `katG_W191R`, which has a lot of supporting coverage.
The Pandora genotyped VCF, selecting only alt calls and trimming the unused alt alleles
Using the current filtering strategy, the first variant, at position 523, would be filtered due to low FRS. The second variant would be kept.
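For anyone following along, a rough Python sketch of what an FRS filter could look like. It assumes FRS means fraction of read support, i.e. the depth on the called allele divided by the total depth over all alleles at the site. The function names, the `0.7` threshold, and the depths in the usage lines are made up for illustration and are not taken from the actual filtering code or from the `R28581` VCF.

```python
def fraction_read_support(allele_depths, called_index):
    """Depth on the called allele as a fraction of total depth at the site."""
    total = sum(allele_depths)
    if total == 0:
        return 0.0  # no coverage at all: treat as unsupported
    return allele_depths[called_index] / total


def passes_frs(allele_depths, called_index, min_frs=0.7):
    """True if the call has at least `min_frs` read support (threshold is an assumption)."""
    return fraction_read_support(allele_depths, called_index) >= min_frs


# Invented depths, ordered [ref, alt1]; the alt (index 1) is the called allele.
print(passes_frs([20, 20], 1))  # alt has half of the reads
print(passes_frs([10, 30], 1))  # alt has three quarters of the reads
```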
The encouraging thing is if we look at that position in the panel VCF, the entry is
which is the variant we expect to find.