Closed mbhall88 closed 1 year ago
I don't have a better suggestion than putting a breakpoint in and seeing what happens.
If you have time for a call next week, i'd be up for a chat as I don't remember/understand why you are using local genotyping, i forget the reason. I think it is/was because of minor alleles. Isn't your drprg minor analysis basically driven by looking at the coverage on both alleles, and could work the same on local and global genotyping? sorry for this basic question, have lost track
Sorry, was going to discuss with leandro, but he was teaching and then I was distracted. Will talk to him tomorrow!
I don't remember/understand why you are using local genotyping, i forget the reason. I think it is/was because of minor alleles.
Motivation and justification.
Isn't your drprg minor analysis basically driven by looking at the coverage on both alleles, and could work the same on local and global genotyping?
Yes, but I still rely on the genotype. I only dig into null genotypes if they cross a start codon, and then, I'm just looking to see if we lose the gene (as a patch until #316 is sorted). I'm not really keen to ignore genotype and implement my own genotyping model when pandora has one already...
I dug into this a little more and the only reason I can see if genotype compatibility. I've removed the allele as POS 330 from the graph too as that is what was causing the problem and it only occurs in one sample that we build the PRG from (out of ~150 samples) and it doesn't contain any common resistance variants.
Fair enough, I'll chew on this. ,
Sorry for the delay! I think what is happening in the genotyped VCF:
gid 330 . TGCCATTGGCGATAGCGCGGCCGGA GGCTACGTCACGCACATTTGCCGGA . . VC=PH_SNPs;GRAPHTYPE=NESTED GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF 0:7,0:5,0:10,0:9,0:57,0:47,0:0.375,1:-7.56853,-83.262:75.6935
gid 351 . CGGA CGGC,TGGA . . VC=PH_SNPs;GRAPHTYPE=NESTED GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF 0:2,9,0:2,8,0:0,10,0:0,9,0:10,39,0:9,35,0:0.75,0,1:-95.4097,-21.0618,-124.709:74.3479
with local genotyping is the following:
pandora
looks for the highest confidence call among all variants, and select that. That would be 330
calling ref (GT CONF 75.6935
). This allele is TGCCATTGGCGATAGCGCGGCCGGA
, spanning until pos 355
;CGGA
), and has lower confidence than the first variant, so the choice on the first variant takes priority. Thus pandora
makes the call on this second variant compatible with the first choice, forcing the call to be CGGA
, i.e. also the ref.IIRC, it is sort of a greedy algorithm, working from the highest to the lowest confidence calls, and making lower-confidence calls compatible with higher-confidence calls... I guess @rmcolq could correct us in case of any (or many) mistakes!
That makes sense. I just thought that because I'm using local genotyping it would override that process?
That is how local genotyping works, or rather that's how Pandora tries to make local genotypes not self inconsistent.
I think local genotyping is this process! Global genotyping will get the ML path and, for each site, will look at the most likely allele. If it agrees with the ML path (i.e. at site 1, ML path calls alt 1 and alt 1 is also the most likely) then it calls alt 1. Otherwise, it calls .
. Local genotyping will instead work from the highest-confidence call towards the lowest one, possibly leaving the ML path, the goal is to make the lower-confidence calls compatible with higher-confidence calls if they are incompatible (either calling one of the alleles or nullifying the call)
Sorry, did not see Zam's message before commenting, which is the same as mine, but much better summarised :joy: I think this is what happens, but also to be honest, last time I actually checked the genotyping code was some 2 years ago...
Ohhh true. Yes, sorry I forgot global would just make it null. Thanks for clearing this up!
I have a weird situation where pandora is selecting ref genotype when it should clearly be the alt - the alt has the highest likelihood.
The position in question is 351. However, for some reason make_prg has not merged it with the variant above it - which happens to also span position 351. BUT, I am using
--local
genotyping, so pandora shouldn't be trying to make genotypes compatible in this scenario?Here are the same positions in the consensus VCF
Any thoughts on what's happened here?