Mutation concordance analysis

mbhall88 commented 2 years ago

Our current approach to the genotype DST concordance is to check whether the susceptibility calls for each drug are the same between Illumina and Nanopore.

There is an additional, fine-grained analysis, which is to see whether the genotype calls for each mutation in the catalogue being used are the same.

The way this is going to be analysed is, for each sample, take the Illumina and Nanopore JSON output with all mutations, go through each mutation, and check whether the genotype call is the same. If the variant is filtered, we ignore it.

@iqbal-lab One question is what to do with minor resistance (i.e., het genotype) calls (https://github.com/mbhall88/head_to_head_pipeline/issues/75#issuecomment-962293131) and null calls? I'm going to assume we want to treat these as "filtered" and ignore them. What I mean by this is if either the Illumina or Nanopore call for a mutation is null, het, or filtered, we treat it as filtered and don't count the comparison in the concordance analysis.
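For illustration, the comparison described above could be sketched roughly like this. It is a minimal sketch, not the pipeline's actual code: the real Mykrobe JSON is reduced here to a hypothetical dict of mutation name to call label, the labels `"R"`, `"S"`, `"het"`, `"null"` and `"filtered"` are assumptions, and Illumina is taken as the truth (so an FP is Nanopore R where Illumina says S).

```python
from collections import Counter

# Hypothetical call labels; the real JSON output encodes genotypes differently.
IGNORED = {"null", "het", "filtered"}


def classify(illumina: str, nanopore: str) -> str:
    """Classify one mutation, treating Illumina as truth and Nanopore as test."""
    # If either technology has a null, het, or filtered call, skip the comparison.
    if illumina in IGNORED or nanopore in IGNORED:
        return "ignored"
    if illumina == nanopore:
        return "TP" if illumina == "R" else "TN"
    # Calls disagree: Nanopore R vs Illumina S is an FP, the reverse is an FN.
    return "FP" if nanopore == "R" else "FN"


def concordance(illumina_calls: dict, nanopore_calls: dict) -> Counter:
    """Count classifications over the mutations present in both call sets."""
    counts = Counter()
    for mutation in illumina_calls.keys() & nanopore_calls.keys():
        counts[classify(illumina_calls[mutation], nanopore_calls[mutation])] += 1
    return counts


# Made-up calls for a single sample, purely for demonstration.
illumina = {"katG_S315T": "R", "rpoB_S450L": "S", "gyrA_D94G": "het", "embB_M306V": "S"}
nanopore = {"katG_S315T": "R", "rpoB_S450L": "R", "gyrA_D94G": "R", "embB_M306V": "null"}
counts = concordance(illumina, nanopore)
print(dict(counts))
```

Summing the counters over all samples would give the mutation-level FP and FN totals.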

mbhall88 commented 2 years ago

Alright, using the classifications mentioned above (i.e. minor resistance, null, and filtered calls are all ignored), at the mutation level, there are 0 FNs and 4 FPs across all 151 samples.

mbhall88 commented 2 years ago

[GIF]

iqbal-lab commented 2 years ago

Awesome! In answer to your question, I was going to suggest, treat minor calls from Mykrobe as S. That's what Mykrobe does when you give it the appropriate cmd line arg to ignore minors. But anyway this is great news

mbhall88 commented 2 years ago

When I treat minor resistance as susceptible, we get 14 FPs and 1 FN.

The FN is described in https://github.com/Mykrobe-tools/mykrobe/issues/139 and I think we can effectively ignore it.

3 of the FPs are a homopolymer deletion, which affects three consecutive positions (so it's kind of 1 FP I guess).

1 FP is a katG deletion (CC->C), but there are other mutations in katG so the resistance call is not impacted.

10 FPs are Illumina minor resistance calls that we are now treating as S and Nanopore is saying R. I've looked through all of these and realised that treating minor as S is probably not the right/fair thing to be doing here. For instance, "minor" resistance does not actually mean that the minor allele is the resistant one. We could have the major allele being resistant and the minor being S. In these cases (which are all 10 FPs) it actually makes more sense from a genotyping perspective to say they are resistant. In all 10 FPs, the ALT allele has much higher coverage than the REF.

So I think the fair thing to do is switch to using a haploid model for Illumina data - which is what we're doing for Nanopore.

mbhall88 commented 2 years ago

When using a haploid model for both Illumina and Nanopore, we get 4 FPs and 1 FN. These are the same 4 FPs relating to indels mentioned above. The 1 FN is also the same weird one mentioned above.

iqbal-lab commented 2 years ago

Confused about the difference between "minor is S" and a haploid model.

iqbal-lab commented 2 years ago

Haploid model means just tell Mykrobe via cmd line to ignore minors. Minor as S means run default Mykrobe and flip r to S. Right? Oh no, maybe the emphasis is really about doing the same for Illumina and Nanopore.

mbhall88 commented 2 years ago

> Confused about the difference between "minor is S" and a haploid model.

Let's say mykrobe has called 'r' and the REF/ALT median depth is 8/54 (a real example). I would argue this is "major" resistance and "minor susceptibility". Switching r->S does not seem like the smart thing to do in this example. However, if we use a haploid model, Cortex makes the decision about whether the REF or the ALT is the most likely call.

In addition, as you mentioned, both Illumina and Nanopore are then also using the same model - which seems fair?
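To make the 8/54 example concrete, here is a toy sketch of the two strategies. Both function names are hypothetical, and `majority_depth_call` is only a crude stand-in that picks the allele with the higher median depth; the actual haploid model makes a likelihood-based decision, which this does not reproduce. It also assumes the ALT allele is the resistant one.

```python
def minor_as_susceptible(call: str) -> str:
    """Post-hoc flip: turn a minor resistance call ('r') into susceptible ('S')."""
    return "S" if call == "r" else call


def majority_depth_call(ref_depth: int, alt_depth: int) -> str:
    """Crude stand-in for a haploid call: choose the allele with more depth.

    Assumes ALT is the resistant allele. The real model is likelihood-based.
    """
    return "R" if alt_depth > ref_depth else "S"


# The example from the thread: call 'r' with REF/ALT median depth 8/54.
print(minor_as_susceptible("r"))   # flips to S despite ALT dominating
print(majority_depth_call(8, 54))  # picks R, since ALT has far more depth
```

The flip ignores the depths entirely, which is why it disagrees with a depth-aware call in this case.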

iqbal-lab commented 2 years ago

Ah yes, I 100% agree

Same model for both is good
We should just let the model do its thing, not add a layer of fiddling on top.
Yes for that example I'd call that major resistant

mbhall88 / head_to_head_pipeline

Mutation concordance analysis #82