mbhall88 / head_to_head_pipeline

Snakemake pipelines to run the analysis for the Illumina vs. Nanopore comparison.
GNU General Public License v3.0

RIF and INH off-catalogue mutations #85

Closed mbhall88 closed 2 years ago

mbhall88 commented 2 years ago

In the interest of documenting this, as it will potentially be useful for reviews and especially for the drprg paper, here is more tantalising information about the phenotype concordance false negatives (FNs) for INH and RIF.

There are 9 INH FNs. I have previously pointed out that 6 of these have the katG mutation R463L (g1388t, NC_000962.3:c2154724a), which is not in the catalogue. We also see this mutation in 4/6 RIF FNs. In addition, 4/6 RIF FNs (3 of the same samples as above and one different) have the rpoB mutation A1075A (t3225c, NC_000962.3:t763031c): mada_1-17, mada_1-18, mada_1-11 and mada_1-33. This is very interesting. The mutation is synonymous and is considered a Beijing lineage marker, so most people have simply said it isn’t associated with resistance. BUT a recent paper (1) highlighted that it is more prevalent in RR isolates, and it has also been mentioned in other papers (2, 3, 4).

One sample (mada_1-17) has a 6bp deletion in rpoB in addition to R463L in katG and A1075A in rpoB. This one is tricky: the deletion spans multiple codons, and mykrobe didn’t pick it up because it only encodes 1-2bp indels (@iqbal-lab might find this interesting).

The 6 samples with RIF FNs: mada_1-11, mada_1-12, mada_1-17, mada_1-18, mada_1-33, mada_1-50

The 9 INH FNs include the 6 RR samples above, plus: mada_1-2, mada_1-21, mada_1-6

mbhall88 commented 2 years ago

Another comment: I noticed that katG R463L was used as a phylogenetic marker in the NEJM paper (considered susceptible). See the NEJM appendix: https://www.nejm.org/doi/suppl/10.1056/NEJMoa1800474/suppl_file/nejmoa1800474_appendix.pdf

mbhall88 commented 2 years ago

It would be very interesting to go through the NEJM data and see how prevalent these two mutations are in R vs S isolates

mbhall88 commented 2 years ago

Further interesting information about the rpoB A1075A mutation. As I said, it is treated as a Beijing lineage marker (Lineage 2). However, the four samples with this mutation have varying lineages:

| sample | lineage |
|---|---|
| mada_1-33 | 1.1.2 |
| mada_1-18 | 1.1.2 |
| mada_1-11 | 2.2 |
| mada_1-17 | 3.1.1 |

iqbal-lab commented 2 years ago

What's the prevalence of this snp in the full cryptic 12k?

mbhall88 commented 2 years ago

For rpoB_A1075A it's present in 56.4% (8580/15211) samples.
For katG_R463L it's present in 56.3% (8565/15211) samples.
These two variants co-occur in 8557 samples - so effectively they always co-occur.
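
To put a number on "effectively they always co-occur", the counts above can be turned into overlap fractions. This is only a sketch of the arithmetic on the three counts reported in this comment; the variable names are made up for illustration.

```python
# Counts reported above for the 15211 samples
n_rpoB = 8580   # samples with rpoB_A1075A
n_katG = 8565   # samples with katG_R463L
n_both = 8557   # samples with both variants

# Fraction of carriers of one variant that also carry the other
frac_of_rpoB = n_both / n_rpoB
frac_of_katG = n_both / n_katG

# Jaccard index: samples with both / samples with at least one
jaccard = n_both / (n_rpoB + n_katG - n_both)

print(f"rpoB_A1075A carriers that also have katG_R463L: {frac_of_rpoB:.2%}")
print(f"katG_R463L carriers that also have rpoB_A1075A: {frac_of_katG:.2%}")
print(f"Jaccard index: {jaccard:.4f}")
```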

mbhall88 commented 2 years ago

When I go through the samples that have phenotypes I get the following contingency tables

| | katG_R463L present | katG_R463L absent |
|---|---|---|
| INH R | 2999 | 2093 |
| INH S | 2540 | 3306 |

| | rpoB_A1075A present | rpoB_A1075A absent |
|---|---|---|
| RIF R | 2763 | 2021 |
| RIF S | 3023 | 3567 |

My statistics knowledge is not brilliant and I am struggling to find a test that will tell us if these mutations are more prevalent in resistant samples. Any ideas @iqbal-lab ?
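
One candidate for 2x2 tables like these is a Pearson chi-square test of independence, reported together with an odds ratio and its 95% confidence interval (Woolf's log method). The sketch below is a suggestion, not something already used in this pipeline. It uses only the Python standard library, applies no continuity correction, and the function and dictionary names are hypothetical. Because both variants are lineage markers, a significant result here could reflect lineage structure rather than resistance, so a lineage-stratified analysis would probably be needed before drawing conclusions.

```python
import math

def two_by_two_association(a, b, c, d):
    """Pearson chi-square (1 dof, no continuity correction) and odds ratio.

    a: resistant, variant present    b: resistant, variant absent
    c: susceptible, variant present  d: susceptible, variant absent
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-square with 1 dof: P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    # Woolf standard error of log(odds ratio), then a 95% interval
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se)
    return {
        "chi2": chi2,
        "p_value": p_value,
        "odds_ratio": odds_ratio,
        "ci95": (ci_low, ci_high),
    }

# Counts taken from the contingency tables above: (R present, R absent, S present, S absent)
tables = {
    "katG_R463L vs INH": (2999, 2093, 2540, 3306),
    "rpoB_A1075A vs RIF": (2763, 2021, 3023, 3567),
}

results = {name: two_by_two_association(*counts) for name, counts in tables.items()}

for name, r in results.items():
    low, high = r["ci95"]
    print(f"{name}: chi2={r['chi2']:.1f}, p={r['p_value']:.3g}, "
          f"OR={r['odds_ratio']:.2f} (95% CI {low:.2f}-{high:.2f})")
```

An odds ratio above 1 with a confidence interval that excludes 1 would indicate the variant is more common in resistant samples. Fisher's exact test is the usual alternative when cell counts are small, which is not the case here.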