jodyphelan / tbdb

Standard database for the TBProfiler tool
GNU Lesser General Public License v3.0

Missing mutations from WHO Catalogue #56

Open mlarjim opened 1 year ago

mlarjim commented 1 year ago

Hi! As far as I understand, the tb-profiler database contains all the mutations that confer drug resistance listed in the WHO catalogue. However, the following mutation is not found in tbdb: https://github.com/jodyphelan/tbdb/blob/master/tbdb.csv

Gene: gid
Mutation: gid_347_ins_1_cgcacgatctcaacggcc_ccgcacgatctcaacggca
Literature Evidence: Catalogue of mutations in Mycobacterium tuberculosis complex and their association with drug resistance (WHO)
conf_grade: 2) Assoc w R - Interim (STM_S)

Why is this variant missing?

jodyphelan commented 1 year ago

Some variants could not be translated because the reference and alternate sequences did not agree with the rest of the variant description. In this case it should be an insertion of 1 nucleotide, but if we align the reference and the alternate we see that it is actually a combination of an insertion and a SNP:

c-gcacgatctcaacggcc
|*||||||||||||||||* 
ccgcacgatctcaacggca

There were a few of these cases
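
The consistency check described above can be sketched in Python. This is only an illustration, not the code used to build tbdb: the function name is hypothetical, and the underscore-separated layout of the catalogue variant string (gene, position, type, length, reference, alternate) is an assumption inferred from the single example in this thread.

```python
def is_pure_insertion(ref: str, alt: str, length: int) -> bool:
    """Return True if alt equals ref with exactly `length` bases inserted at one position."""
    if len(alt) - len(ref) != length:
        return False
    # Try every possible insertion point and see whether the flanks match.
    for i in range(len(ref) + 1):
        if alt[:i] == ref[:i] and alt[i + length:] == ref[i:]:
            return True
    return False


# Variant string copied from the issue; the field order is assumed.
variant = "gid_347_ins_1_cgcacgatctcaacggcc_ccgcacgatctcaacggca"
gene, position, kind, length, ref, alt = variant.split("_")

consistent = is_pure_insertion(ref, alt, int(length))
print(gene, kind, "consistent with description:", consistent)
```

For this variant the check fails, because the alternate also differs from the reference at its last base, so no single 1-nucleotide insertion explains it.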

mlarjim commented 1 year ago

Thank you for your remark, Jody. Indeed, the variant nomenclature in the WHO catalogue is wrong here. However, the final annotation (column final_annotation.TentativeHGVSNucleotidicAnnotation) states that the mutation is actually a combination of a deletion and an insertion:

c.330_346delGGCCGTTGAGATCGTGCinsTGCCGTTGAGATCGTGCG

Is there any possibility that the tb-profiler database could cover these cases?
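
As a rough cross-check, the delins above can be reproduced from the reference and alternate sequences in the catalogue entry. The sketch below assumes that gid lies on the reverse strand, so that the coding-orientation sequences are the reverse complements of the catalogue sequences; the helper names are hypothetical and this is not how the WHO annotation or tbdb is actually generated.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_shared_ends(ref: str, alt: str):
    """Strip the common prefix, then the common suffix, from ref and alt."""
    start = 0
    while start < min(len(ref), len(alt)) and ref[start] == alt[start]:
        start += 1
    ref, alt = ref[start:], alt[start:]
    end = 0
    while end < min(len(ref), len(alt)) and ref[-1 - end] == alt[-1 - end]:
        end += 1
    return start, ref[:len(ref) - end], alt[:len(alt) - end]


# Sequences from the catalogue entry, converted to coding orientation
# (assumption: gid is on the reverse strand).
ref_coding = reverse_complement("cgcacgatctcaacggcc").upper()
alt_coding = reverse_complement("ccgcacgatctcaacggca").upper()

prefix_trimmed, deleted, inserted = trim_shared_ends(ref_coding, alt_coding)
print(f"del{deleted}ins{inserted}")
```

Under that assumption the trimmed sequences match the del and ins parts of the HGVS string, and the deleted sequence is 17 bases long, which agrees with the interval 330_346.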

jodyphelan commented 1 year ago

Oh right - I hadn't seen that they had added this hgvs notation now. I will take a look and see if I can include more of these cases.

mlarjim commented 1 year ago

thank you so much!

frogtraveler commented 1 year ago

Hello,

I noticed that the mutation fabG1 c.-16A>G is only listed in TBDB as conferring R-interim for INH, while in the WHO catalogue it has the same prediction for ETH as well. Is there a reason why the ETH prediction was not included?

Thank you! Varvara

frogtraveler commented 1 year ago

Another issue: the rrl mutation reported by TBProfiler as n.-255C>T doesn't return a match in TBDB, although it is present in the WHO catalogue with "Uncertain" confidence. Instead, TBDB lists the mutation as c.-255C>T (also uncertain significance). Those are the same, right?

jodyphelan commented 1 year ago

Hi @frogtraveler ,

Indeed it looks like

  1. fabG1 c.-16A>G is missing for ETH
  2. rrl c.-255C>T should be listed as n.-255C>T

I'll get a new version of the db released this week and look for any other potential issues.
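
Until a database with the corrected prefix is available, a comparison that ignores the c./n. prefix can catch this kind of mismatch. The snippet below is a hypothetical helper for ad hoc checks, not part of tb-profiler, and it assumes the two notations differ only in that prefix.

```python
import re

# Matches a leading "c." or "n." HGVS prefix.
PREFIX = re.compile(r"^[cn]\.")


def strip_prefix(change: str) -> str:
    """Remove a leading c. or n. prefix from an HGVS-style change."""
    return PREFIX.sub("", change)


def same_change(a: str, b: str) -> bool:
    """Compare two changes while ignoring the c./n. prefix."""
    return strip_prefix(a) == strip_prefix(b)


print(same_change("c.-255C>T", "n.-255C>T"))
```

This should only be applied to changes in the same gene, since the prefix-free string on its own does not identify the locus.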

frogtraveler commented 1 year ago

Awesome! Thank you so much, Jody!

jodyphelan commented 1 year ago

Hi @frogtraveler,

I've regenerated the mutation lists based on the hgvs annotations from the WHO list now. The mutations you highlighted are now in:

If you run tb-profiler update_tbdb they should be updated for you :)

frogtraveler commented 1 year ago

Thank you so much, Jody!