jodyphelan / tbdb

Standard database for the TBProfiler tool
GNU Lesser General Public License v3.0

Chromosome:g. mutations #21

Closed: malare closed this issue 3 years ago

malare commented 4 years ago

Dear Jody, I was wondering why the following mutations are not listed in tbdb but are still reported by TBProfiler. Also, why do you use the g. nomenclature while still assigning the mutation to a specific gene?

gid Chromosome:g.4407912_4408042del streptomycin
pncA Chromosome:g.2288682_2289080del pyrazinamide

Thank you for your help

jodyphelan commented 4 years ago

Hi @malare,

These mutations are covered by the wildcard "large_deletion" code. Because these deletions are rare and will most likely have different genomic coordinates each time, we can use this code to associate any large deletion with resistance rather than having to hard-code every possible deletion in the database. This works in the same way as the "frameshift" code.

I chose this nomenclature because some of these deletions can span large regions encompassing many genes. The deletion is reported in the "g." format so that the actual chromosome-level coordinates are shown (e.g. Chromosome:2288682-2289080 in the case of your example). This gives us an idea of the size and also tells us which genes are involved. Any overlapping drug resistance genes are added to the annotation.
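As a rough illustration of the idea described above (not TBProfiler's actual code), a deletion in "g." format can be parsed and checked for overlap with known drug resistance genes. The names `DR_GENES`, `parse_g_deletion` and `annotate_large_deletion` are hypothetical, and the gene coordinates are assumed H37Rv values that should be checked against your own annotation.

```python
import re

# ASSUMPTION: H37Rv gene coordinates (1-based, inclusive) and associated drugs.
# These values are for illustration only; verify them against your annotation.
DR_GENES = {
    "pncA": (2288681, 2289241, "pyrazinamide"),
    "gid": (4407528, 4408202, "streptomycin"),
}

# Matches strings such as "Chromosome:g.2288682_2289080del"
G_DEL = re.compile(r"^(?P<chrom>[^:]+):g\.(?P<start>\d+)_(?P<end>\d+)del$")


def parse_g_deletion(change):
    """Return (chromosome, start, end) for a 'g.' style deletion string."""
    match = G_DEL.match(change)
    if match is None:
        raise ValueError(f"not a g. deletion: {change!r}")
    return match["chrom"], int(match["start"]), int(match["end"])


def annotate_large_deletion(change):
    """List (gene, drug) pairs for every resistance gene the deletion overlaps."""
    _, start, end = parse_g_deletion(change)
    hits = []
    for gene, (gene_start, gene_end, drug) in DR_GENES.items():
        # Two closed intervals overlap if each starts before the other ends.
        if start <= gene_end and end >= gene_start:
            hits.append((gene, drug))
    return hits


print(annotate_large_deletion("Chromosome:g.2288682_2289080del"))
```

Because the rule only tests for overlap, any large deletion touching one of these genes is flagged, whatever its exact coordinates are.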

Hope that answers your question, but let me know if you would like some more clarification.

Jody

peflanag commented 2 years ago

Hi Jody,

I just wanted to jump on the back of this. We are currently running our WHO EQA with both TBProfiler and MTBseq. On the whole, everything pretty much matches up, except that TBProfiler is calling pyrazinamide (PZA) resistance in two of our isolates when MTBseq isn't. TBProfiler says it's pncA Chromosome:g.2288682_2289080del

I've looked at the list in your post above and cannot find it. Is there a way to find the coordinates? I want to compare it with what MTBseq finds at those coordinates. Is it this mutation, as noted by the WHO?

pncA_560_del_163_caggagctgcaaaccaactcgacgctggcggtgcgcatctcctccagcgc_c

Cheers,

P

jodyphelan commented 2 years ago

Hi @peflanag, at the moment it is a bit difficult to convert the genomic coords into gene-based ones. But I'll create a separate issue for this and hopefully we can put it into the next release.

For now, I've manually converted the genomic coords 2288682-2289080 into pncA 162_560_del. I think this is the same variant, but potentially the WHO list represented the indels with the first coordinate shifted (or the delly prediction wasn't 100% accurate).
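The manual conversion above can be sketched in a few lines. This is only an illustration, assuming pncA lies on the reverse strand at H37Rv positions 2288681-2289241 (an assumption to verify against your annotation). The function names are hypothetical and not part of TBProfiler.

```python
def genomic_to_gene_pos(pos, gene_start, gene_end, strand):
    """Convert a 1-based genomic position to a 1-based position within a gene."""
    if not gene_start <= pos <= gene_end:
        raise ValueError(f"position {pos} lies outside the gene")
    if strand == "+":
        return pos - gene_start + 1
    # On the reverse strand, gene position 1 is the highest genomic coordinate.
    return gene_end - pos + 1


def deletion_in_gene_coords(del_start, del_end, gene_start, gene_end, strand):
    """Return the deletion as (first, last) gene positions, in ascending order."""
    a = genomic_to_gene_pos(del_start, gene_start, gene_end, strand)
    b = genomic_to_gene_pos(del_end, gene_start, gene_end, strand)
    return tuple(sorted((a, b)))


# ASSUMPTION: pncA spans 2288681-2289241 on the reverse strand of H37Rv.
PNCA_START, PNCA_END, PNCA_STRAND = 2288681, 2289241, "-"

print(deletion_in_gene_coords(2288682, 2289080, PNCA_START, PNCA_END, PNCA_STRAND))
# prints (162, 560)
```

Under these assumed coordinates the result matches the pncA 162_560_del given above. A deletion that extends beyond the gene boundaries would need extra handling, which this sketch does not attempt.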

peflanag commented 2 years ago

Cheers Jody!