Closed 0xaf1f closed 3 years ago
Hi @0xaf1f
Thanks for letting me know, I was not aware of that publication. It looks as though all the discrepant mutations are located in drug resistance genes. I filtered the original list from the Coll publication to remove most of the drug resistance genes when creating barcode.bed. So TB-Profiler is not using those mutations for the lineage predictions. Hope that clears it up.
Jody
Yes, the whole subject of that paper is lineage markers in drug-resistance genes, so it's not as if anything in there should necessarily be discarded.
Right! There is quite a bit of redundancy in the full list so I don't think that filtering out a couple of mutations will impact the predictions for most of the samples. We are actually currently looking at revising the barcode so I will be updating this in the next few weeks.
[apologies -- I first posted this in the tb-profiler repository and it better belongs here]
I was reading https://doi.org/10.1186/s13073-020-00726-5 and, in it, the authors mentioned discovering some errors in the Coll & co. barcode:
The seven mutations are (found by searching the "Coll et al - natcomm2014" column in the referenced table for "(solved)"):
However, I checked for a couple of these positions in https://github.com/jodyphelan/TBProfiler/blob/master/db/tbdb.barcode.bed (and also in https://github.com/jodyphelan/tbdb/blob/master/barcode.bed) and didn't find them at all.
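For anyone wanting to repeat this check, here is a rough sketch of how a list of genome positions could be looked up in a barcode BED file. It assumes only the standard BED layout (tab-separated chrom, 0-based start, exclusive end in the first three columns). The function names, the contig name `Chromosome`, and the positions in the example are placeholders for illustration, not values taken from the paper or from barcode.bed.

```python
import io


def load_bed_intervals(handle):
    """Parse BED lines into (chrom, start, end) tuples.

    Standard BED convention: start is 0-based, end is exclusive.
    """
    intervals = []
    for line in handle:
        line = line.strip()
        # Skip blank lines and common BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def positions_in_bed(positions, intervals, chrom="Chromosome"):
    """Return {position: bool} for 1-based genome positions.

    A 1-based position p lies in a BED interval [start, end) when
    start < p <= end. The default contig name is an assumption.
    """
    return {
        pos: any(c == chrom and s < pos <= e for c, s, e in intervals)
        for pos in positions
    }


# Made-up BED content and positions, purely to show the call pattern.
example_bed = io.StringIO("Chromosome\t99\t100\tlineageX\n")
example_intervals = load_bed_intervals(example_bed)
print(positions_in_bed([100, 250], example_intervals))
```

With a real file, `load_bed_intervals(open("barcode.bed"))` would replace the in-memory example, and the positions would come from the table in the paper.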