jodyphelan / TBProfiler

Profiling tool for Mycobacterium tuberculosis to detect resistance and strain type from WGS data
GNU General Public License v3.0

lineage - how it works? #317

Open alantsangmb opened 9 months ago

alantsangmb commented 9 months ago

I have a question regarding how tbprofiler assigns lineages. I understand that lineages are determined by identifying lineage-specific SNPs. However, I am not familiar with the specific details or algorithm of how this process works.

I have obtained data for two strains, SRR23497958 and ERR266122. SRR23497958 is a lineage 3 M. tuberculosis, and tbprofiler assigned it as La3;Lineage 3. When I examined the lineage.snp.txt file, I found that only one SNP specific to La3 and all SNPs specific for lineage 3 are present:

La3 Chromosome 3396569 93 0 1
lineage3 Chromosome 12204 69 0 1
lineage3 Chromosome 69984 77 0 1
lineage3 Chromosome 342873 107 0 1
lineage3 Chromosome 652950 113 0 1
lineage3 Chromosome 1450316 85 0 1
lineage3 Chromosome 1764225 81 0 1
lineage3 Chromosome 1925136 59 0 1
lineage3 Chromosome 2738221 89 0 1
lineage3 Chromosome 2782498 56 0 1
lineage3 Chromosome 4396495 88 0 1
lineage4 Chromosome 206481 4 43 0.085106382978723

However, for ERR266122, which is a Mycobacterium canettii, tbprofiler assigned it to La3 because one SNP specific to La3 is present. Interestingly, a SNP specific to lineage 4.9 is present as well. I am curious why tbprofiler did not also assign ERR266122 to lineage 4.9:

La3 Chromosome 3770449 130 0 1
lineage4 Chromosome 206481 5 97 0.049019607843137
lineage4.9 Chromosome 541201 189 0 1
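
To make the numbers easier to check, here is a minimal Python sketch that parses the ERR266122 rows and recomputes the last column. The column meanings are an assumption inferred from the values (lineage, chromosome, position, reads supporting the lineage allele, other reads, fraction), and the `min_fraction` cutoff and all function names are hypothetical. This is not tbprofiler's actual lineage-calling logic, only a way to see which markers are well supported.

```python
from collections import defaultdict

# Rows copied from the ERR266122 listing. Assumed column order (inferred,
# not documented here): lineage, chromosome, position, supporting reads,
# other reads, fraction.
ROWS = """\
La3 Chromosome 3770449 130 0 1
lineage4 Chromosome 206481 5 97 0.049019607843137
lineage4.9 Chromosome 541201 189 0 1
"""


def parse_rows(text):
    """Split whitespace-separated rows into dictionaries."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        lineage, chrom, pos, support, other, fraction = line.split()
        records.append({
            "lineage": lineage,
            "chrom": chrom,
            "pos": int(pos),
            "support": int(support),
            "other": int(other),
            "fraction": float(fraction),
        })
    return records


def recomputed_fraction(rec):
    """Fraction of reads supporting the lineage allele: support / (support + other)."""
    total = rec["support"] + rec["other"]
    return rec["support"] / total if total else 0.0


def supported_lineages(records, min_fraction=0.9):
    """Group marker positions by lineage, keeping those at or above a hypothetical cutoff."""
    hits = defaultdict(list)
    for rec in records:
        if recomputed_fraction(rec) >= min_fraction:
            hits[rec["lineage"]].append(rec["pos"])
    return dict(hits)


records = parse_rows(ROWS)
for rec in records:
    print(rec["lineage"], rec["pos"], recomputed_fraction(rec))
print(supported_lineages(records))
```

Under this reading, the reported fraction matches support / (support + other) for every row, and both the La3 and lineage4.9 markers are fully supported while the lineage4 marker is not, which is what prompts the question.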

I initially thought that this might be related to the standard deviation of the fractions, but upon further examination, it does not appear to be the case. I would greatly appreciate it if you could provide me with insights into how tbprofiler makes these predictions. Thank you so much.

alantsangmb commented 8 months ago

I would like to add that the "La3, lineage3" result for SRR23497958 is reported by versions 5 and 5.0.1, whereas version 4.4.0 reports only lineage 3.