jodyphelan / TBProfiler

Profiling tool for Mycobacterium tuberculosis to detect resistance and strain type from WGS data
GNU General Public License v3.0

Lineage typing method in TB-Profiler #160

Closed KevinLYW366 closed 3 years ago

KevinLYW366 commented 3 years ago

Hi Jody,

Thank you for making this useful tool for MTB WGS data analysis!

I have two questions about the lineage typing method in TB-Profiler:

  1. In your published paper, Napier, G., Campino, S., Merid, Y. et al. Robust barcoding and identification of Mycobacterium tuberculosis lineages for epidemiological and clinical studies. Genome Med 12, 114 (2020). https://doi.org/10.1186/s13073-020-00817-3, it says the 90 SNPs method has been incorporated into TB-Profiler. Does it mean the lineage typing method in TB-Profiler is a mix of RD-analysis, 90 SNPs and etc.?
  2. I'm interested in this 90 SNPs lineage typing method. Is it possible to type lineage using ONLY the 90 SNPs method in TB-Profiler? I found the option '--snps' in 'tb-profiler lineage' whose help message is '--snps Sample prefix (default: False)'. I'm a little confused.

By the way, I'm using TB-Profiler on a CentOS 7 Linux system. The TB-Profiler version is 3.0.4 with database tbdb_a2a234b.

Thanks again, Kevin

jodyphelan commented 3 years ago

Hi Kevin,

> Does it mean the lineage typing method in TB-Profiler is a mix of RD-analysis, 90 SNPs and etc.?

The 90-SNP barcode contains one SNP per sublineage, but each sublineage can have many more unique SNPs (full list here). The reduced barcode is useful when designing a lab-based SNP-typing method, as you don't need to interrogate many positions on the genome. TB-Profiler uses the SNPs from the publication you linked but actually uses more than 90 of them. Because it works with whole-genome sequence data and SNP calling is fast, it analyses up to 10 SNPs per lineage. This avoids potential issues with low coverage at some of the SNP sites.

> I found the option '--snps' in 'tb-profiler lineage' whose help message is '--snps Sample prefix (default: False)'. I'm a little confused.

Sorry, this looks like an error in the help message. The --snps option produces a file which indicates the frequency of the lineage-specific alleles for all positions analysed:

lineage4.1      62657   87      0       1.0
lineage4.1      284623  54      0       1.0
lineage4.1      902413  56      0       1.0
lineage4.1      923065  90      0       1.0
lineage4.1      1875207 37      0       1.0
lineage4.1      2020144 56      0       1.0
lineage4.1      2253453 49      0       1.0
lineage4.1      2574022 57      0       1.0
lineage4.1      2671061 37      0       1.0
lineage4.1      2906978 39      0       1.0

The columns are

  1. Lineage
  2. Genome position
  3. Number of reads with lineage-specific allele
  4. Number of reads with other allele
  5. Fraction of reads with lineage-specific allele
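
As a rough illustration of how a file with these five columns could be summarised, here is a minimal Python sketch. It assumes whitespace-separated rows in the column order listed above. The function name `summarise_lineage_support` and the `min_depth` cut-off are hypothetical and are not part of TB-Profiler, and pooling reads across sites is only one possible way to look at the data, not necessarily how TB-Profiler itself calls lineages.

```python
from collections import defaultdict


def summarise_lineage_support(lines, min_depth=10):
    """Pool read counts per lineage from rows of the form:
    lineage, position, lineage-allele reads, other-allele reads, fraction.

    Sites with fewer than `min_depth` reads are skipped (an assumed cut-off).
    """
    # lineage -> [lineage-allele reads, other-allele reads, sites used]
    totals = defaultdict(lambda: [0, 0, 0])
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # ignore blank or malformed rows
        lineage, _position, lineage_reads, other_reads, _fraction = fields
        lineage_reads, other_reads = int(lineage_reads), int(other_reads)
        depth = lineage_reads + other_reads
        if depth == 0 or depth < min_depth:
            continue  # too little coverage to be informative
        entry = totals[lineage]
        entry[0] += lineage_reads
        entry[1] += other_reads
        entry[2] += 1
    return {
        lineage: {"sites": sites, "fraction": lin / (lin + other)}
        for lineage, (lin, other, sites) in totals.items()
    }


example_rows = [
    "lineage4.1      62657   87      0       1.0",
    "lineage4.1      284623  54      0       1.0",
    # Hypothetical low-coverage row, added only to show the depth filter.
    "lineage4.1      999999  3       0       1.0",
]
summary = summarise_lineage_support(example_rows)
print(summary)
```

With several sites per lineage, a poorly covered position can be dropped without losing the lineage call, which is the benefit of analysing more than one SNP per lineage.
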
KevinLYW366 commented 3 years ago

Hi Jody,

Thanks so much for your quick reply! I'm now clear on the lineage typing part of TB-Profiler.

I have two other questions, related to drug resistance detection in TB-Profiler:

  1. In tbdb.csv, there might be several variants matched with one drug. Is it correct that drug resistance will be detected if any one of those variants is present? Or is there a specific rule to detect drug resistance based on the calling of several variants?
  2. I found that in the variants report the estimated fraction can be a number like 0.362. Considering that MTB is haploid, is there a good way to interpret this kind of estimated fraction (not close to either 0 or 1)?

Any help will be appreciated, Kevin

jodyphelan commented 3 years ago

No problem,

  1. Yes, you are correct. If any of the variants in tbdb.csv are detected, the strain will be called as resistant.
  2. This fraction can report mutations which have not fixed in a population. This can occur if a mutation has been recently acquired by the population and has not yet fixed, or if the patient has been infected with multiple strains with different mutations.
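
The two answers above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `CATALOGUE` dictionary is a tiny hand-written stand-in and does not reflect the actual layout of tbdb.csv, the function names are hypothetical, and the 0.9 threshold for calling a variant "fixed" is an arbitrary choice made for the example.

```python
# Illustrative stand-in for a resistance catalogue: (gene, change) -> drugs.
# This is NOT the real tbdb.csv schema.
CATALOGUE = {
    ("rpoB", "p.Ser450Leu"): ["rifampicin"],
    ("katG", "p.Ser315Thr"): ["isoniazid"],
}


def call_resistance(detected_variants, catalogue):
    """Return {drug: [variants]} for every drug with at least one
    catalogued variant among the detected variants."""
    resistant = {}
    for variant in detected_variants:
        for drug in catalogue.get(variant, ()):
            resistant.setdefault(drug, []).append(variant)
    return resistant


def describe_fraction(fraction, fixed_threshold=0.9):
    """Label an estimated variant fraction.

    'fixed' : nearly all reads carry the variant.
    'mixed' : the variant is present but not fixed, e.g. recently
              acquired or a mixed infection.
    'absent': no reads carry the variant.
    The threshold is an assumption made for illustration only.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if fraction == 0.0:
        return "absent"
    if fraction >= fixed_threshold:
        return "fixed"
    return "mixed"


detected = [("katG", "p.Ser315Thr"), ("gyrA", "p.Glu21Gln")]
print(call_resistance(detected, CATALOGUE))
print(describe_fraction(0.362))
```

A single catalogued variant is enough to flag a drug here, and a fraction such as 0.362 falls in the "mixed" category, matching the two explanations given above.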

Hope that helps, let me know if you have any more questions

KevinLYW366 commented 3 years ago

Got it. Thanks for your answer!

jodyphelan commented 3 years ago

No problem!