jodyphelan / TBProfiler

Profiling tool for Mycobacterium tuberculosis to detect resistance and strain type from WGS data
GNU General Public License v3.0

TBProfiler Dealing with IS6110 insertions #374

Open vrennie opened 5 months ago

vrennie commented 5 months ago

@jodyphelan we have some indications that IS6110 insertions may be involved in BDQ resistance.

We have some ideas on how to solve this, but we have some questions about constructing a VCF that TBProfiler can read:

1) What are the minimum columns that TBProfiler needs in the VCF? Can the INFO/QUAL/FILTER/FORMAT columns be empty?

2) Do you think that TBProfiler would be able to deal with the following VCF (see attached), or would this format cause issues?

[Attached screenshot: 2024-06-27 at 09 05 52]

jodyphelan commented 5 months ago

It uses the following fields in the FORMAT column to get the frequency of the variant from the reads:

FORMAT=

FORMAT=

FORMAT=

FORMAT=

But if these are not available, we could probably bypass the frequency calculation and the filtering.
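
To illustrate what bypassing the frequency calculation could look like, here is a minimal sketch. It is not the pathogen-profiler code. The FORMAT field names are not visible in the comment above, so the use of an `AD`-style allele depth field is an assumption, and `allele_frequency`, `passes_filter`, and the `min_freq` threshold are hypothetical names chosen for illustration.

```python
from typing import Optional


def allele_frequency(format_keys: str, sample_values: str,
                     alt_index: int = 1) -> Optional[float]:
    """Return the ALT allele frequency from an AD-style field.

    Assumption: depths are stored as comma-separated integers under the
    key "AD" (REF first, then ALT alleles). Returns None when the field
    is missing or unusable, so the caller can skip the frequency filter.
    """
    fields = dict(zip(format_keys.split(":"), sample_values.split(":")))
    ad = fields.get("AD")
    if ad is None or ad == ".":
        return None
    try:
        depths = [int(x) for x in ad.split(",")]
    except ValueError:
        # e.g. "2,." where one depth is a missing value
        return None
    total = sum(depths)
    if total == 0 or alt_index >= len(depths):
        return None
    return depths[alt_index] / total


def passes_filter(freq: Optional[float], min_freq: float = 0.1) -> bool:
    """Keep a variant if it has no frequency (bypass) or meets the threshold.

    min_freq is a hypothetical cutoff, not a TBProfiler default.
    """
    return freq is None or freq >= min_freq
```

With this shape, a record that only carries `GT` yields `None` for the frequency and is kept rather than filtered out.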

vrennie commented 5 months ago

Indeed, these are not available unless we pull them from another source, which I think would be rather cumbersome.

How do we bypass the calculation?

jodyphelan commented 5 months ago

I'd have to add that to the processing logic in pathogen-profiler. How are you calling the insertion?

vrennie commented 5 months ago

Using ISMapper (https://github.com/jhawkey/IS_mapper) to generate a text file and then a custom script to generate a vcf from that for interpretation by tb-profiler.
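
For context, a custom script of this kind might look roughly like the sketch below. It only shows how to write a VCF with the eight fixed columns plus a `GT` genotype, using `.` for missing values as the VCF specification allows. The function name `build_minimal_vcf`, the contig name `Chromosome`, the sample name, and the example record are assumptions for illustration; parsing of the ISMapper table is left out, and nothing in this thread confirms that tb-profiler accepts this exact layout.

```python
from typing import Iterable, Tuple


def build_minimal_vcf(records: Iterable[Tuple[int, str, str]],
                      chrom: str = "Chromosome",
                      sample: str = "sample1") -> str:
    """Build a minimal single-sample VCF as a string.

    records: (position, REF allele, ALT allele) tuples, 1-based positions.
    chrom and sample are hypothetical defaults; use the names that match
    your reference and pipeline.
    """
    columns = ["#CHROM", "POS", "ID", "REF", "ALT",
               "QUAL", "FILTER", "INFO", "FORMAT", sample]
    header = [
        "##fileformat=VCFv4.2",
        f"##contig=<ID={chrom}>",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "\t".join(columns),
    ]
    body = []
    # VCF records are expected to be sorted by position within a contig.
    for pos, ref, alt in sorted(records):
        # ID, QUAL, FILTER and INFO are set to "." (missing value).
        body.append("\t".join(
            [chrom, str(pos), ".", ref, alt, ".", ".", ".", "GT", "1/1"]
        ))
    return "\n".join(header + body) + "\n"


if __name__ == "__main__":
    # Hypothetical record: a SNP at position 100.
    print(build_minimal_vcf([(100, "A", "G")]), end="")
```

The output would still need to be compressed and indexed (for example with bgzip and tabix) if the downstream tool expects that.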

vrennie commented 5 months ago

@jodyphelan I did a basic test to generate a VCF containing minimal information. To test that tb-profiler reads it correctly, I started by putting a regular SNP in the Rv0678 gene. This generated a JSON with 1 total variant (expected), but the variant was not interpreted by tb-profiler (unexpected). Could you check whether you get the same on your side? IS_NICD_test.vcf.zip