vrennie opened this issue 5 months ago
It uses the following variables in the FORMAT column to calculate the variant frequency from the reads:
But if these are not available, we could probably bypass the frequency calculation and filtering.
Indeed, these are not available unless we pull them from another source, which I think would be rather cumbersome.
How do we bypass the calculation?
I'd have to add that to the processing logic in pathogen-profiler. How are you calling the insertion?
We're using ISMapper (https://github.com/jhawkey/IS_mapper) to generate a text file, and then a custom script to convert it into a VCF for tb-profiler to interpret.
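For anyone attempting the same conversion, here is a minimal Python sketch of such a script. It is not the custom script used here, and it does not parse real ISMapper output: it assumes the ISMapper table has already been reduced to a hypothetical tab-separated file with `chrom`, `pos` and `ref_base` columns. Each insertion becomes one record using the VCF symbolic allele `<INS:ME>` (insertion of a mobile element), with `.` (missing) for QUAL, FILTER and INFO. Whether tb-profiler interprets symbolic alleles is not established in this thread.

```python
import csv
import io
from typing import Iterable, TextIO

# Header lines for a minimal VCF v4.2 file. The ALT line declares the
# symbolic allele used below for mobile-element (e.g. IS6110) insertions.
VCF_HEADER = [
    "##fileformat=VCFv4.2",
    '##ALT=<ID=INS:ME,Description="Insertion of a mobile element">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]


def insertions_to_vcf(rows: Iterable[dict], out: TextIO) -> None:
    """Write a VCF header followed by one record per insertion.

    Each row is assumed (hypothetically) to have 'chrom', 'pos' and
    'ref_base' keys. QUAL, FILTER and INFO are written as '.' (missing).
    """
    for line in VCF_HEADER:
        out.write(line + "\n")
    # Sort by chromosome, then position, so records are in coordinate order.
    for row in sorted(rows, key=lambda r: (r["chrom"], int(r["pos"]))):
        fields = [
            row["chrom"],
            str(int(row["pos"])),
            ".",                     # ID: none
            row["ref_base"].upper(),  # REF: reference base at POS
            "<INS:ME>",              # ALT: symbolic mobile-element insertion
            ".", ".", ".",           # QUAL, FILTER, INFO: missing
        ]
        out.write("\t".join(fields) + "\n")


def convert_table(table: TextIO, out: TextIO) -> None:
    """Read the hypothetical tab-separated table (with a header row) and write a VCF."""
    reader = csv.DictReader(table, delimiter="\t")
    insertions_to_vcf(reader, out)


if __name__ == "__main__":
    # Illustrative input only; the chromosome name and positions are made up.
    demo = io.StringIO("chrom\tpos\tref_base\nchr1\t200\tg\nchr1\t100\ta\n")
    buf = io.StringIO()
    convert_table(demo, buf)
    print(buf.getvalue(), end="")
```

In practice the CHROM value would need to match the reference sequence name tb-profiler uses, and REF should be taken from that reference rather than trusted from the input table.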
@jodyphelan I did a basic test by generating a VCF containing minimal information. To check that tb-profiler reads it correctly, I started by putting a regular SNP in the Rv0678 gene. This produced a JSON with 1 total variant (expected), but the variant was not interpreted by tb-profiler (unexpected). Could you check whether you get the same result on your side? Attachment: IS_NICD_test.vcf.zip
@jodyphelan we have some indications that IS6110 insertions may be involved in BDQ (bedaquiline) resistance.
We have some ideas on how to address this, but we have some questions about constructing a VCF that TBProfiler can read:
1) What is the minimum set of columns that TBProfiler needs in the VCF? Can the INFO/QUAL/FILTER/FORMAT columns be empty?
2) Do you think TBProfiler would be able to handle the following VCF (see attached), or would this format cause issues?
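On question 1, the VCF specification (v4.2) itself requires eight fixed columns on every data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and uses `.` for a missing value; FORMAT and sample columns are only needed when genotype data are included. Whether TBProfiler needs more than the spec minimum (for example FORMAT fields for the frequency calculation mentioned above) is a separate question for the maintainers. A minimal Python sketch that checks one data line against those spec rules only, nothing TBProfiler-specific:

```python
from typing import List

# The eight fixed, mandatory columns of a VCF data line (VCF v4.2).
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def check_vcf_data_line(line: str) -> List[str]:
    """Return a list of structural problems found in one tab-separated VCF data line.

    Checks only basic rules from the VCF spec: all eight fixed columns are
    present, none is an empty string (missing values should be '.'), and
    POS is an integer. It says nothing about what TBProfiler accepts.
    """
    problems = []
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(FIXED_COLUMNS):
        problems.append(
            f"expected at least {len(FIXED_COLUMNS)} columns, got {len(fields)}"
        )
    # zip stops at the shorter list, so only columns that exist are checked.
    for name, value in zip(FIXED_COLUMNS, fields):
        if value == "":
            problems.append(f"{name} is empty; use '.' for a missing value")
    if len(fields) > 1 and not fields[1].isdigit():
        problems.append("POS must be an integer")
    return problems


if __name__ == "__main__":
    # A record with QUAL, FILTER and INFO all set to missing passes.
    print(check_vcf_data_line("chr1\t100\t.\tA\t<INS:ME>\t.\t.\t."))
    # A record with an empty INFO field is flagged.
    print(check_vcf_data_line("chr1\t100\t.\tA\tT\t.\t.\t"))
```

So, per the spec, QUAL/FILTER/INFO can be "empty" only in the sense of holding `.`, and FORMAT can be left out entirely when there are no sample columns.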