Open mbhall88 opened 2 weeks ago
Hi Michael
Apologies for the awful documentation, I really need to invest some time into improving it! I will try to put together a section on what it looks for in a VCF.
Yes, the default database uses 'Chromosome' as the chromosome name. If you would like to use your VCFs with a different chromosome name then I would recommend passing --match_ref </path/to/your/reference.fasta> to update_db or create_db, which will use whatever name is in your own fasta file. Again, as you pointed out, this isn't very clear, so I'll try to make a little decision tree figure on data inputs and recommended settings.
The fact it doesn't complain when you feed it a VCF with different chrom names is pretty critical! I'll put in a fix for that and make a new release asap!
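In the meantime, a check along these lines can catch the mismatch before running anything. This is only a minimal Python sketch, not TBProfiler's actual code: the function names are hypothetical, the default expected name 'Chromosome' is taken from the comment above, and it only reads plain-text or gzip/bgzip-compressed VCF (not BCF).

```python
import gzip
from pathlib import Path


def vcf_chrom_names(path):
    """Return the set of CHROM values used by the records of a VCF file."""
    path = Path(path)
    # bgzip output is gzip-compatible, so gzip.open can read .vcf.gz files.
    opener = gzip.open if path.suffix == ".gz" else open
    names = set()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header lines and blank lines
            names.add(line.split("\t", 1)[0])
    return names


def check_chrom_names(path, expected="Chromosome"):
    """Raise ValueError if any record uses a CHROM name other than `expected`."""
    unexpected = vcf_chrom_names(path) - {expected}
    if unexpected:
        raise ValueError(
            f"{path}: unexpected CHROM name(s) {sorted(unexpected)}; "
            f"expected '{expected}'"
        )
```

Calling check_chrom_names on a VCF whose records use NC_000962.3 would raise an error instead of silently producing no predictions.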
And I didn't know about --rename-chrs in bcftools. I'm using my own hacky script internally, but this is far more elegant!
No worries. It's hard to keep docs updated as a tool evolves.
Personally, just renaming the chrom in the VCF as I outlined above is probably an easier route than updating the DB. It's also totally fine to expect users to do this, and I guess I kind of created this issue to show an example of how I achieved it. Selfishly for future me, but hopefully others find it useful. Also, feel free to use it in the docs if you think it is helpful.
Thanks again for keeping TBProfiler updated and evolving.
Hi Jody,
I have just been running tbprofiler with some samples using VCF as the input (it is ONT data I have variant-called with Clair3). Forgive me if I have missed it somewhere but there doesn't seem to be any documentation about what is expected of the VCF?
For future me (and maybe others) the VCF needs to be indexable - i.e., BGZIP-compressed VCF (.vcf.gz) or BCF. The other thing, which I found a little more sinister, was that the CHROM names must be Chromosome. I had them as NC_000962.3 and tbprofiler ran without any errors, but I essentially got no resistance predictions. When I changed the CHROM name in the VCF I got the expected predictions.
My hacky/fast way of making this change was
and then run tbprofiler with -v out.bcf.
I guess a more robust solution would be to use BCFtools
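For anyone landing here later, the BCFtools route could be wrapped roughly as follows. This is a sketch under assumptions rather than the exact commands from this thread: the file names (in.vcf.gz, chrom_map.txt) and the Python helper names are hypothetical, while --rename-chrs, -O b and -o are standard bcftools annotate options. The mapping file holds one "old_name new_name" pair per line.

```python
import subprocess
from pathlib import Path


def write_chrom_map(map_path, old="NC_000962.3", new="Chromosome"):
    """Write the 'old_name new_name' mapping file that --rename-chrs reads."""
    map_path = Path(map_path)
    map_path.write_text(f"{old}\t{new}\n")
    return map_path


def rename_command(in_vcf, out_bcf, map_path):
    """Build the bcftools call that renames contigs and writes compressed BCF."""
    return [
        "bcftools", "annotate",
        "--rename-chrs", str(map_path),
        "-O", "b",  # output type: compressed BCF
        "-o", str(out_bcf),
        str(in_vcf),
    ]


def rename_chroms(in_vcf, out_bcf, map_path="chrom_map.txt"):
    """Rename contigs with bcftools and index the result (needs bcftools on PATH)."""
    chrom_map = write_chrom_map(map_path)
    subprocess.run(rename_command(in_vcf, out_bcf, chrom_map), check=True)
    subprocess.run(["bcftools", "index", str(out_bcf)], check=True)
```

With bcftools installed, rename_chroms("in.vcf.gz", "out.bcf") would produce the indexed out.bcf that is then passed to tbprofiler with -v as described above.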
Anyway, maybe some of these examples could be added to the docs? I know I would find it useful, so maybe others would too?