Open mbhall88 opened 1 year ago
I've also noticed that you don't accept duplications in the recommended format, i.e. c.643dup?
It seems you must specify the duplicated base at the end, e.g. c.643dupC.
Hi @mbhall88 ,
Sorry, I need to update the documentation. You are right to use tb-profiler create_db instead.
As per the docs, the mutations must follow HGVS nomenclature. But it seems tb-profiler only accepts a subset of this nomenclature. For example, I have the mutation c.196_198delinsTAG, which describes an MNP at position 196 TCG>TAG. Looking at the tbdb.conversion.log this (incorrectly) gets converted.
Yes, at the moment it only accepts a subset. The pipeline uses snpEff to annotate variants in new samples, and snpEff only represents each variant in one way (e.g. c.643dupC instead of c.643dup). To simplify the variant lookup step, the create_db function tries to standardise all variants to the snpEff format using regex, but currently I've only added support for the variant types that are in tbdb.csv. I'll try over the next few days to update the docs and look into adding compatibility for more types, such as the one you listed.
Thanks for raising the issue!
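For illustration, the kind of regex-based standardisation described above could look roughly like the following. This is a minimal sketch, not tb-profiler's actual code: the function name `add_dup_base`, the pattern, and the idea of passing the gene's coding sequence as a plain string are all assumptions made for the example.

```python
import re

# Hypothetical pattern: a single-position duplication, with or without
# the duplicated base written out (c.643dup or c.643dupC).
DUP_RE = re.compile(r"^c\.(\d+)dup([ACGT]?)$")


def add_dup_base(mutation: str, cds: str) -> str:
    """Rewrite c.643dup as c.643dupC using a coding sequence (assumed input).

    Mutations that are not single-base duplications are returned unchanged.
    """
    match = DUP_RE.match(mutation)
    if match is None:
        return mutation  # some other variant type; leave it alone
    pos, base = int(match.group(1)), match.group(2)
    if base:
        return mutation  # the base is already spelled out
    if not 1 <= pos <= len(cds):
        raise ValueError(f"position {pos} is outside the coding sequence")
    # HGVS c. positions are 1-based, Python strings are 0-based.
    return f"c.{pos}dup{cds[pos - 1]}"


if __name__ == "__main__":
    print(add_dup_base("c.3dup", "ATGCCC"))
```

Anything the pattern does not recognise is passed through untouched, so an unsupported type such as a delins would need its own rule.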
Thanks for the clarification. Trying to support all of HGVS would likely be difficult, and would probably require developing a library. I just noticed https://github.com/biocommons/hgvs though! I haven't used it before, but it looks like it might make your life a little easier.
Anyways, I got a custom db working and just thought this issue might be helpful just for some docs changes.
Thanks for the quick response.
Oh, I hadn't seen that before. I'll check it out, thanks! And I'll have a go at updating the docs asap.
I'm having some issues trying to create a custom database.
My understanding from the documentation is that I clone this repo, replace/change the tbdb.csv file to have the mutations I want, and then run parse_db.py in the main directory?
It seems there is a file missing? And I can't find it documented anywhere.
I then instead tried running the following from the tbdb main directory
This completes successfully, but I have a further issue with its output.
As per the docs, the mutations must follow HGVS nomenclature. But it seems tb-profiler only accepts a subset of this nomenclature.
For example, I have the mutation c.196_198delinsTAG, which describes an MNP at position 196 TCG>TAG. Looking at the tbdb.conversion.log, this (incorrectly) gets converted as
Are you able to clarify (here and in the docs) what subset you support?
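As a side note on the example above, TCG>TAG changes only the middle base, so an equal-length delins like this one can be trimmed to a smaller change. The following is a minimal sketch of that trimming. The function name `simplify_delins` and the plain-string coding sequence are hypothetical, and whether snpEff or tb-profiler would report the variant in the trimmed form is an assumption, not something confirmed in this thread.

```python
import re

# Hypothetical pattern for a range delins such as c.196_198delinsTAG.
DELINS_RE = re.compile(r"^c\.(\d+)_(\d+)delins([ACGT]+)$")


def simplify_delins(mutation: str, cds: str) -> str:
    """Trim bases shared by reference and alternate from an equal-length delins.

    Returns the input unchanged if it is not a delins, if the lengths
    differ, or if the alternate is identical to the reference.
    """
    match = DELINS_RE.match(mutation)
    if match is None:
        return mutation
    start, end, alt = int(match.group(1)), int(match.group(2)), match.group(3)
    ref = cds[start - 1:end]  # 1-based inclusive range -> Python slice
    if len(ref) != len(alt):
        return mutation  # true indel, or range outside the sequence
    # Drop matching bases from the left, shifting the start position.
    while ref and ref[0] == alt[0]:
        ref, alt, start = ref[1:], alt[1:], start + 1
    # Drop matching bases from the right.
    while ref and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    if not ref:
        return mutation  # no actual change; left for the caller to flag
    if len(ref) == 1:
        return f"c.{start}{ref}>{alt}"
    return f"c.{start}_{start + len(ref) - 1}delins{alt}"


if __name__ == "__main__":
    # Toy sequence (made up) with TCG at positions 196-198.
    toy_cds = "A" * 195 + "TCG" + "A" * 10
    print(simplify_delins("c.196_198delinsTAG", toy_cds))
```

With the toy sequence, the delins reduces to a single substitution at position 197, which may or may not match the representation the database lookup expects.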