rockt / SETH

SNP Extraction Tool for Human Variations
rockt.github.com/SETH

Map variants down to chromosomal location + allele #15

Open jhkbg opened 7 years ago

jhkbg commented 7 years ago

Just an idea for next steps: starting from an amino acid change, SETH could compute the underlying CDS and genomic DNA change(s) and return them as well. This would make it much easier to integrate SETH output across millions of papers and to search for specific variants (BRAF V600E is the same as BRAF c.1799T>A, which is the same as chr7:140453136A>T).

In particular, this would be great for annotating VCFs with papers, for which we will need the exact allele(s). Note that an amino acid change can arise from many different DNA changes. We should apply Occam's razor and narrow the possibilities to the most likely ones, for example by preferring SNVs over MNVs over complicated indels.
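To make the "SNV before MNV" preference concrete, here is a minimal Python sketch. It assumes the reference codon is already known (GTG for BRAF V600) and only considers substitutions within that one codon, so indels are out of scope. The function name `candidate_codon_changes` is made up for illustration and is not part of SETH or any tool mentioned here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def candidate_codon_changes(ref_codon, alt_aa):
    """List codons encoding alt_aa, ranked by how many bases differ.

    Returns (n_changed_bases, alt_codon, kind) tuples, fewest changes
    first, so single-nucleotide variants sort ahead of multi-nucleotide
    ones. Hypothetical helper; indels are not considered.
    """
    candidates = []
    for codon, aa in CODON_TABLE.items():
        if aa != alt_aa:
            continue
        changed = [i for i in range(3) if codon[i] != ref_codon[i]]
        if not changed:
            continue  # identical codon, not a variant
        kind = "SNV" if len(changed) == 1 else "MNV"
        candidates.append((len(changed), codon, kind))
    return sorted(candidates)


if __name__ == "__main__":
    # BRAF V600 is encoded by GTG; glutamate (E) is GAA or GAG.
    for n_changed, codon, kind in candidate_codon_changes("GTG", "E"):
        print(f"GTG>{codon}: {n_changed} base(s) changed ({kind})")
```

For V600E this ranks GTG>GAG (one base, the familiar T>A) ahead of GTG>GAA (two bases), which is the ordering proposed above.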

We probably don't have to reinvent this; we could instead write wrappers around tools such as TransVar (http://transvar.readthedocs.io) or Counsyl HGVS (https://github.com/counsyl/hgvs). I've had good experience with TransVar. Both are written in Python, though.
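A thin wrapper could look roughly like the sketch below. It assumes the `transvar` command-line tool is installed with its annotation databases configured, that `transvar panno -i GENE:p.CHANGE` with a database flag such as `--ensembl` is the right invocation for your setup, and that the output contains a `gDNA/cDNA/protein` coordinates field; please check all of this against the TransVar docs. The helper names are hypothetical.

```python
import re
import subprocess

# Assumed shape of TransVar's coordinates field: gDNA/cDNA/protein,
# e.g. "chr7:g.140453136A>T/c.1799T>A/p.V600E".
COORD_RE = re.compile(r"(\S+:g\.[^/\s]+)/(c\.[^/\s]+)/(p\.[^/\s]+)")


def build_panno_command(gene, protein_change, db_flag="--ensembl"):
    """Build the argument list for a protein-level reverse annotation.

    db_flag selects the transcript database; which flags work depends
    on the local TransVar installation (assumption).
    """
    return ["transvar", "panno", "-i", f"{gene}:{protein_change}", db_flag]


def parse_coordinates(output):
    """Extract (gDNA, cDNA, protein) triples from TransVar-like output."""
    return [match.groups() for match in COORD_RE.finditer(output)]


def annotate(gene, protein_change):
    """Run TransVar and return the parsed triples (requires transvar)."""
    completed = subprocess.run(
        build_panno_command(gene, protein_change),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_coordinates(completed.stdout)
```

Keeping command construction and parsing separate from the `subprocess` call means both can be tested without TransVar installed, and a Java caller would only need to shell out to one small script.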

I'll update here once I'm done wrapping TransVar around SETH output. My current pipeline is: 1) run SETH NER and bulk import into a MySQL DB; 2) run SETH NEI with dbSNP 147 on MySQL; 3) export all amino acid variants (CDS TBD) and annotate them with TransVar; 4) load the results back into MySQL 😆
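Steps 3 and 4 might be sketched as follows. This is only an illustration: it uses Python's built-in `sqlite3` as a stand-in for MySQL, the `mention` and `variant_annotation` tables and their columns are hypothetical, and `annotate_stub` is a placeholder lookup where a real TransVar call would go.

```python
import sqlite3


def annotate_stub(gene, protein_change):
    """Placeholder annotator; a real pipeline would call TransVar here."""
    known = {("BRAF", "p.V600E"): ("chr7:g.140453136A>T", "c.1799T>A")}
    return known.get((gene, protein_change))


def annotate_protein_variants(conn, annotate=annotate_stub):
    """Export protein-level mentions, annotate them, and load results back.

    Table and column names are hypothetical. Returns the number of
    mentions that received an annotation.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS variant_annotation "
        "(mention_id INTEGER, gdna TEXT, cdna TEXT)"
    )
    # Step 3: export all amino acid variants.
    rows = conn.execute(
        "SELECT id, gene, protein_change FROM mention "
        "WHERE protein_change IS NOT NULL"
    ).fetchall()
    annotated = 0
    for mention_id, gene, protein_change in rows:
        result = annotate(gene, protein_change)
        if result is None:
            continue  # annotator could not map this mention
        gdna, cdna = result
        # Step 4: load the annotation back into the database.
        conn.execute(
            "INSERT INTO variant_annotation VALUES (?, ?, ?)",
            (mention_id, gdna, cdna),
        )
        annotated += 1
    conn.commit()
    return annotated
```

Passing the annotator in as a parameter keeps the database round trip independent of whichever tool ends up doing the mapping.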