mgalardini / pyseer

SEER, reimplemented in python 🐍🔮
http://pyseer.readthedocs.io
Apache License 2.0

How to annotate significant variants with a reference database? #77

Closed sekhwal closed 5 years ago

sekhwal commented 5 years ago

Hi, could you please let me know how I can annotate significant variants using a reference database? I have a file of significant variants generated by pyseer (file 1) and a protein database in FASTA format downloaded from UniProt (file 2).

Example significant unitigs file format:

18 AAATTAGAGGACATACTCGTCCGCCAGCGGCGGGTGGGTAAAATCATTA
19 TGGTGTGCGCTCACGACAGGTAAAAAAAAAACCTGCCAGCGATGGCAGGTTT
20 TGTTACAGATTGATGACCGGCAAAAAAAAAACCTGCGCATCTGCGCAGGCTG

johnlees commented 5 years ago

You could map these to a reference sequence, or your annotated input sequences, both of which are covered here: https://pyseer.readthedocs.io/en/master/usage.html#processing-k-mer-output

To annotate from UniProt, you could use BLAST to do the mapping. There are various guides around; here is the user manual for the command line: https://www.ncbi.nlm.nih.gov/books/NBK279690/
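As a rough illustration of the BLAST route, here is a minimal sketch. It assumes the unitig file has two whitespace-separated columns (an identifier and a sequence) as in the example above, and that the BLAST+ tools `makeblastdb` and `blastx` are installed and on the PATH. The function names, output file names, and the e-value cutoff are all hypothetical choices for illustration, not part of pyseer. `blastx` is used because the unitigs are nucleotide sequences and the UniProt file contains proteins.

```python
import subprocess
from pathlib import Path


def unitigs_to_fasta(lines):
    """Convert '<id> <sequence>' lines into FASTA text (assumed two-column format)."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        ident, seq = fields
        records.append(f">unitig_{ident}\n{seq}\n")
    return "".join(records)


def blast_commands(query_fasta, protein_fasta,
                   db_name="uniprot_db", out_tsv="unitig_hits.tsv"):
    """Build the BLAST+ command lines; names and cutoff are illustrative."""
    # Build a protein database from the UniProt FASTA file.
    make_db = ["makeblastdb", "-in", protein_fasta,
               "-dbtype", "prot", "-out", db_name]
    # Translated search of nucleotide queries against the protein database,
    # written as tabular output (format 6).
    search = ["blastx", "-query", query_fasta, "-db", db_name,
              "-outfmt", "6", "-evalue", "1e-5", "-out", out_tsv]
    return make_db, search


def run(unitig_file, protein_fasta, query_fasta="unitigs.fasta"):
    """Write the query FASTA, then run both BLAST+ steps in order."""
    lines = Path(unitig_file).read_text().splitlines()
    Path(query_fasta).write_text(unitigs_to_fasta(lines))
    for cmd in blast_commands(query_fasta, protein_fasta):
        subprocess.run(cmd, check=True)  # raises if BLAST+ fails or is missing
```

Calling `run("file1.txt", "file2.fasta")` (placeholder file names) would write the hits to `unitig_hits.tsv`. Unitigs are short, so a translated search may return weak or no hits, and the e-value cutoff may need adjusting.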