Open BrennicaMarlow opened 5 years ago
I want to use hhblits to make a multiple sequence alignment using only eukaryote sequences. Is there a way to get only the eukaryote sequences from the uniclust database?
Hi BrennicaMarlow and All who are reading this,
Actually, I want to do the same with proteins from dsDNA viruses. The best (partial) answer I have so far is to use the ffindex_get utility (it comes with hhsuite-3.2.0) to look up entries in UniRef30_2020_02_a3m.ffdata by their indices and retrieve the alignments that correspond to a specific organism. Something like this:
$ ffindex_get UniRef30_2020_02_a3m.ffdata UniRef30_2020_02_a3m.ffindex 110848668 110849024 110850663 110850770 11085238
Then recalculate the HMMs and context states on these sub-alignments with hhmake and cstranslate, respectively, and otherwise follow the guidelines for building customized databases from MSAs.
The problem here, however, is that there is no correspondence between the database index in the ffindex file (the 110848668 110849024 110850663 110850770 11085238 in the command above) and the taxonomic group. I imagine it is possible to write a script that establishes this correspondence, because if you check the headers of the sequences in UniRef30_2020_02_a3m.ffdata, you will notice that they contain TaxID="NCBI Taxonomy ID". But maybe the developers can suggest a better way to solve this problem.
Best regards, Danyil
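The script idea mentioned above could be sketched roughly as follows. This is only an illustration under several assumptions, not a tested HH-suite workflow: it assumes that each ffindex line has the form name, offset, length separated by tabs, that each sequence header carries a field of the form TaxID=<digits>, and that you already have a set of NCBI Taxonomy IDs for the group of interest (for example, built separately from the NCBI taxonomy dump). The function names are hypothetical.

```python
import mmap
import re

# Assumption: headers contain a field like "TaxID=9606".
TAXID_RE = re.compile(rb"TaxID=(\d+)")


def entry_taxids(data, offset, length):
    """Collect the TaxIDs found in the header lines of one ffdata entry."""
    taxids = set()
    for line in data[offset:offset + length].split(b"\n"):
        if line.startswith(b">"):
            match = TAXID_RE.search(line)
            if match:
                taxids.add(int(match.group(1)))
    return taxids


def select_entries(index_lines, data, wanted_taxids):
    """Return the names of entries with at least one member in wanted_taxids.

    index_lines: iterable of "name<TAB>offset<TAB>length" strings (assumed layout).
    data: bytes-like object holding the contents of the ffdata file.
    """
    selected = []
    for line in index_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed layout
        name, offset, length = parts[0], int(parts[1]), int(parts[2])
        if entry_taxids(data, offset, length) & wanted_taxids:
            selected.append(name)
    return selected


def select_from_files(index_path, data_path, wanted_taxids):
    """Same as select_entries, but reads the two files from disk.

    The data file is memory-mapped so that it is not loaded into RAM at once.
    """
    with open(index_path) as index_file, open(data_path, "rb") as data_file:
        with mmap.mmap(data_file.fileno(), 0, access=mmap.ACCESS_READ) as data:
            return select_entries(index_file, data, wanted_taxids)
```

A call such as select_from_files("UniRef30_2020_02_a3m.ffindex", "UniRef30_2020_02_a3m.ffdata", my_taxids), where my_taxids is your own set of integers, would return entry names that could then be given to ffindex_get as in the command above. Note that this sketch keeps a whole alignment if any of its members matches, so an alignment may still contain sequences from other taxa; removing those would be an additional filtering step.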