Closed hartrials closed 3 years ago
This is valid but probably doesn't actually do what you expect it to do.
test.sto
contains multiple MSAs it will discard all but the first.ffindex_from_fasta
will create a separate database entry (search target) for each sequence in the first
MSA, the input for the next tools will thus be sequences and not MSAs anymore.You probably want:
cd directory_containing_many_sto_files
ffindex_build -s ../stodb.ffdata ../stodb.ffindex .
cd ..
ffindex_apply -d first_a3m.ffdata -i first_a3m.ffindex stodb.ffdata stodb.ffindex -- reformat.pl sto a3m - - -v 0
I think you also want -v 0
for hhconsensus
and hhmake
I just want to build custom database to hhblits. Because HMMER can search fasta database not hhsuite database, I use more fasta database to generate more MSAs. But I know little about HMMER parameters. So I want to use the MSAs generated from HMMER to build custom hhsuite database.
By the way, test.sto
file is generated by jackhmmer. I use a single sequence query fasta to search fasta database. And I only generate one sto file.
I edit my commands as follows:
I get the HMMER MSAs and named it test.sto. And the next commands are as follows:
reformat.pl sto a3m test.sto test.a3m
ffindex_build -s new/first_a3m.ffdata new/first_a3m.ffindex test.a3m
ffindex_apply -d new/new_a3m.ffdata -i new/new_a3m.ffindex new/first_a3m.ffdata new/first_a3m.ffindex -- hhconsensus -M 50 -i stdin -oa3m stdout -v 1
ffindex_apply -d new/new_hhm.ffdata -i new/new_hhm.ffindex new/new_a3m.ffdata new/new_a3m.ffindex -- hhmake -i stdin -o stdout -v 1
cstranslate -A cs219_lib -D context_data_lib -x 0.3 -c 4 -f -i new/new_a3m -o new/new_cs219 -I a3m
hhblits -i query.fasta -d new/new -o result.hhr -oa3m result.a3m -cpu 3 -qsc -30 -id 99 -cov 50 -diff inf
I'm sorry I have two more questions.
Thank you very much for your patience in answering my question!
If you have just a single query sequence/MSA and a single target MSA, I would recommend to just use hhalign
. Building a whole database for a single MSA doesn't really make a lot of sense.
The steps in your second post do work though to make a full hhblits database from a single MSA, it's just a bit odd.
The -M
parameter decides which MSA columns are chosen as match states, generally -M 50
works well. It means every column that has less than 50% gaps is a match column. See towards the end of this section: https://github.com/soedinglab/hh-suite/wiki#searching-databases-of-hmms-using-hhsearch-and-hhblits
Thank you very much!
Steps to Reproduce (for bugs)
I plan to use Hmmer software to generate multiple sequence alignment, and then use the results of Hmmer alignment to build a custom database.
My question
Is the way I build a custom database reasonable?
Command
I get the HMMER MSAs and named it test.sto. And the next commands are as follows:
reformat.pl sto a3m test.sto test.a3m
ffindex_from_fasta -s new/first_a3m.ffdata new/first_a3m.ffindex test.a3m
ffindex_apply -d new/new_a3m.ffdata -i new/new_a3m.ffindex new/first_a3m.ffdata new/first_a3m.ffindex -- hhconsensus -M 50 -i stdin -oa3m stdout -v 1
ffindex_apply -d new/new_hhm.ffdata -i new/new_hhm.ffindex new/new_a3m.ffdata new/new_a3m.ffindex -- hhmake -i stdin -o stdout -v 1
cstranslate -A cs219_lib -D context_data_lib -x 0.3 -c 4 -f -i new/new_a3m -o new/new_cs219 -I a3m
hhblits -i query.fasta -d new/new -o result.hhr -oa3m result.a3m -cpu 3 -qsc -30 -id 99 -cov 50 -diff inf
My Environment
Include as many relevant details about the environment you experienced the issue in.