Open nick-youngblut opened 4 years ago
Ah, I've been meaning to build a database from dbCAN for a while, thanks for the reminder.
I tried to reproduce building the database, and it works correctly with the *_mpi binaries.
Something like this works for me:
DB=dbCAN-fam-V8
wget http://bcb.unl.edu/dbCAN2/download/dbCAN-fam-aln-V8.tar.gz
tar xzvf dbCAN-fam-aln-V8.tar.gz
cd dbCAN-fam-aln;
ffindex_build -s ../${DB}_msa.ff{data,index} .
cd ..
sed 's|\.aln||g' ${DB}_msa.ffindex > ${DB}_msa_renamed.ffindex
mv ${DB}_msa_renamed.ffindex ${DB}_msa.ffindex
mpirun -np 16 ffindex_apply_mpi ${DB}_msa.ffdata ${DB}_msa.ffindex -i ${DB}_a3m.ffindex -d ${DB}_a3m.ffdata -- hhconsensus -M 50 -maxres 65535 -i stdin -oa3m stdout -v 0
mpirun -np 16 ffindex_apply_mpi ${DB}_a3m.ff{data,index} -i ${DB}_hhm.ffindex -d ${DB}_hhm.ffdata -- hhmake -i stdin -o stdout -v 0
mpirun -np 16 cstranslate_mpi -x 0.3 -c 4 -I a3m -i ${DB}_a3m -o ${DB}_cs219
# reorder according to cs219 for better access patterns
sort -k 3 -n ${DB}_cs219.ffindex | cut -f1 > ${DB}.list
for type in a3m hhm; do
    ffindex_order ${DB}.list ${DB}_${type}.ffdata ${DB}_${type}.ffindex ${DB}_${type}_opt.ffdata ${DB}_${type}_opt.ffindex
    mv -f ${DB}_${type}_opt.ffdata ${DB}_${type}.ffdata
    mv -f ${DB}_${type}_opt.ffindex ${DB}_${type}.ffindex
done
md5deep ${DB}_{a3m,hhm,cs219}.ff{data,index} > ${DB}.md5sum
tar czvf ${DB}.tar.gz ${DB}_{a3m,hhm,cs219}.ff{data,index} ${DB}.md5sum
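A quick sanity check after a build like the one above could be to confirm that the a3m, hhm and cs219 indices all list the same number of entries. This is only a sketch: the helper names `count_entries` and `check_db_counts` are made up for illustration, and it assumes each `.ffindex` file holds one entry per line.

```shell
#!/bin/sh
# Sketch: compare entry counts across the three index files of a database.
# Assumption: every .ffindex file has exactly one entry per line.

count_entries() {
    # Print the number of lines (entries) in one ffindex file.
    [ -f "$1" ] || { echo "missing: $1" >&2; return 1; }
    wc -l < "$1" | tr -d ' '
}

check_db_counts() {
    # $1 is the database prefix, e.g. dbCAN-fam-V8
    db=$1
    ref=$(count_entries "${db}_a3m.ffindex") || return 1
    for type in hhm cs219; do
        n=$(count_entries "${db}_${type}.ffindex") || return 1
        if [ "$n" != "$ref" ]; then
            echo "mismatch: ${db}_${type}.ffindex has $n entries, a3m has $ref" >&2
            return 1
        fi
    done
    echo "ok: $ref entries in each index"
}

# Example call (run from the directory that holds the database files):
#   check_db_counts dbCAN-fam-V8
```

A mismatch would suggest that one of the `ffindex_apply_mpi` or `cstranslate_mpi` steps skipped or failed on some entries.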
I took the liberty to build this database and put it on our file server: http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/dbCAN-fam-V8.tar.gz
I would recommend searching through it with HHsearch instead of HHblits, though. Due to its small size, HHsearch can still easily handle it, and it will be more sensitive.
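For reference, a search against the unpacked database might look roughly like the sketch below. The query file name and the helper functions `result_name` and `search_dbcan` are hypothetical; only the `-i`, `-d` and `-o` options of `hhsearch` are used, with `-d` taking the database prefix without the `_a3m`/`_hhm`/`_cs219` suffix.

```shell
#!/bin/sh
# Sketch: run hhsearch for one query against the dbCAN HH-suite database.
# The query path and both helper names are illustrative assumptions.

result_name() {
    # Derive the result file name from the query, e.g. query.fasta -> query.hhr
    printf '%s.hhr\n' "${1%.*}"
}

search_dbcan() {
    # $1: query file (FASTA or A3M), $2: database prefix (default: dbCAN-fam-V8)
    query=$1
    db=${2:-dbCAN-fam-V8}
    if ! command -v hhsearch >/dev/null 2>&1; then
        echo "hhsearch not found on PATH" >&2
        return 127
    fi
    hhsearch -i "$query" -d "$db" -o "$(result_name "$query")"
}

# Example call:
#   search_dbcan query.fasta dbCAN-fam-V8
```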
Hello, how did you get the *_mpi binaries? The documentation doesn't describe how to install HH-suite with MPI support. Could you please tell me how to do it? Thanks! I also ran into this error: `Reading context library for pseudocounts from context_data.lib ... Reading abstract state alphabet from cs219.lib ...
ERROR: Sequence 1 has 764 match columns but should have 2021! `
I added a section to the wiki: https://github.com/soedinglab/hh-suite/wiki#mpi-support
I think you were missing the `-f` or `--ffindex` flag of `cstranslate`, which switches from single-file mode to database read-in. That might be what caused the error message.
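As a rough sketch of the non-MPI call with that flag added (same parameters as in the issue, only `-f` is new), something like the following should do. The wrapper function `translate_db` is just an illustrative helper; it checks that the a3m ffindex pair exists before calling `cstranslate`.

```shell
#!/bin/sh
# Sketch: run cstranslate in ffindex (database) mode on <prefix>_a3m.ff{data,index}.
# translate_db is a hypothetical helper, not part of HH-suite.

translate_db() {
    # $1 is the database prefix, e.g. dbCAN-fam-V8
    db=$1
    for ext in ffdata ffindex; do
        if [ ! -f "${db}_a3m.${ext}" ]; then
            echo "missing ${db}_a3m.${ext}" >&2
            return 1
        fi
    done
    if ! command -v cstranslate >/dev/null 2>&1; then
        echo "cstranslate not found on PATH" >&2
        return 127
    fi
    # -f switches from single-file mode to database read-in;
    # -i and -o take the prefixes without the .ffdata/.ffindex extension.
    cstranslate -f -x 0.3 -c 4 -I a3m -i "${db}_a3m" -o "${db}_cs219"
}

# Example call:
#   translate_db dbCAN-fam-V8
```

Note that in this mode `-i` gets the prefix (`..._a3m`), not the `.ffdata` file itself.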
I made a new DB for V9: http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/dbCAN-fam-V9.tar.gz
The dbCAN team thankfully provided the raw alignments for the new release.
Expected Behavior
Custom database created for dbCAN v8.
Current Behavior
Error during the cstranslate step.
Steps to Reproduce (for bugs)
HH-suite Output (for bugs)
If using `cstranslate -x 0.3 -c 4 -I a3m -i dbCAN-fam-aln-V8_a3m -o dbCAN-fam-aln-V8_cs219`:
If using `cstranslate -x 0.3 -c 4 -I a3m -i dbCAN-fam-aln-V8_a3m.ffdata -o dbCAN-fam-aln-V8_cs219`:
Your Environment
Ubuntu 18.04.4