Open DaMaoShan opened 5 years ago
Hi, yes, I build my own version of the pdb database (pdb_full) in the context of PSSH2 (database of sequence to structure alignments). I use the AWS cloud to run the steps (see https://github.com/aschafu/PSSH2/tree/master/src/cloud), but I guess you can extract the important bits and rework this for your problem:
- run hhblits on each of your sequences to generate a3m and hhm files
- assemble all the data (assuming all a3m files are in directory a3m/, all hhms in directory hhm/):
/usr/share/hhsuite/bin/ffindex_build pdb_full_a3m.ffdata pdb_full_a3m.ffindex a3m/
/usr/share/hhsuite/bin/ffindex_build pdb_full_hhm.ffdata pdb_full_hhm.ffindex hhm/
LC_ALL=C sort pdb_full_hhm.ffindex > pdb_full_hhm.ffindex.simpleSort
LC_ALL=C sort pdb_full_a3m.ffindex > pdb_full_a3m.ffindex.simpleSort
mv pdb_full_a3m.ffindex pdb_full_a3m.ffindex.orig
mv pdb_full_hhm.ffindex pdb_full_hhm.ffindex.orig
ln -s pdb_full_a3m.ffindex.simpleSort pdb_full_a3m.ffindex
ln -s pdb_full_hhm.ffindex.simpleSort pdb_full_hhm.ffindex
export OMP_NUM_THREADS=$(nproc)
/usr/share/hhsuite/bin/cstranslate -A /usr/share/hhsuite/data/cs219.lib -D /usr/share/hhsuite/data/context_data.lib -x 0.3 -c 4 -f -i pdb_full_a3m -o pdb_full_cs219 -I a3m -b
tar -h --transform "s,^,pdb_full_$dbDate/," --show-transformed-names -cvzf pdb_full_$dbDate.tgz pdb_full_a3m.ffdata pdb_full_a3m.ffindex pdb_full_hhm.ffdata pdb_full_hhm.ffindex pdb_full_cs219.ffdata pdb_full_cs219.ffindex
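Before running cstranslate and packing the tarball, it may help to confirm that the a3m and hhm indexes describe the same set of entries. The following is only a sketch, not part of the PSSH2 scripts: the function name check_ffindex_pair is made up for illustration, and it assumes the index files are tab-separated (name, offset, length) and that entry names differ only by their .a3m/.hhm extension.

```shell
#!/bin/sh
# Hypothetical helper (not from PSSH2 or HH-suite): compare the entry names
# listed in an a3m index and an hhm index.
# Assumption: each index line is "name<TAB>offset<TAB>length".
check_ffindex_pair() {
    # Take the name column, drop the extension, and sort bytewise.
    a3m_names=$(cut -f1 "$1" | sed 's/\.a3m$//' | LC_ALL=C sort)
    hhm_names=$(cut -f1 "$2" | sed 's/\.hhm$//' | LC_ALL=C sort)

    # Report how many entries each index lists.
    echo "a3m entries: $(printf '%s\n' "$a3m_names" | grep -c .)"
    echo "hhm entries: $(printf '%s\n' "$hhm_names" | grep -c .)"

    # Exit status 0 only if both indexes list exactly the same names.
    [ "$a3m_names" = "$hhm_names" ]
}
```

It could be called as `check_ffindex_pair pdb_full_a3m.ffindex pdb_full_hhm.ffindex || echo "indexes differ"`; a mismatch usually means hhblits failed to write one of the two output files for some sequence.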
Thank you very much. I will try it as soon as possible.
Sorry to bother you again, but could you tell me how many sequences you have in your protein fasta file? Thank you in advance!
Sorry, I had overlooked the mails.
> could you tell me how many sequences do you have in your protein fasta file?
I am not sure which fasta file you mean. But I guess you want to know how many sequences my database contains (in my setup, the number of files in the a3m and hhm directories)? That is on the order of 100k.
Sorry! Recently I have been focused on another project. Today I started trying again to make my own hh-database. My fasta file is env_nr.gz from ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/. It contains on the order of 10^6 sequences.
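For a single large file such as env_nr.gz, the first step described earlier in the thread (run hhblits on each sequence) needs one query per sequence. A rough sketch of how that could look is below. The function names, the seq_000001.fasta naming scheme and the choice of `-n 2` are assumptions for illustration only; hhblits itself needs an existing HH-suite database to search against, whose prefix is passed as the second argument of run_hhblits_all. With around 10^6 sequences this writes a very large number of small files and will take a long time on one machine.

```shell
#!/bin/sh
# Hypothetical helper: split a (possibly gzipped) multi-FASTA file into one
# file per sequence, named seq_000001.fasta, seq_000002.fasta, ...
split_fasta() {
    input=$1
    outdir=$2
    mkdir -p "$outdir"
    case "$input" in
        *.gz) gzip -dc "$input" ;;
        *)    cat "$input" ;;
    esac | awk -v dir="$outdir" '
        # A header line starts a new output file; close the previous one
        # so awk does not run out of open file handles.
        /^>/ { if (out) close(out); n++; out = sprintf("%s/seq_%06d.fasta", dir, n) }
        # Lines before the first header are skipped.
        out  { print > out }
    '
}

# Hypothetical helper: run hhblits on every split file and collect the
# results in a3m/ and hhm/, the directory layout assumed earlier in the thread.
# Assumes hhblits is on PATH; $2 is the prefix of the database to search.
run_hhblits_all() {
    seqdir=$1
    db=$2
    mkdir -p a3m hhm
    for f in "$seqdir"/*.fasta; do
        id=$(basename "$f" .fasta)
        hhblits -i "$f" -d "$db" -n 2 -cpu 1 -o /dev/null \
                -oa3m "a3m/$id.a3m" -ohhm "hhm/$id.hhm"
    done
}
```

For example, `split_fasta env_nr.gz queries` followed by `run_hhblits_all queries /path/to/search_db_prefix`, where the database path is a placeholder. Counting the headers first with `gzip -dc env_nr.gz | grep -c '^>'` gives the number of queries to expect.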
Hello, did you solve this issue?
Has anyone tried building their own database for hhblits? I mean going from a protein fasta file to an hhblits database.
I am using the pipeline listed in the hhsuite wiki: https://github.com/soedinglab/uniclust-pipeline
But it seems to have a lot of bugs.
Thanks in advance!