soedinglab / hh-suite

Remote protein homology detection suite.
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3019-7
GNU General Public License v3.0

Missing Entry Error during the generation of the database. #77

Open akaped opened 6 years ago

akaped commented 6 years ago

hh-suite version 2f74d30.

First, I converted my FASTA alignments to a3m with:

hhconsensus -M first -I path_to_alignment_file1 -o path_to_reformatted_1.a3m

The software converted all my alignments to a3m without any complaints.

Then I tried to generate a database from my alignment files with this Python script:

import os

path_to_reformatted = "/home/pboccaletto/temp/RRMdbs_exp/a3m_rrm/"
path_to_create_db = "/home/pboccaletto/temp/hh-suite/scripts/hhsuitedb.py"
path_to_db = "/home/pboccaletto/temp/RRMdbs_exp/mydb.db"

def generate_db():
    print("GENERATING THE DATABASE")
    cmd = "python %s -o %s --cpu 1 --ia3m=%s/* --force" % (path_to_create_db, path_to_db, path_to_reformatted)
    # os.system does not raise when the command fails; it returns the exit status.
    status = os.system(cmd)
    if status != 0:
        print("FAILURE")
    else:
        print("GENERATION Completed")

The output of hhsuitedb.py is shown below. I'm posting just a few lines; the full log is attached to this post.

A5.a3m  12512012    22811   182 0
A6.a3m  12534823    11538   717 0
A7.a3m  12546361    9783    177 0
WARNING: Missing entry 361.a3m in /home/pboccaletto/temp/RRMdbs_exp/mydb.db_cs219.ff{data,index}!
WARNING: Missing entry 365.a3m in /home/pboccaletto/temp/RRMdbs_exp/mydb.db_cs219.ff{data,index}!
WARNING: Missing entry 362.a3m in /home/pboccaletto/temp/RRMdbs_exp/mydb.db_cs219.ff{data,index}!

Some entries are processed correctly, while others produce this warning:

WARNING: Missing entry 361.a3m in /.../mydb.db_cs219.ff{data,index}!
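One way to narrow this down is to list which entries appear in the a3m index but not in the cs219 index. The sketch below is only an illustration: it assumes each index is a plain-text file whose first whitespace-separated column is the entry name, and that the same entry names are used in both indices. The function names are hypothetical and not part of hh-suite.

```python
def read_entry_names(index_path):
    """Return the set of entry names (first column) found in an index file."""
    names = set()
    with open(index_path) as handle:
        for line in handle:
            fields = line.split()
            if fields:  # skip blank lines
                names.add(fields[0])
    return names


def missing_entries(a3m_index, cs219_index):
    """Return sorted entry names present in a3m_index but absent from cs219_index."""
    return sorted(read_entry_names(a3m_index) - read_entry_names(cs219_index))
```

It could be called as `missing_entries("mydb.db_a3m.ffindex", "mydb.db_cs219.ffindex")`, where the a3m index file name is an assumption based on the cs219 name shown in the warning.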

Do you have any suggestions on how to resolve this problem? I have also attached all the a3m files that I generated by converting my FASTA alignments.

a3m.zip

results.txt

aschafu commented 6 years ago

Milot told me not to use hhsuitedb.py, but rather to build the database "by hand". In my case, I have a3m and hhm files in an a3m and an hhm directory, and my resulting database is called pdb_full:

/usr/share/hhsuite/bin/ffindex_build pdb_full_a3m.ffdata pdb_full_a3m.ffindex a3m/
/usr/share/hhsuite/bin/ffindex_build pdb_full_hhm.ffdata pdb_full_hhm.ffindex hhm/
LC_ALL=C sort pdb_full_hhm.ffindex > pdb_full_hhm.ffindex.simpleSort
LC_ALL=C sort pdb_full_a3m.ffindex > pdb_full_a3m.ffindex.simpleSort
mv pdb_full_a3m.ffindex pdb_full_a3m.ffindex.orig
mv pdb_full_hhm.ffindex pdb_full_hhm.ffindex.orig
ln -s pdb_full_a3m.ffindex.simpleSort pdb_full_a3m.ffindex
ln -s pdb_full_hhm.ffindex.simpleSort pdb_full_hhm.ffindex
export OMP_NUM_THREADS=$(nproc)
/usr/share/hhsuite/bin/cstranslate  -A /usr/share/hhsuite/data/cs219.lib -D /usr/share/hhsuite/data/context_data.lib -x 0.3 -c 4 -f -i pdb_full_a3m -o pdb_full_cs219 -I a3m -b

That seems to work fine. However, I get the empty alignments I mentioned in a separate issue: https://github.com/soedinglab/hh-suite/issues/83 In your case, you would also need to convert the a3m alignments to hhm, I believe.

Good luck!
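For readers unfamiliar with the `LC_ALL=C sort` steps in the commands above: they order the index lines by raw byte value instead of by locale collation rules. A rough Python equivalent might look like the sketch below. It assumes the index fits in memory and drops empty lines; the function name is hypothetical.

```python
def sort_index_bytewise(src_path, dst_path):
    """Write a copy of an index file with its lines sorted by byte value."""
    with open(src_path, "rb") as src:
        # Split on newlines and drop empty lines (for example after a trailing newline).
        lines = [line for line in src.read().split(b"\n") if line]
    lines.sort()  # bytes objects compare by unsigned byte value
    with open(dst_path, "wb") as dst:
        for line in lines:
            dst.write(line + b"\n")
```

Reading in binary mode keeps the comparison independent of the locale and of any text encoding.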

ahcm commented 6 years ago

If you use -s with ffindex_build, you can save some lines, as it already sorts the index.
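A minimal sketch of what the shortened build step might look like when driven from Python. It assumes that `-s` is an adequate replacement for the separate sort, move, and symlink steps, as the comment above suggests, and it reuses the binary path, directory names, and `pdb_full` database name from the earlier commands. The helper names are hypothetical.

```python
import subprocess

# Path taken from the commands earlier in this thread; adjust for your installation.
FFINDEX_BUILD = "/usr/share/hhsuite/bin/ffindex_build"


def build_sorted_commands(db_name, kinds=("a3m", "hhm")):
    """Return one ffindex_build argument list per file kind, each using -s."""
    commands = []
    for kind in kinds:
        prefix = "%s_%s" % (db_name, kind)
        commands.append(
            [FFINDEX_BUILD, "-s", prefix + ".ffdata", prefix + ".ffindex", kind + "/"]
        )
    return commands


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True)
```

Calling `run_all(build_sorted_commands("pdb_full"))` would then run the two build commands; the cstranslate step is left unchanged.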