uw-ipd / RoseTTAFold2NA

RoseTTAFold2 protein/nucleic acid complex prediction
MIT License

Cannot find /RF2NA/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffdata #83

Open BrandonFrenz opened 6 months ago

BrandonFrenz commented 6 months ago

I'm trying to test RF2NA on a restriction enzyme (PDB ID 3e42). However, whenever I run the MSA generation step I get this error message: 22:32:35.465 ERROR: could not open file '/RF2NA/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffdata'

It's true that this file does not exist. I followed the download instructions for the BFD on the GitHub page. The only file I have in the bfd directory is: bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffdata

Am I missing a piece of the database? Is there a versioning issue? I haven't been able to figure it out.

BrandonFrenz commented 6 months ago

This problem seems to arise only when hhblits fails to find enough hits in the UniRef database and falls back to an extended search against the BFD database. Removing lines 57-83 of the "make_protein_msa.sh" script in the input_prep directory works around the issue, and the model I got out was still of good quality. However, that code is presumably there for a reason, so the issue should still be resolved properly.

Yuxi778 commented 4 months ago

I have 6 different files in my bfd database directory: bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffdata, bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffindex, bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffdata, bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffindex, bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffdata, bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffindex. Hopefully this is helpful.
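
For anyone who wants to check their own download against that list, here is a minimal sketch that reports which of the six files are absent from a directory. The function name `missing_bfd_files` and the constants are made up for illustration and are not part of RF2NA or HH-suite. The file names come from this thread only, so treat them as an assumption about what a complete download contains.

```python
from pathlib import Path

# Database prefix copied from the file names quoted in this thread.
BFD_PREFIX = "bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt"

# The six suffixes from the list above: three file kinds, each with a
# data file (.ffdata) and an index file (.ffindex).
SUFFIXES = [
    f"_{kind}.{ext}"
    for kind in ("a3m", "cs219", "hhm")
    for ext in ("ffdata", "ffindex")
]


def missing_bfd_files(bfd_dir, prefix=BFD_PREFIX):
    """Return the expected file names that are not present in bfd_dir."""
    root = Path(bfd_dir)
    return [
        prefix + suffix
        for suffix in SUFFIXES
        if not (root / (prefix + suffix)).is_file()
    ]
```

Calling `missing_bfd_files("/RF2NA/bfd")` (the path from the error message) returns an empty list when all six files are present. With only the `_a3m.ffdata` file in place, as described in the original report, it returns the other five names, including the `_cs219.ffdata` file that hhblits could not open.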