steineggerlab / foldseek

Foldseek enables fast and sensitive comparisons of large structure sets.
https://foldseek.com
GNU General Public License v3.0

The database generated with ProstT5 based on fasta files is incomplete #378

Closed: chengaoxiang1985 closed this issue 1 hour ago

chengaoxiang1985 commented 2 hours ago

The database generated with ProstT5 from FASTA files is incomplete: at the very least, it does not contain the "XXX_ca" file. If the database is built from PDB-format files instead, there is no problem and the complete database is generated, including the "XXX_ca" file. A complete database is required; otherwise the structure search fails with an error message saying that the file "XXX_ca" cannot be found.

chengaoxiang1985 commented 1 hour ago

This is the command I used to create the database:

foldseek createdb query_data_cgx/QUERY.fasta database_model_cgx/my_database/DB_cgx --prostt5-model database_model_cgx/prostt5-f16-safetensors

The following are all the files in the newly created database:

[image]

The following are all the files in the 'pdb100' database downloaded from this GitHub repository; it clearly contains some XXX_ca files:

[image]
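Since the two screenshots did not survive, the comparison they showed can be reproduced with a small shell helper. This is only a sketch: `check_db` is a hypothetical name, and the suffix list is an assumption about a typical Foldseek database layout (only the "_ca" file is named in this thread), so adjust it to whatever your search actually needs.

```shell
#!/bin/sh
# check_db PREFIX
# Reports which companion files exist for a database prefix.
# ASSUMPTION: the suffixes below are illustrative; this thread only
# confirms that a "<prefix>_ca" file is expected by the search.
check_db() {
    prefix=$1
    missing=0
    for suffix in "" _h _ss _ca; do
        if [ -e "${prefix}${suffix}" ]; then
            echo "found:   ${prefix}${suffix}"
        else
            echo "missing: ${prefix}${suffix}"
            missing=1
        fi
    done
    # Exit status 0 if every file was found, 1 otherwise.
    return "$missing"
}
```

For example, `check_db database_model_cgx/my_database/DB_cgx` would be expected to print a `missing:` line for `DB_cgx_ca` for the FASTA-derived database described above.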

chengaoxiang1985 commented 1 hour ago

Found the answer: https://github.com/steineggerlab/foldseek/issues/305

xjhzjucas commented 33 minutes ago

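Hello, may I ask whether you are able to download the databases successfully with the foldseek databases command? There seem to be some firewall issues, and I am not sure whether they can be resolved. Thanks!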

chengaoxiang1985 commented 13 minutes ago

We have this problem too. We also downloaded the files from outside the firewall and then uploaded them manually.

xjhzjucas commented 11 minutes ago

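OK, thanks! One question: after uploading the files manually, do I need to do anything else, such as building an index? I get the impression that the foldseek databases command does more than just download files. Do you have any experience with this?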

chengaoxiang1985 commented 4 minutes ago

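You can extract the files and put them anywhere, as long as you give the absolute path when you use them. If they are under the current directory, just give the path to the downloaded files starting from the current directory. This does not affect the results at all.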
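The path advice above can be sketched as a small shell helper that turns a relative database prefix into an absolute one. The function name `abs_db_path` and the `databases/pdb100` location are hypothetical; the commented `easy-search` line follows the general form shown in the Foldseek README and is not run here.

```shell
#!/bin/sh
# abs_db_path PREFIX
# Prints the absolute form of a database prefix.
# The containing directory must exist; the prefix file itself need not.
abs_db_path() {
    dir=$(cd "$(dirname "$1")" && pwd) || return 1
    printf '%s/%s\n' "$dir" "$(basename "$1")"
}

# Hypothetical usage (paths are placeholders, not taken from this thread):
#   DB=$(abs_db_path databases/pdb100)
#   foldseek easy-search query.pdb "$DB" result.m8 tmp
```

Passing the relative prefix directly should behave the same when the command is run from that directory; resolving it first only guards against running from a different working directory.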