psipred / merizo_search

Fast structure embedding search tool for Merizo
GNU General Public License v3.0

Am I able to run this on my own computer? Currently getting database file error #3

Closed kiranmmmmm closed 1 week ago

kiranmmmmm commented 1 month ago

Hi, I am an Honours student and I'm planning to create my own comparison between Foldclass and Foldseek searches across similar databases for the protein I'm researching. I have cloned the repository to run the code, but I ran into a problem I have no idea how to fix.

The command I ran is as follows:

python merizo_search/merizo.py search ../fold_mmrn1_emi_fucptm_2024_07_23_10_35_model_0.cif ../examples/database/cath ../examples/results tmp   

However it gets to this point and runs into the error:


2024-07-23 13:05:38,600 | INFO | Starting merizo search with command: 

merizo_search/merizo.py search ../fold_mmrn1_emi_fucptm_2024_07_23_10_35_model_0.cif ../examples/database/cath ../examples/results tmp

2024-07-23 13:05:38,601 | ERROR | Cannot find database file ../examples/database/cath.pt

The .pt file is clearly there (the hyperlink in the error message links to the file just fine) but it cannot be accessed by the code at this point. Pls halp ;_;
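
A file can be visible on disk and still be unusable, for example when it is a dangling symlink or an empty placeholder. The sketch below is one way to check this before running a search. It is not part of merizo_search: the helper names `check_db_file` and `check_database` are hypothetical, and it assumes a database prefix such as `../examples/database/cath` expands to a `.pt` and an `.index` file.

```python
from pathlib import Path


def check_db_file(path):
    """Classify one file as 'ok', 'missing', 'dangling symlink', or 'empty'."""
    p = Path(path)
    # A symlink whose target does not exist is listed in the directory
    # but cannot be opened.
    if p.is_symlink() and not p.exists():
        return "dangling symlink"
    if not p.exists():
        return "missing"
    # A zero-byte file exists but holds no data.
    if p.stat().st_size == 0:
        return "empty"
    return "ok"


def check_database(prefix, suffixes=(".pt", ".index")):
    """Check each file assumed to belong to a database prefix.

    The suffix list is an assumption based on the file names mentioned
    in this thread, not on the merizo_search source code.
    """
    return {prefix + s: check_db_file(prefix + s) for s in suffixes}


if __name__ == "__main__":
    for name, status in check_database("../examples/database/cath").items():
        print(f"{status:>16}  {name}")
```

Any status other than `ok` means the problem is with the file rather than with the search command.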

kiranmmmmm commented 1 month ago

Update: I understand now why it wasn't working; the file wasn't populated.

How can I go about converting one of the AlphaFold databases into the .index and .pt file format to run my own search? If you know how, I would be most grateful for the information :)

shaunmk commented 1 month ago

The repo is in a state of flux at the moment as we are preparing a major update to it. Nevertheless, I've just tried using the symlink as you described with the examples, and it works fine on my system. Please note that the program currently only supports query structures in PDB format, not mmCIF, so please convert them first. We also only read ATOM records for protein residues.
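
Since the query in the original command was a `.cif` file, a quick pre-check of the input format may save a failed run. The following is a minimal sketch, not part of merizo_search: the function names are hypothetical, and the detection is a simple heuristic based on how PDB and mmCIF files typically begin. It does not convert anything, it only reports what the file looks like and how many ATOM records a PDB file contains.

```python
def guess_structure_format(text):
    """Guess 'mmcif', 'pdb', or 'unknown' from the text of a structure file."""
    for line in text.splitlines():
        # mmCIF files open with a data_ block and use _atom_site.* columns.
        if line.startswith("data_") or line.startswith("_atom_site."):
            return "mmcif"
        # PDB files use fixed-width record names in the first six columns.
        if line.startswith(("HEADER", "ATOM  ", "HETATM", "CRYST1")):
            return "pdb"
    return "unknown"


def count_atom_records(text):
    """Count ATOM records in PDB-format text; HETATM and others are ignored.

    Only meaningful when guess_structure_format(text) returns 'pdb'.
    """
    return sum(1 for line in text.splitlines() if line.startswith("ATOM  "))


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        contents = handle.read()
    fmt = guess_structure_format(contents)
    print("format:", fmt)
    if fmt == "pdb":
        print("ATOM records:", count_atom_records(contents))
```

A result of `mmcif` means the file should be converted to PDB format first, and a PDB file with zero ATOM records would leave nothing for the program to read.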

To answer your second question, you can use the createdb module as described in the README to make your own database from any set of PDB files, though you will run out of memory on a very large set. You will also want to chop up the structures into domains first, as Foldclass is trained on domains rather than full chains.

We will be making a (very large) database of domains from the AFDB available for searching with merizo_search soon; stay tuned.