labstructbioinf / pLM-BLAST

Detection of remote homology by comparison of protein language model representations
https://toolkit.tuebingen.mpg.de/tools/plmblast
MIT License

emb files point to wrong direction #54

Closed (dc2211 closed 1 month ago)

dc2211 commented 1 month ago

Hello!

First of all, thanks for the great work you have done!

I'm trying to query the rossmannsdb example against a test database. The current database is located in database_test/ with files 0.emb, 1.emb, and database.fas. I got my query embeddings through

python embeddings.py start examples/data/input/rossmannsdb.fas ross.pt --gpu

which got me files ross.pt and ross.pt.csv.

Running the single query search through

python scripts/plmblast.py database_test/ ross.pt test.out -cpc 70 -win 49 -span 49 --verbose

leads to the following error:

python scripts/plmblast.py database_test/ ross.pt test.out -cpc 70 -win 49 -span 49 --verbose
using all 1 CPU cores
num cores:  1
loaded database: database_test/ - in dir mode
/home/projects/pLM-BLAST/alntools/filehandle.py:135: UserWarning: Id column is not unique, using index as id
  warnings.warn("Id column is not unique, using index as id")
loaded query: ross.pt - in dir mode
Pre-screening skipped
searching for alignments:   0%|                                                 | 0/30 [00:00<?, ?it/s]

Traceback (most recent call last):
  File "/home/projects/pLM-BLAST/scripts/plmblast.py", line 90, in <module>
    for itr, (query_index, embedding_index, query_emb, embedding_list) in enumerate(batch_loader):
  File "/home/projects/pLM-BLAST/alntools/filehandle.py", line 221, in __next__
    qembedding = self._load_single(self.queryfiles[qdata.qid]).pop()
  File "/home/projects/pLM-BLAST/alntools/filehandle.py", line 280, in _load_single
    emb = torch.load(f)
  File "/home/conda_envs/plmblast/lib/python3.10/site-packages/torch/serialization.py", line 791, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/conda_envs/plmblast/lib/python3.10/site-packages/torch/serialization.py", line 271, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/conda_envs/plmblast/lib/python3.10/site-packages/torch/serialization.py", line 252, in __init__
    super().__init__(open(name, mode))
NotADirectoryError: [Errno 20] Not a directory: 'ross.pt/0.emb'

I can't figure out why the script looks for the .emb files under the query path (ross.pt/0.emb).

Any help is greatly appreciated. Thanks!

Argusmocny commented 1 month ago

Hi @dc2211, the db argument in plmblast.py should always be a directory. Running

python embeddings.py start examples/data/input/rossmannsdb.fas ross --gpu --asdir

and then

python scripts/plmblast.py database_test/ ross test.csv -cpc 70 -win 49 -span 49 --verbose

should work. In addition, I will make the error message more informative.
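
Until that lands, a small pre-flight check can catch the problem before the search starts. The sketch below is only an illustration: `check_embedding_dir` is a hypothetical helper, not part of pLM-BLAST, and it assumes that a valid embedding directory simply contains files ending in `.emb`, as the `0.emb` and `1.emb` files in `database_test/` suggest. The demonstration uses temporary paths so it can be run anywhere.

```python
import os
import tempfile


def check_embedding_dir(path):
    """Return the sorted .emb file names in `path`, or raise a descriptive error.

    Hypothetical helper for illustration; it is not part of pLM-BLAST.
    """
    if not os.path.isdir(path):
        raise NotADirectoryError(
            f"{path!r} is not a directory; the search expects a directory "
            "of .emb files (try regenerating the embeddings with --asdir)"
        )
    emb_files = sorted(f for f in os.listdir(path) if f.endswith(".emb"))
    if not emb_files:
        raise FileNotFoundError(f"no .emb files found in {path!r}")
    return emb_files


# Demonstration with temporary paths (names chosen to mirror the report).
with tempfile.TemporaryDirectory() as tmp:
    # A single-file query, like the ross.pt from the original command.
    single_file = os.path.join(tmp, "ross.pt")
    open(single_file, "wb").close()

    # A directory-style query, like the one the suggested command produces.
    as_dir = os.path.join(tmp, "ross")
    os.mkdir(as_dir)
    for i in range(2):
        open(os.path.join(as_dir, f"{i}.emb"), "wb").close()

    try:
        check_embedding_dir(single_file)
    except NotADirectoryError as err:
        file_result = str(err)

    dir_result = check_embedding_dir(as_dir)
```

Calling the helper on both the database and the query path before invoking `scripts/plmblast.py` would turn the `NotADirectoryError` raised deep inside `torch.load` into a message that points at the offending argument.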

dc2211 commented 1 month ago

Thank you for the help! Issue solved.