ml4bio / Dense-Homolog-Retrieval

Nature Biotechnology: Ultra-fast, sensitive detection of protein remote homologs using deep dense retrieval
BSD 3-Clause "New" or "Revised" License
93 stars 2 forks

missing index file #20

Closed daisykuma22 closed 2 months ago

daisykuma22 commented 2 months ago

Dear Developers,

I hope this message finds you well.

I have been using the do_embedding.py script to generate embeddings for my own sequences with the following command:

python3 do_embedding.py trainer.ur90_path=/public/home/DHR/DB/ros.tsv model.ckpt_path=/public/home/software/Dense-Homolog-Retrieval/checkpoint hydra.run.dir=/public/home/DHR/DB

**Below is the output log**

Global seed set to 1234
/public/home/software/Dense-Homolog-Retrieval/checkpoint
[2024-09-13 00:53:54,024][torch.distributed.nn.jit.instantiator][INFO] - Created a temporary directory at /tmp/tmps_1fek_t
[2024-09-13 00:53:54,025][torch.distributed.nn.jit.instantiator][INFO] - Writing /tmp/tmps_1fek_t/_remote_module_non_sriptable.py
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Missing logger folder: /public/home/DHR/DB/lightning_logs
/public/software/miniconda3/envs/fastMSA/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py:659: UserWarning: Your `predict_dataloader` has `shuffle=True`, it is strongly recommended that you turn this off for val/test/predict dataloaders.
  rank_zero_warn(
Predicting: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 102/102 [01:42<00:00, 1.00s/it]

However, I encountered an issue during homolog retrieval using the do_retrieval.py script. Specifically, the process fails with an error indicating that there is "No index file." Here’s the command I ran for retrieval:

python3 do_retrieval.py -i /public/home/software/Dense-Homolog-Retrieval/example/df-ebd.tsv -d /public/home/DHR/DB -o /public/home/DHR/output/

The error message I received is:

Traceback (most recent call last):
  File "do_retrieval.py", line 65, in <module>
    index = faiss.read_index(idx_path)
  File "/public/software/miniconda3/envs/fastMSA/lib/python3.8/site-packages/faiss/swigfaiss_avx2.py", line 8393, in read_index
    return _swigfaiss_avx2.read_index(*args)
RuntimeError: Error in faiss::FileIOReader::FileIOReader(const char*) at /opt/conda/conda-bld/faiss-pkg_1639741038719/work/faiss/impl/io.cpp:67: Error: 'f' failed: could not open /public/home/DHR/DB/index-ebd.index for reading: No such file or directory

It appears that the indexing step did not complete properly, as there is no index file in the output directory. The output of the Offline Embedding step consists only of a predictions.pt file inside the ebd directory.

Could you kindly provide guidance on how to address this problem? I look forward to your assistance.

heathcliff233 commented 2 months ago

Thanks for trying it out. Please refer to step 2 of the offline embedding section and make sure to aggregate the results. The first step only generates the embeddings; it does not create the corresponding index file.
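
To make the distinction in the reply concrete, here is a minimal sketch of a check that could be run before do_retrieval.py to tell whether the aggregation step has already produced an index. It is not part of the repository. The file name `index-ebd.index` is taken from the error message above, and the location `ebd/predictions.pt` is an assumption based on the report (it assumes the ebd directory sits directly under the database directory). The function name `check_db_dir` and the status strings are hypothetical.

```python
from pathlib import Path

# Assumption: these names come from the traceback and the report in this
# thread, not from the repository's documentation.
INDEX_NAME = "index-ebd.index"
EMBEDDING_RELPATH = Path("ebd") / "predictions.pt"


def check_db_dir(db_dir):
    """Return a (status, message) pair describing a database directory.

    status is one of:
      "ready"             - the index file exists, retrieval can be attempted
      "needs_aggregation" - only the raw embeddings exist (step 1 output)
      "empty"             - neither the index nor the embeddings were found
    """
    db = Path(db_dir)
    index_path = db / INDEX_NAME
    embedding_path = db / EMBEDDING_RELPATH

    if index_path.is_file():
        return "ready", f"found index: {index_path}"
    if embedding_path.is_file():
        return (
            "needs_aggregation",
            f"found embeddings at {embedding_path} but no {INDEX_NAME}; "
            "run the aggregation step before retrieval",
        )
    return "empty", f"no index or embeddings found under {db}"
```

For the directory described in this issue, `check_db_dir("/public/home/DHR/DB")` would be expected to report `needs_aggregation` if the layout assumption holds, which matches the maintainer's explanation that only the first step had been run.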