I have been using the do_embedding.py script to generate embeddings for my own sequences with the following command:

    python3 do_embedding.py trainer.ur90_path=/public/home/DHR/DB/ros.tsv model.ckpt_path=/public/home/software/Dense-Homolog-Retrieval/checkpoint hydra.run.dir=/public/home/DHR/DB

**Below is the output log**

    Global seed set to 1234
    /public/home/software/Dense-Homolog-Retrieval/checkpoint
    [2024-09-13 00:53:54,024][torch.distributed.nn.jit.instantiator][INFO] - Created a temporary directory at /tmp/tmps_1fek_t
    [2024-09-13 00:53:54,025][torch.distributed.nn.jit.instantiator][INFO] - Writing /tmp/tmps_1fek_t/_remote_module_non_sriptable.py
    GPU available: True, used: True
    TPU available: False, using: 0 TPU cores
    IPU available: False, using: 0 IPUs
    LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
    Missing logger folder: /public/home/DHR/DB/lightning_logs
    /public/software/miniconda3/envs/fastMSA/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py:659: UserWarning: Your `predict_dataloader` has `shuffle=True`, it is strongly recommended that you turn this off for val/test/predict dataloaders.
      rank_zero_warn(
    Predicting: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 102/102 [01:42<00:00, 1.00s/it]
However, homolog retrieval with the do_retrieval.py script fails because the index file cannot be found. Here’s the command I ran for retrieval:

    python3 do_retrieval.py -i /public/home/software/Dense-Homolog-Retrieval/example/df-ebd.tsv -d /public/home/DHR/DB -o /public/home/DHR/output/

The error message I received is:

    Traceback (most recent call last):
      File "do_retrieval.py", line 65, in <module>
        index = faiss.read_index(idx_path)
      File "/public/software/miniconda3/envs/fastMSA/lib/python3.8/site-packages/faiss/swigfaiss_avx2.py", line 8393, in read_index
        return _swigfaiss_avx2.read_index(*args)
    RuntimeError: Error in faiss::FileIOReader::FileIOReader(const char*) at /opt/conda/conda-bld/faiss-pkg_1639741038719/work/faiss/impl/io.cpp:67: Error: 'f' failed: could not open /public/home/DHR/DB/index-ebd.index for reading: No such file or directory

It appears that the indexing step might not have completed properly, as there is no index file in the output directory. The output generated from the Offline Embedding step only includes a predictions.pt file within the ebd directory.
Could you kindly provide guidance on how to address this problem?
I look forward to your assistance.
Thanks for trying. Please refer to the offline embedding section step 2 and make sure to aggregate the result. The first step will only generate the embedding without creating the corresponding index file.
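
One way to confirm whether the aggregation step has been run is to check the database directory for the index file before calling do_retrieval.py. The sketch below is only an illustration, not part of the project: the function name `check_retrieval_db` is hypothetical, and the file name `index-ebd.index` is an assumption taken from the traceback above, so adjust it if your setup uses a different name.

```python
from pathlib import Path

# Assumption: the index file name comes from the traceback in this thread
# ("index-ebd.index"); it is not taken from the project's documentation.
INDEX_NAME = "index-ebd.index"


def check_retrieval_db(db_dir, index_name=INDEX_NAME):
    """Return (ok, message) saying whether db_dir contains the index file."""
    db = Path(db_dir)
    if not db.is_dir():
        return False, f"{db} is not a directory"

    index_path = db / index_name
    if index_path.is_file():
        return True, f"found {index_path}"

    # List the files that are present, so that a leftover embedding output
    # such as ebd/predictions.pt (embedding done, no index yet) is easy to spot.
    present = sorted(
        p.relative_to(db).as_posix() for p in db.rglob("*") if p.is_file()
    )
    listing = ", ".join(present) if present else "no files"
    return False, f"missing {index_path}; directory contains: {listing}"
```

Calling `check_retrieval_db("/public/home/DHR/DB")` on the directory from this report would return `False` with a message that lists `ebd/predictions.pt`, which matches the situation where only the first embedding step has been completed.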