ml4bio / Dense-Homolog-Retrieval

Nature Biotechnology: Ultra-fast, sensitive detection of protein remote homologs using deep dense retrieval
BSD 3-Clause "New" or "Revised" License

Error in faiss #2

Status: Closed (nope-sto closed this issue 2 weeks ago)

nope-sto commented 1 month ago

I tried to run python do_retrieval.py -i input.fasta and received the following error:

Traceback (most recent call last):
  File "/home/stcg/Documents/Dense-Homolog-Retrieval/do_retrieval.py", line 64, in <module>
    index = faiss.read_index(idx_path)
  File "/home/stcg/miniconda3/envs/fastMSA/lib/python3.9/site-packages/faiss/swigfaiss_avx2.py", line 10206, in read_index
    return _swigfaiss_avx2.read_index(*args)
RuntimeError: Error in faiss::FileIOReader::FileIOReader(const char*) at /opt/conda/conda-bld/faiss-pkg_1681998057888/work/faiss/impl/io.cpp:67: Error: 'f' failed: could not open ./output/agg/index-ebd.index for reading: No such file or directory

When running in online mode, do I still need to specify a database path?

heathcliff233 commented 1 month ago

Thanks for trying it out. You first need to convert a sequence database into embeddings, and then specify the faiss index path. If you just want a quick test, please switch to the v1 branch, where there should be a download link for a pre-built one.

nope-sto commented 1 month ago

Thanks for your feedback. So if I want to use other published databases such as NCBI NR or UniProt, I first have to download them, generate a SQL DB, and then convert it into embeddings? Is there a way to use mmseqs format databases for this purpose? https://github.com/soedinglab/mmseqs2/wiki#downloading-databases

heathcliff233 commented 1 month ago

Yes, you can use mmseqs databases, but only the ones in tsv format. MMseqs databases are clustered profile databases, and you may use either the raw sequences (very large) or the cluster centers (recommended).
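To illustrate the "cluster centers" option, here is a minimal sketch that pulls the unique representative IDs out of a cluster tsv. It assumes the two-column layout (representative ID, then member ID, tab-separated) that MMseqs2 clustering workflows write; the function name is hypothetical, and how this repository actually ingests such a file is not shown in this thread.

```python
import csv
import io


def cluster_representatives(tsv_lines):
    """Return unique representative IDs (column 1) in first-seen order.

    Assumes each row is: representative_id <TAB> member_id.
    """
    seen = {}
    for row in csv.reader(tsv_lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        seen.setdefault(row[0], None)
    return list(seen)


# Tiny in-memory example standing in for a real cluster tsv file.
example = io.StringIO("repA\trepA\nrepA\tmem1\nrepB\trepB\n")
print(cluster_representatives(example))  # ['repA', 'repB']
```

With a real file, pass an open file handle instead of the `io.StringIO` object; the resulting IDs can then be used to select the representative sequences before embedding.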

nope-sto commented 1 month ago

Okay, great, thanks. First I will do a quick test with your pre-built embedding. I downloaded the two files (scope.faiss and scope.pkl) and placed them in a folder called DB. Then I ran: python do_retrieval.py -i input.fasta -d DB/ -o test

which resulted in the following error:

Traceback (most recent call last):
  File "/home/stcg/Documents/Dense-Homolog-Retrieval/Dense-Homolog-Retrieval/do_retrieval.py", line 64, in <module>
    index = faiss.read_index(idx_path)
  File "/home/stcg/miniconda3/envs/fastMSA/lib/python3.9/site-packages/faiss/swigfaiss_avx2.py", line 10206, in read_index
    return _swigfaiss_avx2.read_index(*args)
RuntimeError: Error in faiss::FileIOReader::FileIOReader(const char*) at /opt/conda/conda-bld/faiss-pkg_1681998057888/work/faiss/impl/io.cpp:67: Error: 'f' failed: could not open DB/index-ebd.index for reading: No such file or directory

heathcliff233 commented 1 month ago

Thank you for testing this and reporting the error. It is caused by a file naming and file format issue. I have updated the names of the pre-built files in Google Drive. As a temporary fix, please make sure you are on the v1 branch, update the file names, and uncomment the line at https://github.com/ml4bio/Dense-Homolog-Retrieval/blob/413b2380e09535a1b0d49b991c1101ed5e4bdd7e/mydpr/dataset/cath35.py#L60. These issues should be fixed soon, and I apologize for the inconvenience.
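Since the failures so far come from the script looking for a file name that is not in the database folder, a small pre-flight check can make the mismatch obvious before faiss raises. This is only a sketch: the helper name is hypothetical, and the only expected file name taken from this thread is index-ebd.index (from the error message); add the metadata file name your checkout expects.

```python
from pathlib import Path


def missing_db_files(db_dir, expected=("index-ebd.index",)):
    """Return the expected file names that are absent from db_dir."""
    db = Path(db_dir)
    return [name for name in expected if not (db / name).is_file()]


if __name__ == "__main__":
    # "DB" is the folder name used earlier in this thread.
    missing = missing_db_files("DB")
    if missing:
        print("Missing from DB/:", ", ".join(missing))
    else:
        print("All expected files found in DB/")
```

Running this before do_retrieval.py shows at a glance whether the downloaded files still carry their old names.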

nope-sto commented 1 month ago

I downloaded the new files with correct names and uncommented the self.records line but still get:

Traceback (most recent call last):
  File "/home/stcg/Documents/Dense-Homolog-Retrieval/Dense-Homolog-Retrieval/do_retrieval.py", line 64, in <module>
    index = faiss.read_index(idx_path)
  File "/home/stcg/miniconda3/envs/fastMSA/lib/python3.9/site-packages/faiss/swigfaiss_avx2.py", line 10206, in read_index
    return _swigfaiss_avx2.read_index(*args)
RuntimeError: Error in faiss::FileIOReader::FileIOReader(const char*) at /opt/conda/conda-bld/faiss-pkg_1681998057888/work/faiss/impl/io.cpp:67: Error: 'f' failed: could not open DB/index-ebd.index for reading: No such file or directory

Is the index-ebd.index converted into index-ebd.faiss by cath35.py?

heathcliff233 commented 1 month ago

The file suffix should be .index, my fault.

nope-sto commented 1 month ago

After changing it to .index, a new error appears:

Traceback (most recent call last):
  File "/home/stcg/miniconda3/envs/fastMSA/lib/python3.9/site-packages/pandas/io/pickle.py", line 206, in read_pickle
    return pickle.load(handles.handle)
  File "/home/stcg/miniconda3/envs/fastMSA/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 2400, in new_block
    return klass(values, ndim=ndim, placement=placement, refs=refs)
TypeError: Argument 'placement' has incorrect type (expected pandas._libs.internals.BlockPlacement, got slice)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/stcg/miniconda3/envs/fastMSA/lib/python3.9/site-packages/pandas/compat/pickle_compat.py", line 35, in load_reduce
    stack[-1] = func(*args)
  File "/home/stcg/miniconda3/envs/fastMSA/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 2400, in new_block
    return klass(values, ndim=ndim, placement=placement, refs=refs)
TypeError: Argument 'placement' has incorrect type (expected pandas._libs.internals.BlockPlacement, got slice)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/stcg/Documents/Dense-Homolog-Retrieval/Dense-Homolog-Retrieval/do_retrieval.py", line 67, in <module>
    df = pd.read_pickle(dm_path)
  File "/home/stcg/miniconda3/envs/fastMSA/lib/python3.9/site-packages/pandas/io/pickle.py", line 211, in read_pickle
    return pc.load(handles.handle, encoding=None)
  File "/home/stcg/miniconda3/envs/fastMSA/lib/python3.9/site-packages/pandas/compat/pickle_compat.py", line 225, in load
    return up.load()
  File "/home/stcg/miniconda3/envs/fastMSA/lib/python3.9/pickle.py", line 1212, in load
    dispatch[key[0]](self)
  File "/home/stcg/miniconda3/envs/fastMSA/lib/python3.9/site-packages/pandas/compat/pickle_compat.py", line 55, in load_reduce
    elif args and issubclass(args[0], PeriodArray):
TypeError: issubclass() arg 1 must be a class

heathcliff233 commented 1 month ago

Sorry, I am not sure what happened here. It seems to be an issue caused by pandas.read_pickle. The pickle file works on my computer. Would you please share your specs so I can check it later? As a workaround, I have uploaded a csv file. It should work as a replacement if the line is changed to use pandas.read_csv.
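For anyone asked to share their specs, the sketch below prints the Python version, platform, and installed package versions using only the standard library. The package list is an assumption: pandas is the relevant one for this traceback, since unpickling errors of this kind often point to a pandas version different from the one that wrote the file; add the name under which faiss is installed in your environment if needed.

```python
import platform
import sys
from importlib import metadata


def collect_specs(packages=("pandas", "numpy")):
    """Collect interpreter, platform, and package versions as a dict."""
    specs = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            specs[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            specs[name] = "not installed"
    return specs


for key, value in collect_specs().items():
    print(f"{key}: {value}")
```

Run it inside the same conda environment used for do_retrieval.py so the reported versions match the failing setup.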

Charlesjc-lab commented 1 month ago

I get the same pandas.read_pickle error as in the traceback above (ending in "TypeError: issubclass() arg 1 must be a class") when using the v1 version of the software. Have you found a solution?

heathcliff233 commented 2 weeks ago

I have updated the codebase and the main branch should work now. Sorry for the delay.