Closed nope-sto closed 2 weeks ago
Thanks for trying it out. You need to convert a sequence database into embeddings and then specify the path to the faiss index. If you would just like a quick test, please switch to the v1 branch, where there should be a download link for a pre-built one.
Thanks for your feedback. So if I want to use other published databases such as NCBI NR or UniProt, I first have to download them, generate a SQL DB, and then convert it into embeddings? Is there a way to use MMseqs-format databases for this purpose? https://github.com/soedinglab/mmseqs2/wiki#downloading-databases
Yes, you can use MMseqs databases, but only the ones in tsv format. MMseqs databases are clustered profile databases, and you may use either the raw sequences (very large) or the cluster centers (recommended).
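For anyone who wants to try the cluster-center route, here is a minimal sketch of how the representatives could be pulled out before embedding. It assumes a two-column, tab-separated cluster file (representative ID, then member ID) and a FASTA file whose header IDs match those in the TSV. The function names and file layout are hypothetical and not part of this repository.

```python
import csv


def read_cluster_representatives(tsv_path):
    """Return the set of representative IDs (first column) from a cluster TSV."""
    representatives = set()
    with open(tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if row:  # skip blank lines
                representatives.add(row[0])
    return representatives


def filter_fasta(fasta_path, keep_ids, out_path):
    """Copy only records whose ID (first token of the header) is in keep_ids.

    Returns the number of records written.
    """
    kept = 0
    keep = False
    with open(fasta_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                tokens = line[1:].split()
                record_id = tokens[0] if tokens else ""
                keep = record_id in keep_ids
                if keep:
                    kept += 1
            if keep:
                dst.write(line)
    return kept
```

The reduced FASTA file would then be the input to the embedding step, which keeps the index much smaller than embedding every raw sequence.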
Okay, great, thanks. First I will do a quick test with your pre-built embeddings. I downloaded the two files (scope.faiss and scope.pkl) and placed them in a folder called DB. Then I ran: python do_retrieval.py -i input.fasta -d DB/ -o test
which resulted in the following error:
Traceback (most recent call last):
File "/home/stcg/Documents/Dense-Homolog-Retrieval/Dense-Homolog-Retrieval/do_retrieval.py", line 64, in <module>
Thank you for testing this and reporting the error. It is caused by a file naming and file format issue. I have updated the names of the pre-built files in Google Drive. As a temporary fix, please make sure you are on the v1 branch, update the file names, and uncomment the line at https://github.com/ml4bio/Dense-Homolog-Retrieval/blob/413b2380e09535a1b0d49b991c1101ed5e4bdd7e/mydpr/dataset/cath35.py#L60. These issues should be fixed soon, and I apologize for the inconvenience.
I downloaded the new files with the correct names and uncommented the self.records line, but I still get:
Traceback (most recent call last):
File "/home/stcg/Documents/Dense-Homolog-Retrieval/Dense-Homolog-Retrieval/do_retrieval.py", line 64, in <module>
Is the index-ebd.index converted into index-ebd.faiss by cath35.py?
The file suffix should be .index, my fault.
After changing it to .index, a new error appears:
Traceback (most recent call last):
File "/home/stcg/miniconda3/envs/fastMSA/lib/python3.9/site-packages/pandas/io/pickle.py", line 206, in read_pickle
return pickle.load(handles.handle)
File "/home/stcg/miniconda3/envs/fastMSA/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 2400, in new_block
return klass(values, ndim=ndim, placement=placement, refs=refs)
TypeError: Argument 'placement' has incorrect type (expected pandas._libs.internals.BlockPlacement, got slice)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/stcg/miniconda3/envs/fastMSA/lib/python3.9/site-packages/pandas/compat/pickle_compat.py", line 35, in load_reduce
stack[-1] = func(*args)
File "/home/stcg/miniconda3/envs/fastMSA/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 2400, in new_block
return klass(values, ndim=ndim, placement=placement, refs=refs)
TypeError: Argument 'placement' has incorrect type (expected pandas._libs.internals.BlockPlacement, got slice)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/stcg/Documents/Dense-Homolog-Retrieval/Dense-Homolog-Retrieval/do_retrieval.py", line 67, in <module>
Sorry, I am not sure what happened here. It seems to be an issue caused by pandas.read_pickle. The pickle file works on my computer. Would you please share your system specs so that I can check it later?
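In case it helps with collecting the specs: errors of this kind can appear when a pickle written under one pandas version is read under another, so the installed package versions are probably the most useful detail. Below is a small sketch that prints them using only the standard library. The list of package names is an assumption and may need adjusting for a conda-based install.

```python
import platform
from importlib import metadata


def collect_versions(packages=("pandas", "numpy", "faiss-cpu", "faiss-gpu", "torch")):
    """Return a dict with the Python version, platform, and installed package versions.

    The default package names are guesses at what this environment uses.
    """
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Pasting the output into the thread lets the environment that wrote the pickle be compared with the one that reads it.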
As a workaround, I have uploaded a csv file. It should work as a replacement if the pandas.read_pickle line is changed to pandas.read_csv.
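A rough sketch of how that switch could be made less manual is shown here: look for a csv metadata file first and fall back to the pickle only if no csv is present. The helper names and the idea of selecting by file suffix are assumptions for illustration and do not reflect how do_retrieval.py is actually written.

```python
from pathlib import Path


def pick_metadata_file(db_dir):
    """Prefer a .csv metadata file over a .pkl one; return None if neither exists."""
    db_dir = Path(db_dir)
    for suffix in (".csv", ".pkl"):
        matches = sorted(db_dir.glob(f"*{suffix}"))
        if matches:
            return matches[0]
    return None


def load_metadata(db_dir):
    """Load the metadata table with the reader that matches the file suffix."""
    import pandas as pd  # imported lazily so the file lookup works without pandas

    path = pick_metadata_file(db_dir)
    if path is None:
        raise FileNotFoundError(f"no .csv or .pkl metadata file in {db_dir}")
    if path.suffix == ".csv":
        return pd.read_csv(path)
    return pd.read_pickle(path)
```

Reading the csv avoids unpickling pandas internals, so it should not depend on which pandas version wrote the file.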
Traceback (most recent call last):
File "/home/stcg/Documents/Dense-Homolog-Retrieval/Dense-Homolog-Retrieval/do_retrieval.py", line 67, in <module>
df = pd.read_pickle(dm_path)
File "/home/stcg/miniconda3/envs/fastMSA/lib/python3.9/site-packages/pandas/io/pickle.py", line 211, in read_pickle
return pc.load(handles.handle, encoding=None)
File "/home/stcg/miniconda3/envs/fastMSA/lib/python3.9/site-packages/pandas/compat/pickle_compat.py", line 225, in load
return up.load()
File "/home/stcg/miniconda3/envs/fastMSA/lib/python3.9/pickle.py", line 1212, in load
dispatch[key[0]](self)
File "/home/stcg/miniconda3/envs/fastMSA/lib/python3.9/site-packages/pandas/compat/pickle_compat.py", line 55, in load_reduce
elif args and issubclass(args[0], PeriodArray):
TypeError: issubclass() arg 1 must be a class
I ran into the same problem when using the v1 version of the software. Have you found a solution?
I have updated the codebase and the main branch should work now. Sorry for the delay.
I tried to run python do_retrieval.py -i input.fasta and received the following error:
Traceback (most recent call last):
File "/home/stcg/Documents/Dense-Homolog-Retrieval/do_retrieval.py", line 64, in <module>
index = faiss.read_index(idx_path)
File "/home/stcg/miniconda3/envs/fastMSA/lib/python3.9/site-packages/faiss/swigfaiss_avx2.py", line 10206, in read_index
return _swigfaiss_avx2.read_index(*args)
RuntimeError: Error in faiss::FileIOReader::FileIOReader(const char*) at /opt/conda/conda-bld/faiss-pkg_1681998057888/work/faiss/impl/io.cpp:67: Error: 'f' failed: could not open ./output/agg/index-ebd.index for reading: No such file or directory
When running in online mode, do I still need to specify a database path?
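On the "No such file or directory" part of that traceback: faiss simply could not find an index file at ./output/agg/index-ebd.index. A small pre-flight check like the sketch below could make this easier to spot before the faiss call. The helper name is hypothetical, and the hint in the message only restates the -d option used earlier in this thread.

```python
from pathlib import Path


def check_index_path(idx_path):
    """Return idx_path as a Path if the file exists, else raise a readable error."""
    path = Path(idx_path)
    if not path.is_file():
        raise FileNotFoundError(
            f"FAISS index not found at {path.resolve()}; "
            "check the database directory passed with -d and the .index file name"
        )
    return path


# Possible usage before loading (requires faiss, not imported here):
# index = faiss.read_index(str(check_index_path(idx_path)))
```

This does not change what the script expects. It only reports the resolved absolute path, which shows which directory the relative path was looked up from.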