sokrypton / ColabFold

Making Protein folding accessible to all!
MIT License

HHSearch failed: hhsearch: Problem with data file. Is the file empty or is another process reading it?: Invalid argument #575

Open gioodm opened 8 months ago

gioodm commented 8 months ago

I am trying to run ColabFold using its Singularity image on an HPC cluster, but for some of the sequences I want to predict I run into this error:

2024-02-26 11:42:51,536 Query 1/7: jgi_YarliY64008_1_51878_gm4.453_g (length 70)
2024-02-26 11:42:56,142 Could not get MSA/templates for jgi_YarliY64008_1_51878_gm4.453_g: HHSearch failed:
stdout:

stderr: hhsearch: Problem with data file. Is the file empty or is another process reading it?: Invalid argument

hhsearch: Problem with data file. Is the file empty or is another process reading it?: Invalid argument

Traceback (most recent call last):
  File "/usr/local/envs/colabfold/lib/python3.9/site-packages/colabfold/batch.py", line 1453, in run
    = get_msa_and_templates(jobname, query_sequence, a3m_lines, result_dir, msa_mode, use_templates,
  File "/usr/local/envs/colabfold/lib/python3.9/site-packages/colabfold/batch.py", line 781, in get_msa_and_templates
    template_feature = mk_template(
  File "/usr/local/envs/colabfold/lib/python3.9/site-packages/colabfold/batch.py", line 132, in mk_template
    hhsearch_result = hhsearch_pdb70_runner.query(a3m_lines)
  File "/usr/local/envs/colabfold/lib/python3.9/site-packages/alphafold/data/tools/hhsearch.py", line 94, in query
    raise RuntimeError(
RuntimeError: HHSearch failed:
stdout:

stderr: hhsearch: Problem with data file. Is the file empty or is another process reading it?: Invalid argument

hhsearch: Problem with data file. Is the file empty or is another process reading it?: Invalid argument

I have tried rerunning these sequences multiple times, but it makes no difference. I have found similar issues reported by other users of the AlphaFold Singularity image: https://github.com/google-deepmind/alphafold/issues/132 https://github.com/google-deepmind/alphafold/issues/897

For some of them, the problem was fixed by passing the -DHAVE_AVX2=1 flag to the cmake call in the Dockerfile (as explained in https://github.com/soedinglab/hh-suite/issues/282). I wanted to try the same, but as I understand it the flag has to be set when the Docker image is built, and I am using the prebuilt container. Can you provide any guidance on how to solve this issue?
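Since the workaround in the linked threads is to rebuild HH-suite with -DHAVE_AVX2=1, one cheap check before rebuilding anything is whether the CPU of the node running hhsearch advertises AVX2 at all. The sketch below assumes a Linux node with a readable /proc/cpuinfo; `cpu_flags` and `has_avx2` are hypothetical helper names, not part of ColabFold or HH-suite, and whether AVX2 support is actually the cause of this error is not established in this thread.

```python
from __future__ import annotations

from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the instruction-set flags listed in /proc/cpuinfo-style text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Each logical CPU has its own "flags" line; take the union over all of them.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx2(cpuinfo_text: str) -> bool:
    """True if the 'avx2' flag appears in the cpuinfo text (exact token match)."""
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux only (assumption)
    if cpuinfo.exists():
        print("avx2 supported:", has_avx2(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found (non-Linux system?)")
```

Login and compute nodes on a cluster can have different CPUs, so running this on the node where the prediction job actually runs is more informative than running it on a login node.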

LicoriceLin commented 8 months ago

I see a similar issue when using ffindex_order to construct my own database.

One odd thing is that the bug occurs only with the _hhm.ff{data,index} files, not with _a3m.ff{data,index}.
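Because hhsearch asks whether a data file is empty, and the problem shows up only with the _hhm pair, it can help to check whether each .ffindex file is consistent with its .ffdata file. A minimal sketch, assuming the usual ffindex layout of one `name<TAB>offset<TAB>length` line per entry, with offset and length given in bytes into the .ffdata file; `check_ffindex_pair` is a hypothetical helper, and passing this check does not prove the database is valid for hhsearch.

```python
from __future__ import annotations

from pathlib import Path


def check_ffindex_pair(index_path: Path, data_path: Path) -> list[str]:
    """Return problems found in an .ffindex/.ffdata pair (empty list = none found).

    Assumed layout: each .ffindex line is "name<TAB>offset<TAB>length",
    where offset/length are byte positions into the .ffdata file.
    """
    problems: list[str] = []
    for path in (index_path, data_path):
        if not path.exists():
            problems.append(f"{path}: missing")
        elif path.stat().st_size == 0:
            problems.append(f"{path}: empty")
    if problems:
        return problems

    data_size = data_path.stat().st_size
    with index_path.open() as handle:
        for lineno, line in enumerate(handle, start=1):
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 3:
                problems.append(
                    f"{index_path}:{lineno}: expected 3 tab-separated fields, got {len(fields)}"
                )
                continue
            name, offset_str, length_str = fields
            try:
                offset, length = int(offset_str), int(length_str)
            except ValueError:
                problems.append(f"{index_path}:{lineno}: non-integer offset/length for {name!r}")
                continue
            # An entry that points past the end of .ffdata means the two files are out of sync.
            if offset + length > data_size:
                problems.append(
                    f"{index_path}:{lineno}: entry {name!r} ends at byte {offset + length}, "
                    f"beyond the {data_size}-byte data file"
                )
    return problems
```

For example, `check_ffindex_pair(Path("mydb_hhm.ffindex"), Path("mydb_hhm.ffdata"))`, where `mydb` is a placeholder for your own database prefix; running the same check on the _a3m pair gives a point of comparison.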

ybaeus commented 4 months ago

Any updates on this? I am running the Apptainer container (apptainer pull https://depot.galaxyproject.org/singularity/colabfold%3A1.5.5--pyh7cba7a3_2 + pip install --upgrade "jax[cuda]==0.4.25" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html + setting up some environment paths) and get the error below:

`2024-07-02 14:59:46,342 Running colabfold 1.5.5
2024-07-02 14:59:46,749 Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig' \ https://stackoverflow.com/questions/78304414/attributeerror-jaxlib-xla-extension-devicelist-object-has-no-attribute-split (mismatch JAX version)
2024-07-02 14:59:46,752 Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory \ regarding google cloud?
2024-07-02 14:59:48,711 Running on GPU
2024-07-02 14:59:50,212 Found 9 citations for tools or databases
2024-07-02 14:59:50,212 Query 1/3: CTX (length 36)
2024-07-02 14:59:50,322 Could not get MSA/templates for CTX: HHSearch failed: \ https://github.com/sokrypton/ColabFold/issues/575
stdout:

stderr:
- 14:59:50.321 ERROR: In /opt/conda/conda-bld/hhsuite_1709621322429/work/src/ffindexdatabase.cpp:11: FFindexDatabase:
- 14:59:50.321 ERROR: could not open file 'output/CTX_env/templates_101/pdb70_cs219.ffdata'`
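The stderr here names a specific file, output/CTX_env/templates_101/pdb70_cs219.ffdata, so a first step is to see which database files actually exist in that template directory and how large they are. A minimal sketch: the component list (_a3m, _hhm, _cs219, each as an .ffdata/.ffindex pair) is an assumption based on the file names appearing in this thread, and `describe_database` is a hypothetical helper. Zero-byte files are reported rather than flagged, since the thread does not say which files are expected to be non-empty.

```python
from __future__ import annotations

from pathlib import Path

# Assumed components of an HH-suite style database: "_cs219" appears in the error
# above, "_a3m" and "_hhm" in the earlier comment about ffindex_order.
COMPONENTS = ("a3m", "hhm", "cs219")
SUFFIXES = ("ffdata", "ffindex")


def describe_database(prefix: str | Path) -> dict[str, int | None]:
    """Map each expected <prefix>_<component>.<suffix> file name to its size in bytes (None if absent)."""
    prefix = Path(prefix)
    report: dict[str, int | None] = {}
    for component in COMPONENTS:
        for suffix in SUFFIXES:
            path = prefix.with_name(f"{prefix.name}_{component}.{suffix}")
            # exists() and stat() follow symlinks, so a dangling symlink counts as absent.
            report[path.name] = path.stat().st_size if path.exists() else None
    return report


if __name__ == "__main__":
    # Prefix taken from the error message above; adjust to your own output directory.
    for name, size in describe_database("output/CTX_env/templates_101/pdb70").items():
        label = "MISSING" if size is None else f"{size} bytes"
        print(f"{name}: {label}")
```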

ntnn19 commented 3 months ago

For me the problem was solved by placing each pair of m8 and a3m files corresponding to the same query in a separate subdirectory, e.g.:

seqs
├── query1
│   ├── query1.m8
│   └── query1.a3m
├── query2
│   ├── query2.m8
│   └── query2.a3m
├── query3
│   ├── query3.m8
│   └── query3.a3m
└── queryN
    ├── queryN.m8
    └── queryN.a3m
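If the m8 and a3m files currently sit side by side in a single directory, a per-query layout like this can be produced with a short script. A minimal sketch, assuming each query's files are named <query>.m8 and <query>.a3m and live directly in the given directory; `split_into_query_dirs` is a hypothetical helper, and the thread does not show how ColabFold was then invoked on the resulting layout.

```python
from __future__ import annotations

from pathlib import Path


def split_into_query_dirs(flat_dir: str | Path) -> list[Path]:
    """Move each <query>.a3m (and its <query>.m8, if present) into <flat_dir>/<query>/.

    Hypothetical helper: assumes files are named <query>.a3m / <query>.m8 and sit
    directly in flat_dir. Returns the per-query subdirectories, in sorted order.
    """
    flat_dir = Path(flat_dir)
    created: list[Path] = []
    # sorted() materializes the glob before any file is moved, and fixes the output order.
    for a3m in sorted(flat_dir.glob("*.a3m")):
        query = a3m.stem  # strips only the final ".a3m" suffix
        target = flat_dir / query
        target.mkdir(exist_ok=True)
        a3m.rename(target / a3m.name)
        m8 = flat_dir / f"{query}.m8"
        if m8.exists():
            m8.rename(target / m8.name)
        created.append(target)
    return created
```

This moves files in place, so it is safest to run it on a copy of the directory first. Any .m8 file without a matching .a3m is left where it is.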