Error: no GNN package file or directory specified

kliment-olechnovic / ftdmp

FTDMP is a software system for running docking experiments and scoring/ranking multimeric models.

https://kliment-olechnovic.github.io/ftdmp/

MIT License

Error: no GNN package file or directory specified #4

Closed davidkastner closed 11 months ago

davidkastner commented 11 months ago

Summary

When I run the graph neural network-based scoring functions, I get a series of errors that start with "Error: no GNN package file or directory specified". The full output is attached as output.txt. Both protein_protein_voromqa_and_gnn_no_sr and protein_protein_voromqa_and_global_and_gnn_no_sr return the same error. Here is an example of my input, which is also included in output.txt:

  >> ls ./*.pdb | /home/kastnd01/workspace/src/ftdmp/ftdmp-qa-all \
  --rank-names protein_protein_voromqa_and_gnn_no_sr \
  --ftdmp-root /home/kastnd01/workspace/src/ftdmp \
  --conda-path /home/kastnd01/workspace/src/miniconda3 \
  --workdir "./works"

Details

FTDMP installed without any issues, and modes that don't use graph neural network-based scoring, such as protein_protein_voromqa_no_sr, run correctly. I run FTDMP with the conda environment activated, in a directory containing a series of AlphaFold-generated PDB files. I set up the conda environment exactly as specified in the README and installed all the dependencies without any issues. Here are the outputs from testing the conda environment:

PyTorch Test:

>> python -c "import torch; print(torch.__version__)"
>> 2.1.0

PyTorch-Geometric Test:

>> python -c "import torch_geometric; print(torch_geometric.__version__)"
>> 2.3.1

OpenMM Test:

>> python -m openmm.testInstallation
>> All differences are within tolerance.
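
For anyone repeating these checks, the version one-liners above can be folded into a single script. This is a minimal sketch and not part of FTDMP: the function name installed_versions is made up for illustration, and it reads version strings from package metadata through the standard library's importlib.metadata instead of importing the packages, so a broken binary extension does not crash the check. The distribution names in the example call (torch, torch_geometric, torch_sparse, openmm) are assumptions and may need adjusting for a given environment.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not visible to this Python environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to match your environment.
    report = installed_versions(["torch", "torch_geometric", "torch_sparse", "openmm"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

A metadata lookup only confirms that a package is installed. It does not prove that the package imports cleanly, so the import-based checks above are still worth running.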

Computing environment:

Description: Red Hat Enterprise Linux Server release 7.9 (Maipo)
Release: 7.9
Codename: Maipo

Any help would be appreciated and thank you for making this tool!

kliment-olechnovic commented 11 months ago

Hi,

Thanks for the detailed report!

I was not able to reproduce the error, but I think I have a pretty good idea why it happened - there was a bug when checking whether a directory or file exists.

I have made a fix (the commit link is https://github.com/kliment-olechnovic/ftdmp/commit/3202424d413e435ed16fe0078921e5d8fc0a9419), hopefully it will solve your reported issue.

To update to include this fix, you can just run 'git pull', no need to rebuild anything.

davidkastner commented 11 months ago

Thank you for the fast fix! This seems to have resolved the issue with finding the GNN packages. After the protein-GNN mode began, I received a runtime error that starts with "Unknown builtin op: torch_sparse::ptr2ind". I have attached the output in output_2.txt. Could this be caused by a version issue in one of the packages in the conda environment? When I built the environment, I allowed conda to install the latest compatible versions.

kliment-olechnovic commented 11 months ago

Hi,

This is probably a version issue. Another user had a similar problem, and it was solved by changing the CUDA version (see https://github.com/kliment-olechnovic/ftdmp/issues/3).

The stable PyTorch versions change quite rapidly, and newer/older PyTorch versions may not be compatible with older/newer CUDA versions. My suggestion is to install (in conda) the stable version of PyTorch (as described on https://pytorch.org/get-started/locally/) with a compatible CUDA version (11.8 is more likely to work, but 12.1 may work too). Then install PyTorch Geometric (as described on https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html), matching the previously installed PyTorch and CUDA versions. So, for example:

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
conda install pyg -c pyg

davidkastner commented 11 months ago

Thank you! That is what I used as well:

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
conda install pyg -c pyg

I will try pytorch-cuda=11.7 next! Also, I want to mention that I am currently running the code on CPUs, but from what I understand, PyTorch built with CUDA also runs on CPUs, just not the other way around.

davidkastner commented 11 months ago

I don't believe it is a pytorch-cuda version problem. To help troubleshoot, I set up FTDMP on two separate HPC clusters. One uses Ubuntu and the other Red Hat. On each, I set up separate conda environments with pytorch-cuda version 11.6, 11.7, or 11.8, as the clusters provide all of these CUDA versions as modules. I also tried running FTDMP on both CPU and GPU nodes. In all cases, I got the same error:

RuntimeError:
Unknown builtin op: torch_sparse::ptr2ind.

I am attaching the error messages from the Red Hat and Ubuntu systems, output_redhat.txt and output_ubuntu.txt, respectively. I also tried installing the PyTorch Geometric torch-sparse package separately (conda install pytorch-sparse -c pyg), but I still got the same error.

davidkastner commented 11 months ago

After some trial and error with versioning, I was able to get it running. Although I still get lots of warnings from torch-sparse, it works. The remaining warning is unlikely to have anything to do with FTDMP and is probably an OS-specific pytorch-sparse incompatibility.

/home/kastnd01/workspace/src/miniconda3/envs/ftdmp-cpu/lib/python3.10/site-packages/torch_geometric/typing.py:42: UserWarning: An issue occurred while importing 'torch-sparse'. Disabling its usage. Stacktrace: /lib64/libm.so.6: version `GLIBC_2.29' not found (required by /hpfs/userws/kastnd01/src/miniconda3/envs/ftdmp-cpu/lib/python3.10/site-packages/torch_sparse/_metis_cpu.so)
  warnings.warn(f"An issue occurred while importing 'torch-sparse'.
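
The stack trace in this warning says that the prebuilt torch_sparse binary needs GLIBC_2.29, which is newer than the glibc the system provides (RHEL 7 generally ships glibc 2.17). A quick way to confirm this on a given machine is sketched below. The helper names parse_version and glibc_at_least are made up for illustration, and the 2.29 threshold is taken from the warning text above. platform.libc_ver() is part of the Python standard library.

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '2.29' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def glibc_at_least(required, found):
    """Return True if the found glibc version is at least the required one."""
    return parse_version(found) >= parse_version(required)


if __name__ == "__main__":
    # platform.libc_ver() returns a (library, version) pair such as ('glibc', '2.17');
    # both fields are empty strings when the C library cannot be identified.
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        ok = glibc_at_least("2.29", version)
        print(f"glibc {version}: {'meets' if ok else 'is older than'} 2.29")
    else:
        print("Could not determine the glibc version on this system")
```

If the reported version is older than 2.29, the warning is expected with that binary, whatever the FTDMP version.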

For anyone who runs into similar errors in the future, here are the exact conda environment specifications that worked for me.

conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 cpuonly -c pytorch
conda install pyg pytorch-cluster pytorch-scatter pytorch-sparse pytorch-spline-conv -c pyg
conda install -c conda-forge pandas
conda install -c conda-forge r-base
conda install -c conda-forge libstdcxx-ng
conda install -c conda-forge openmm
conda install -c conda-forge pdbfixer

Thanks again for your help @kliment-olechnovic! I'll leave the ticket open in case you would like to comment, but feel free to close it.

kliment-olechnovic commented 11 months ago

Thank you @davidkastner!

It seems that what worked for you was forcing PyTorch version 2.0 with compatible dependencies.

I will try (in the coming week or two) to assess and improve the compatibility of FTDMP with the latest stable PyTorch 2.1.

For now, I will leave the issue open so that people with similar problems can find it more easily.

kliment-olechnovic commented 11 months ago

Hi,

It turns out that installing PyTorch Geometric with just "conda install pyg -c pyg" is no longer feasible. The recommended way is to install PyTorch using conda, and then PyG using pip.

I have updated the documentation with the following instructions:

# install PyTorch using instructions from 'https://pytorch.org/get-started/locally/'
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

# install PyTorch Geometric using instructions from 'https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html'
pip install torch_geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.1.0+cu121.html

# install Pandas
conda install pandas

# if you do not have R installed in your system, install it - not necessarily using conda, e.g. 'sudo apt-get install r-base' in Ubuntu
conda install -c conda-forge r-essentials
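
The wheel index URL in the pip command above encodes both the PyTorch version and the CUDA version (torch-2.1.0+cu121), so it has to be adjusted for other setups. The sketch below builds that URL from the installed PyTorch. The function name pyg_wheel_index is hypothetical, and the URL pattern is inferred from the example above, so whether a page exists for a given combination should be checked against the PyTorch Geometric installation guide.

```python
def pyg_wheel_index(torch_version, cuda_version=None):
    """Build the PyG wheel index URL for a PyTorch version and optional CUDA version.

    torch_version may carry a local suffix such as '2.1.0+cu121', which is dropped.
    cuda_version is a string like '12.1', or None for a CPU-only build.
    """
    base = torch_version.split("+")[0]
    tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    return f"https://data.pyg.org/whl/torch-{base}+{tag}.html"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        # torch.version.cuda is None for CPU-only builds of PyTorch.
        print(pyg_wheel_index(torch.__version__, torch.version.cuda))
```

The printed URL can then be passed to pip with the -f option, as in the command above.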