torchmd / torchmd-net

Training neural network potentials
MIT License

How to install TorchMD-Net with OpenMM #266

Closed peastman closed 10 months ago

peastman commented 10 months ago

I'm stuck trying to create an environment that contains both TorchMD-Net and OpenMM. I can create an environment by following the instructions. But when I install OpenMM into it, conda replaces the CUDA version of PyTorch with the CPU version. Even if I were ok with that (and I could deal with it in this case), I get an exception as soon as I try to import torchmdnet.

  File "/home/peastman/spice/active/simulateTN.py", line 2, in <module>
    from torchmdnet.models.model import load_model
  File "/home/peastman/workspace/torchmd-net/torchmdnet/models/model.py", line 10, in <module>
    from torchmdnet.models import output_modules
  File "/home/peastman/workspace/torchmd-net/torchmdnet/models/output_modules.py", line 9, in <module>
    from torchmdnet.models.utils import act_class_mapping, GatedEquivariantBlock, scatter
  File "/home/peastman/workspace/torchmd-net/torchmdnet/models/utils.py", line 10, in <module>
    from torchmdnet.extensions import get_neighbor_pairs_kernel
  File "/home/peastman/workspace/torchmd-net/torchmdnet/extensions/__init__.py", line 30, in <module>
    _load_library("torchmdnet_extensions")
  File "/home/peastman/workspace/torchmd-net/torchmdnet/extensions/__init__.py", line 23, in _load_library
    torch.ops.load_library(spec.origin)
  File "/home/peastman/miniconda3/envs/torchmd-net/lib/python3.11/site-packages/torch/_ops.py", line 852, in load_library
    ctypes.CDLL(path)
  File "/home/peastman/miniconda3/envs/torchmd-net/lib/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libc10_cuda.so: cannot open shared object file: No such file or directory
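
The traceback shows the compiled extension failing because libc10_cuda.so, which only ships with CUDA builds of PyTorch, is missing. A minimal pre-flight check could look like the sketch below. It assumes a Linux install where PyTorch keeps its shared libraries in a `lib` directory inside the package. The helper names `torch_lib_dir` and `has_cuda_runtime_lib` are hypothetical and not part of TorchMD-Net or PyTorch.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def torch_lib_dir() -> Optional[Path]:
    """Locate torch's lib directory without importing torch.

    Assumption: shared libraries live in <torch package>/lib.
    """
    spec = importlib.util.find_spec("torch")
    if spec is None or spec.origin is None:
        return None  # torch is not installed in this environment
    return Path(spec.origin).parent / "lib"


def has_cuda_runtime_lib(lib_dir: Path, name: str = "libc10_cuda.so") -> bool:
    """Return True if the library named in the OSError exists in lib_dir."""
    return (lib_dir / name).is_file()


if __name__ == "__main__":
    lib_dir = torch_lib_dir()
    if lib_dir is None:
        print("torch is not installed")
    elif has_cuda_runtime_lib(lib_dir):
        print("CUDA build of PyTorch detected")
    else:
        print("CPU-only PyTorch: a CUDA-built torchmdnet extension will fail to load")
```

Running this before `import torchmdnet` separates a swapped PyTorch build from a problem in TorchMD-Net itself.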

Is CUDA supposed to be a hard requirement for TorchMD-Net? I tried to force it to install both at once with

mamba install -c conda-forge openmm=8.1.1 pytorch=*=*cuda*

but that fails with

Encountered problems while solving:
  - nothing provides cudatoolkit 7.5* needed by pytorch-0.2.0-py27cuda7.5cudnn5.1_0

For some reason it wants to downgrade to an old PyTorch. So I tried to force that too with

mamba install -c conda-forge openmm=8.1.1 pytorch=2.1.0=*cuda*

but that fails with

Encountered problems while solving:
  - package nnpops-0.5-cuda112py311h86f5c52_2 requires cudatoolkit >=11.2,<12, but none of the providers can be installed

RaulPPelaez commented 10 months ago

Might this be a mamba thing again? Does this behavior occur with conda too?

RaulPPelaez commented 10 months ago

I have tried with conda and the issues you present are only appearing when I try to install openmm 8.1.1:

conda install -c conda-forge openmm==8.1.1
Channels:
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: \ warning  libmamba Added empty dependency for problem type SOLVER_RULE_UPDATE
done

## Package Plan ##

  environment location: /home/raul/miniforge3/envs/testtm

  added / updated specs:
    - openmm==8.1.1

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    cuda-cccl-12.3.101         |       ha770c72_0          20 KB  conda-forge
    cuda-cccl_linux-64-12.3.101|       ha770c72_0         1.2 MB  conda-forge
    cuda-crt-dev_linux-64-12.3.107|       ha770c72_0          86 KB  conda-forge
    cuda-crt-tools-12.3.107    |       ha770c72_0          26 KB  conda-forge
    cuda-cudart-12.3.101       |       hd3aeb46_0          22 KB  conda-forge
    cuda-cudart-dev-12.3.101   |       hd3aeb46_0          22 KB  conda-forge
    cuda-cudart-dev_linux-64-12.3.101|       h59595ed_0         342 KB  conda-forge
    cuda-cudart-static-12.3.101|       hd3aeb46_0          22 KB  conda-forge
    cuda-cudart-static_linux-64-12.3.101|       h59595ed_0         728 KB  conda-forge
    cuda-cudart_linux-64-12.3.101|       h59595ed_0         182 KB  conda-forge
    cuda-driver-dev-12.3.101   |       hd3aeb46_0          22 KB  conda-forge
    cuda-driver-dev_linux-64-12.3.101|       h59595ed_0          35 KB  conda-forge
    cuda-libraries-dev-12.3.2  |       ha770c72_0          20 KB  conda-forge
    cuda-nvcc-12.3.107         |       hcdd1206_0          23 KB  conda-forge
    cuda-nvcc-dev_linux-64-12.3.107|       ha770c72_0         8.1 MB  conda-forge
    cuda-nvcc-impl-12.3.107    |       hd3aeb46_0          24 KB  conda-forge
    cuda-nvcc-tools-12.3.107   |       hd3aeb46_0        21.9 MB  conda-forge
    cuda-nvcc_linux-64-12.3.107|       h8a487aa_0          25 KB  conda-forge
    cuda-nvrtc-12.3.107        |       hd3aeb46_0        18.0 MB  conda-forge
    cuda-nvrtc-dev-12.3.107    |       hd3aeb46_0          31 KB  conda-forge
    cuda-nvtx-12.3.101         |       h59595ed_0          31 KB  conda-forge
    cuda-nvvm-dev_linux-64-12.3.107|       ha770c72_0          24 KB  conda-forge
    cuda-nvvm-impl-12.3.107    |       h59595ed_0         8.6 MB  conda-forge
    cuda-nvvm-tools-12.3.107   |       h59595ed_0        11.1 MB  conda-forge
    cuda-opencl-12.3.101       |       h59595ed_0          29 KB  conda-forge
    cuda-opencl-dev-12.3.101   |       h59595ed_0          77 KB  conda-forge
    cuda-profiler-api-12.3.101 |       ha770c72_0          22 KB  conda-forge
    cuda-version-12.3          |       h32bc705_2          21 KB  conda-forge
    libcublas-12.3.4.1         |       hd3aeb46_0       244.8 MB  conda-forge
    libcublas-dev-12.3.4.1     |       hd3aeb46_0          88 KB  conda-forge
    libcufft-11.0.12.1         |       hd3aeb46_0        60.4 MB  conda-forge
    libcufft-dev-11.0.12.1     |       hd3aeb46_0          32 KB  conda-forge
    libcufile-1.8.1.2          |       hd3aeb46_0         898 KB  conda-forge
    libcufile-dev-1.8.1.2      |       hd3aeb46_0          34 KB  conda-forge
    libcurand-10.3.4.107       |       hd3aeb46_0        39.7 MB  conda-forge
    libcurand-dev-10.3.4.107   |       hd3aeb46_0         247 KB  conda-forge
    libcusolver-11.5.4.101     |       hd3aeb46_0        76.6 MB  conda-forge
    libcusolver-dev-11.5.4.101 |       hd3aeb46_0          60 KB  conda-forge
    libcusparse-12.2.0.103     |       hd3aeb46_0       108.2 MB  conda-forge
    libcusparse-dev-12.2.0.103 |       hd3aeb46_0          51 KB  conda-forge
    libnpp-12.2.3.2            |       hd3aeb46_0        96.3 MB  conda-forge
    libnpp-dev-12.2.3.2        |       hd3aeb46_0         443 KB  conda-forge
    libnvjitlink-12.3.101      |       hd3aeb46_0        15.4 MB  conda-forge
    libnvjitlink-dev-12.3.101  |       hd3aeb46_0          25 KB  conda-forge
    libnvjpeg-12.3.0.81        |       h59595ed_0         2.4 MB  conda-forge
    libnvjpeg-dev-12.3.0.81    |       ha770c72_0          31 KB  conda-forge
    openmm-8.1.1               |  py311h11a6390_0        11.2 MB  conda-forge
    ------------------------------------------------------------
                                           Total:       727.5 MB

The following NEW packages will be INSTALLED:

  cuda-crt-dev_linu~ conda-forge/noarch::cuda-crt-dev_linux-64-12.3.107-ha770c72_0 
  cuda-crt-tools     conda-forge/linux-64::cuda-crt-tools-12.3.107-ha770c72_0 
  cuda-nvvm-dev_lin~ conda-forge/noarch::cuda-nvvm-dev_linux-64-12.3.107-ha770c72_0 
  cuda-nvvm-impl     conda-forge/linux-64::cuda-nvvm-impl-12.3.107-h59595ed_0 
  cuda-nvvm-tools    conda-forge/linux-64::cuda-nvvm-tools-12.3.107-h59595ed_0 

The following packages will be UPDATED:

  cuda-cccl                              12.0.90-ha770c72_1 --> 12.3.101-ha770c72_0 
  cuda-cccl_linux-64                     12.0.90-ha770c72_1 --> 12.3.101-ha770c72_0 
  cuda-cudart                           12.0.107-hd3aeb46_8 --> 12.3.101-hd3aeb46_0 
  cuda-cudart-dev                       12.0.107-hd3aeb46_8 --> 12.3.101-hd3aeb46_0 
  cuda-cudart-dev_l~                    12.0.107-h59595ed_8 --> 12.3.101-h59595ed_0 
  cuda-cudart-static                    12.0.107-hd3aeb46_8 --> 12.3.101-hd3aeb46_0 
  cuda-cudart-stati~                    12.0.107-h59595ed_8 --> 12.3.101-h59595ed_0 
  cuda-cudart_linux~                    12.0.107-h59595ed_8 --> 12.3.101-h59595ed_0 
  cuda-driver-dev                       12.0.107-hd3aeb46_8 --> 12.3.101-hd3aeb46_0 
  cuda-driver-dev_l~                    12.0.107-h59595ed_8 --> 12.3.101-h59595ed_0 
  cuda-libraries-dev                      12.0.0-ha770c72_1 --> 12.3.2-ha770c72_0 
  cuda-nvcc                             12.0.76-hba56722_12 --> 12.3.107-hcdd1206_0 
  cuda-nvcc-dev_lin~                     12.0.76-ha770c72_1 --> 12.3.107-ha770c72_0 
  cuda-nvcc-impl                         12.0.76-h59595ed_1 --> 12.3.107-hd3aeb46_0 
  cuda-nvcc-tools                        12.0.76-h59595ed_1 --> 12.3.107-hd3aeb46_0 
  cuda-nvcc_linux-64                    12.0.76-hba56722_12 --> 12.3.107-h8a487aa_0 
  cuda-nvrtc                             12.0.76-hd3aeb46_2 --> 12.3.107-hd3aeb46_0 
  cuda-nvrtc-dev                         12.0.76-hd3aeb46_2 --> 12.3.107-hd3aeb46_0 
  cuda-nvtx                              12.0.76-h59595ed_1 --> 12.3.101-h59595ed_0 
  cuda-opencl                            12.0.76-h59595ed_0 --> 12.3.101-h59595ed_0 
  cuda-opencl-dev                        12.0.76-ha770c72_0 --> 12.3.101-h59595ed_0 
  cuda-profiler-api                      12.0.76-ha770c72_0 --> 12.3.101-ha770c72_0 
  cuda-version                              12.0-hffde075_2 --> 12.3-h32bc705_2 
  libcublas                           12.0.1.189-hd3aeb46_3 --> 12.3.4.1-hd3aeb46_0 
  libcublas-dev                       12.0.1.189-hd3aeb46_3 --> 12.3.4.1-hd3aeb46_0 
  libcufft                             11.0.0.21-hd3aeb46_2 --> 11.0.12.1-hd3aeb46_0 
  libcufft-dev                         11.0.0.21-hd3aeb46_2 --> 11.0.12.1-hd3aeb46_0 
  libcufile                             1.5.0.59-hd3aeb46_1 --> 1.8.1.2-hd3aeb46_0 
  libcufile-dev                         1.5.0.59-hd3aeb46_1 --> 1.8.1.2-hd3aeb46_0 
  libcurand                            10.3.1.50-hd3aeb46_1 --> 10.3.4.107-hd3aeb46_0 
  libcurand-dev                        10.3.1.50-hd3aeb46_1 --> 10.3.4.107-hd3aeb46_0 
  libcusolver                          11.4.2.57-hd3aeb46_2 --> 11.5.4.101-hd3aeb46_0 
  libcusolver-dev                      11.4.2.57-hd3aeb46_2 --> 11.5.4.101-hd3aeb46_0 
  libcusparse                          12.0.0.76-hd3aeb46_2 --> 12.2.0.103-hd3aeb46_0 
  libcusparse-dev                      12.0.0.76-hd3aeb46_2 --> 12.2.0.103-hd3aeb46_0 
  libnpp                               12.0.0.30-hd3aeb46_1 --> 12.2.3.2-hd3aeb46_0 
  libnpp-dev                           12.0.0.30-hd3aeb46_1 --> 12.2.3.2-hd3aeb46_0 
  libnvjitlink                           12.0.76-hd3aeb46_2 --> 12.3.101-hd3aeb46_0 
  libnvjitlink-dev                       12.0.76-hd3aeb46_2 --> 12.3.101-hd3aeb46_0 
  libnvjpeg                            12.0.0.28-h59595ed_1 --> 12.3.0.81-h59595ed_0 
  libnvjpeg-dev                        12.0.0.28-ha770c72_1 --> 12.3.0.81-ha770c72_0 
  openmm                              8.1.0-py311h11a6390_1 --> 8.1.1-py311h11a6390_0 

The following packages will be DOWNGRADED:

  libtorch                       2.1.0-cuda120_h86db2e7_303 --> 2.1.0-cpu_mkl_hadc400e_103 
  nnpops                         0.6-cuda120py311hcbe25e9_6 --> 0.6-cpu_py311h7697b17_6 
  pytorch                   2.1.0-cuda120_py311h9588a60_303 --> 2.1.0-cpu_mkl_py311h249faf5_103 
  torchani                     2.2.4-cuda120py311he2766f7_3 --> 2.2.4-cpu_py311h12a0d1d_3 

However, installing just "openmm" works as intended:

conda install -c conda-forge openmm
Channels:
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/raul/miniforge3/envs/testtm

  added / updated specs:
    - openmm

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openmm-8.1.0               |  py311h11a6390_1        11.2 MB  conda-forge
    ------------------------------------------------------------
                                           Total:        11.2 MB

The following NEW packages will be INSTALLED:

  ocl-icd-system     conda-forge/linux-64::ocl-icd-system-1.0.0-1 
  openmm             conda-forge/linux-64::openmm-8.1.0-py311h11a6390_1 

Proceed ([y]/n)? y

Downloading and Extracting Packages:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done

My guess is that nnpops has not yet been built for CUDA 12.3, while openmm 8.1.1 has not been built for CUDA 12.0. This clash makes the solver fall back to the CPU build of nnpops, and therefore of PyTorch. Because you installed torchmd-net from source, the solver neither knows about it nor can rebuild it in CPU mode, so its compiled extension still links against the CUDA libraries.
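
The fallback described here is visible in the build strings of the DOWNGRADED section of the solver output. As a rough sketch, a plan can be scanned for packages that lose their CUDA variant. This assumes the conda-forge convention seen in that output, where CUDA builds contain `cudaNNN` and CPU builds start with `cpu`. The helpers `build_variant` and `lost_cuda` are hypothetical names, not part of conda or mamba.

```python
import re


def build_variant(build: str) -> str:
    """Classify a build string as 'cudaNNN', 'cpu', or 'unknown'."""
    match = re.search(r"cuda\d+", build)
    if match:
        return match.group(0)
    if build.startswith("cpu"):
        return "cpu"
    return "unknown"


def lost_cuda(old_build: str, new_build: str) -> bool:
    """True when a package moves from a CUDA build to a CPU build."""
    return build_variant(old_build).startswith("cuda") and build_variant(new_build) == "cpu"


# Build strings copied from the DOWNGRADED section of the solver output.
downgrades = {
    "libtorch": ("cuda120_h86db2e7_303", "cpu_mkl_hadc400e_103"),
    "nnpops": ("cuda120py311hcbe25e9_6", "cpu_py311h7697b17_6"),
    "pytorch": ("cuda120_py311h9588a60_303", "cpu_mkl_py311h249faf5_103"),
    "torchani": ("cuda120py311he2766f7_3", "cpu_py311h12a0d1d_3"),
}

for name, (old, new) in downgrades.items():
    if lost_cuda(old, new):
        print(f"{name}: {build_variant(old)} -> cpu")
```

Reviewing a plan this way before answering `y` catches a silent switch to CPU builds before it breaks a source-built extension.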

RaulPPelaez commented 10 months ago

Some additional information:

conda install -c conda-forge openmm==8.1.1 cuda-version==12.0.*
Channels:
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: - warning  libmamba Added empty dependency for problem type SOLVER_RULE_UPDATE
failed

LibMambaUnsatisfiableError: Encountered problems while solving:
  - package openmm-8.1.1-py311h11a6390_0 requires libcufft >=11.0.8.103,<12.0a0, but none of the providers can be installed

Could not solve for environment specs
The following packages are incompatible
├─ cuda-version 12.0**  is installable and it requires
│  └─ cudatoolkit 12.0|12.0.* , which can be installed;
├─ openmm 8.1.1  is installable with the potential options
│  ├─ openmm 8.1.1 would require
│  │  └─ python >=3.10,<3.11.0a0 , which can be installed;
│  ├─ openmm 8.1.1 would require
│  │  └─ libcufft >=11.0.8.103,<12.0a0  but there are no viable options
│  │     ├─ libcufft 11.0.12.1 would require
│  │     │  └─ cuda-version >=12.3,<12.4.0a0 , which conflicts with any installable versions previously reported;
│  │     └─ libcufft 11.0.8.103 would require
│  │        └─ cuda-version >=12.2,<12.3.0a0 , which conflicts with any installable versions previously reported;
│  ├─ openmm 8.1.1 would require
│  │  └─ cudatoolkit >=11.2,<12 , which conflicts with any installable versions previously reported;
│  ├─ openmm 8.1.1 would require
│  │  └─ cudatoolkit >=11.8,<12 , which conflicts with any installable versions previously reported;
│  ├─ openmm 8.1.1 would require
│  │  └─ python >=3.12,<3.13.0a0 , which can be installed;
│  ├─ openmm 8.1.1 would require
│  │  └─ python >=3.8,<3.9.0a0 , which can be installed;
│  └─ openmm 8.1.1 would require
│     └─ python >=3.9,<3.10.0a0 , which can be installed;
└─ pin-1 is not installable because it requires
   └─ python 3.11.* , which conflicts with any installable versions previously reported.

Pins seem to be involved in the conflict. Currently pinned specs:
 - python 3.11.* (labeled as 'pin-1')

Hmm, the wonders of CUDA.
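
The conflict in the solver report can be checked by hand. The sketch below hard-codes the two libcufft candidates the report lists for openmm 8.1.1 and tests which of them a given cuda-version pin allows. The table is transcribed from the report, with upper bounds simplified from `12.4.0a0` and `12.3.0a0` to `12.4` and `12.3`. The names `LIBCUFFT_CUDA_RANGE` and `libcufft_candidates` are hypothetical, and this is not how the libmamba solver works internally.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]


def parse(version: str) -> Version:
    """Turn '12.3' into (12, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# libcufft version -> (inclusive lower, exclusive upper) cuda-version bound,
# transcribed from the solver report (upper bounds simplified).
LIBCUFFT_CUDA_RANGE: Dict[str, Tuple[str, str]] = {
    "11.0.12.1": ("12.3", "12.4"),
    "11.0.8.103": ("12.2", "12.3"),
}


def libcufft_candidates(cuda_version: str) -> List[str]:
    """Return the listed libcufft versions compatible with a cuda-version pin."""
    pinned = parse(cuda_version)
    return [
        lib
        for lib, (low, high) in LIBCUFFT_CUDA_RANGE.items()
        if parse(low) <= pinned < parse(high)
    ]


for pin in ("12.0", "12.2", "12.3"):
    print(pin, libcufft_candidates(pin))
```

With the pin at 12.0 the list is empty, which matches the "no viable options" line in the report. The pin only works at 12.2 or 12.3, where nnpops had no CUDA build at the time.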

peastman commented 10 months ago

> Might this be a mamba thing again? Does this behavior occur with conda too?

I can't install with conda at all. It hangs for a very long time, and then crashes. I tried several times and the result was always the same.

> I have tried with conda and the issues you present are only appearing when I try to install openmm 8.1.1:

I tried installing 8.0.0 instead:

mamba install -c conda-forge openmm=8.0.0 pytorch=*=*cuda*

The result is the same:

Encountered problems while solving:
  - nothing provides cudatoolkit 7.5* needed by pytorch-0.2.0-py27cuda7.5cudnn5.1_0

Or if I try to force a newer PyTorch with

mamba install -c conda-forge openmm=8.0.0 pytorch=2.1.0=*cuda*

the error is

Encountered problems while solving:
  - package pytorch-2.1.0-cuda112_py312h16c7f42_302 requires python >=3.12,<3.13.0a0, but none of the providers can be installed

RaulPPelaez commented 10 months ago

Thanks, Peter. I have tracked down the issue to some feedstocks, in particular NNPops (https://github.com/conda-forge/nnpops-feedstock/issues/36) and OpenMM (https://github.com/conda-forge/openmm-torch-feedstock/issues/48). Whatever comes out of that will probably have repercussions for some downstream dependencies (torchani) and also for the torchmd-net package.

I have seen that, even though these packages are all up to date, they cannot be installed alongside CUDA 12.3. The exception is OpenMM, for which the opposite is true: 8.1.1 is built for CUDA 12.3 but not for 12.0. I have gone through the OpenMM recipe, but I cannot find the reason why that one is provided with CUDA 12.3 while the NNPops one is not.

Note that I can do this:

mamba create -n testop openmm torchmd-net "cuda-version>=12" pytorch=*=*cuda* cuda-nvcc cuda-libraries-dev

It gives me

  + openmm                            8.1.0  py311h11a6390_1            conda-forge     Cached
  + cuda-version                       12.0  hffde075_2                 conda-forge     Cached
  + torchmd-net                      0.14.0  py311hefc3cfb_0            conda-forge       14MB
  + nnpops                              0.6  cuda120py311hcbe25e9_6     conda-forge     Cached
  + pytorch                           2.1.2  cuda120_py311h9588a60_300  conda-forge       29MB

One can then go ahead and pip install torchmd-net, which will (hopefully) replace the conda-installed one.

The bad interaction happens only with openmm 8.1.1.

eva-not commented 10 months ago

Not sure if it helps but I managed to install OpenMM-Torch in the same environment with TorchMD-Net a couple of months ago with:

conda install pytorch=2=*cuda* -c conda-forge
mamba install openmm-torch torchmd-net -c conda-forge

This also installed openmm 8.1.0 and both OpenMM and PyTorch seem to be working fine with CUDA (this was for CUDA 12.0 though).

RaulPPelaez commented 10 months ago

The original issue has been fixed via conda-forge/OpenMM-feedstock#128.

mamba install -c conda-forge openmm=8.1.1 torchmd-net

  + openmm                            8.1.1  py311h6d2dbb8_1            conda-forge       12MB
  + pytorch                           2.1.2  cuda118_py311hde743b7_301  conda-forge     Cached
  + torchani                          2.2.4  cuda118py311h81fa710_3     conda-forge     Cached
  + nnpops                              0.6  cuda118py311h8042973_7     conda-forge      931kB
  + torchmd-net                      0.14.2  cuda118py311h07fa2a3_0     conda-forge       14MB

And also for CUDA 12:

mamba create -n testop openmm==8.1.1 torchmd-net "cuda-version>=12"

  + openmm                            8.1.1  py311h11a6390_1            conda-forge       12MB
  + pytorch                           2.1.2  cuda120_py311h25b6552_301  conda-forge       30MB
  + torchani                          2.2.4  cuda120py311he2766f7_3     conda-forge     Cached
  + nnpops                              0.6  cuda120py311hcbe25e9_7     conda-forge      929kB
  + torchmd-net                      0.14.2  cuda120py311hefc3cfb_0     conda-forge       14MB

Please reopen if you find some other issue with this!