Closed FranklinHu1 closed 10 months ago
I have successfully tested most functionality with the PyTorch nightly build, i.e., most tests run, and all that run pass. Depending on what exactly you need, though, you might run into dependency issues, since releases of some dependencies are not prepared to live outside conda-forge (e.g., NNPOps, although you can install it from source, and it is compatible with CUDA 12/torch-nightly AFAIK). I got most tests running by installing with pip every dependency that allows it and compiling NNPOps manually.
This environment works for me at the time of writing:
name: torchmd-net
channels:
  - nvidia
  - pytorch-nightly
  - conda-forge
dependencies:
  - python<=3.11
  - pip
  - cmake
  - pytorch-cuda=12.1
  - cuda-toolkit=12.1
  - cuda-compiler=12.1
  - gxx<12
  - pytorch
  - torchvision
  - torchaudio
  - ninja
  - pip:
    - torch-cluster==1.6.1
    - torch-geometric==2.3.1
    - torch-scatter==2.1.1
    - torch-sparse==0.6.17
    - pytorch-lightning==1.6.3
    - torchmetrics==0.11.4
    - tqdm
    - pytest
    - psutil
    - matplotlib
    - h5py
    - torchani==2.2.3
$ mamba create -f environment.yml
This contains everything required to compile NNPOps too:
$ mamba activate torchmd-net
$ git clone https://github.com/openmm/NNPOps
$ cd NNPOps
$ mkdir build && cd build
$ sed -i 's+14+17+g' ../CMakeLists.txt # Pytorch nightly requires C++17
$ Torch_DIR=$(python -c 'import torch;print(torch.utils.cmake_prefix_path)') cmake -DCMAKE_BUILD_TYPE=Release ..
$ make -j5 all install
After this you can go back to torchmd-net/tests and try:
$ pytest test*py
All tests pass on our systems.
Torchmd-net and PyTorch are now built for CUDA 12 on conda-forge and include sm_90. Closing this.
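To confirm that an installed PyTorch build ships kernels for a given GPU, you can compare the device's compute capability with the architectures the build was compiled for. The snippet below is only a minimal sketch: `arch_list_supports` is a hypothetical helper (not part of torchmd-net or PyTorch), and it checks only for an exact `sm_XY` entry, ignoring PTX forward compatibility.

```python
def arch_list_supports(arch_list, capability):
    """Return True if `arch_list` has an exact sm_XY entry for `capability`.

    Hypothetical helper: `arch_list` is a list of strings such as
    ["sm_80", "sm_90"], and `capability` is a (major, minor) tuple.
    This is a simplification that ignores PTX forward compatibility.
    """
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
        if torch.cuda.is_available():
            # Compute capability of the first visible GPU, e.g. (9, 0) on an H100.
            cap = torch.cuda.get_device_capability(0)
            archs = torch.cuda.get_arch_list()
            print("device capability:", cap)
            print("compiled architectures:", archs)
            print("exact match found:", arch_list_supports(archs, cap))
        else:
            print("CUDA is not available to this PyTorch build")
```

On an H100, a build that includes `sm_90` should report an exact match. If it does not, the installed PyTorch build was probably compiled without sm_90 kernels.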
Hello,
I am training the equivariant transformer and running dynamics using it on some Nvidia H100 GPUs. Overall, the workflow is going fine. However, I do get the following warning at the start of every training session:
The CUDA version I am currently using is 12.1, and according to the PyTorch website, CUDA 12.1 is only supported by the nightly version of PyTorch, not the stable 2.0.* versions indicated in the environment.yml file. Has torchmd-net been tested with this newer version of PyTorch that supports CUDA 12.1? If so, would it be safe to upgrade to the nightly version of PyTorch without breaking the code? Thank you!