openmm / NNPOps

High-performance operations for neural network potentials

Set c++17 standard in CMake for recent torch/cuda versions #109

Open RaulPPelaez opened 1 year ago

RaulPPelaez commented 1 year ago

Compiling with CUDA 12 and a very recent PyTorch version (such as the v2.1.0 nightly) fails, because PyTorch now requires C++17:

(test7) $ Torch_DIR=$(python -c 'import torch;print(torch.utils.cmake_prefix_path)')  cmake -DCMAKE_BUILD_TYPE=Release ..                                 
make -j15                                                                                                                                                                    
-- The CXX compiler identification is GNU 12.3.0                                                                                                                             
-- Detecting CXX compiler ABI info                                                                                                                                           
-- Detecting CXX compiler ABI info - done                                                                                                                                    
-- Check for working CXX compiler: /shared/raul/mambaforge/envs/test7/bin/x86_64-conda-linux-gnu-c++ - skipped                                                               
-- Detecting CXX compile features                                                                                                                                            
-- Detecting CXX compile features - done                                                                                                                                     
-- The CUDA compiler identification is NVIDIA 12.1.105                                                                                                                       
-- Detecting CUDA compiler ABI info                                                                                                                                          
-- Detecting CUDA compiler ABI info - done                                                                                                                                   
-- Check for working CUDA compiler: /shared/raul/mambaforge/envs/test7/bin/nvcc - skipped                                                                                    
-- Detecting CUDA compile features                                                                                                                                           
-- Detecting CUDA compile features - done                                                                                                                                    
-- Found Python3: /shared/raul/mambaforge/envs/test7/bin/python3.11 (found version "3.11.0") found components: Interpreter Development Development.Module Development.Embed  
-- Found CUDA: /shared/raul/mambaforge/envs/test7 (found version "12.1")                                                                                                     
-- Found CUDAToolkit: /shared/raul/mambaforge/envs/test7/include (found version "12.1.105") 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD                                            
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed                                   
-- Looking for pthread_create in pthreads                                             
-- Looking for pthread_create in pthreads - not found                                 
-- Looking for pthread_create in pthread                                              
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Caffe2: CUDA detected: 12.1
-- Caffe2: CUDA nvcc is: /shared/raul/mambaforge/envs/test7/bin/nvcc
-- Caffe2: CUDA toolkit directory: /shared/raul/mambaforge/envs/test7
-- Caffe2: Header version is: 12.1
-- /shared/raul/mambaforge/envs/test7/lib/libnvrtc.so shorthash is 8144a3bc      
-- USE_CUDNN is set to 0. Compiling without cuDNN support                          
-- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
-- Autodetected CUDA architecture(s):  8.9 8.9 8.9 8.9
-- Added CUDA NVCC flags for: -gencode;arch=compute_89,code=sm_89                                                                                                            
-- MKL_ARCH: intel64                                                                                                                                                         
-- MKL_ROOT /shared/raul/mambaforge/envs/test7                                                                                                                               
-- MKL_LINK: dynamic                                                                                                                                                         
-- MKL_INTERFACE_FULL: intel_ilp64                                                                                                                                           
-- MKL_THREADING: intel_thread                                                                                                                                               
-- MKL_MPI: intelmpi                                                                                                                                                         
CMake Warning at /shared/raul/mambaforge/envs/test7/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):                 
  static library kineto_LIBRARY-NOTFOUND not found.                                                                                                                          
Call Stack (most recent call first):                                                                                                                                         
  /shared/raul/mambaforge/envs/test7/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  CMakeLists.txt:13 (find_package)

-- Configuring done (1.6s)                                                                                                                                                   
-- Generating done (0.1s)                                                                                                                                                    
-- Build files have been written to: /shared/raul/NNPOps/build                                                                                                               
(test7) $ make -j15
[ 21%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/BatchedNN.cpp.o                                                                                          
[ 21%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/ani/CpuANISymmetryFunctions.cpp.o                            
[ 26%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/CFConv.cpp.o                                                                                             
[ 26%] Building CUDA object CMakeFiles/NNPOpsPyTorch.dir/src/ani/CudaANISymmetryFunctions.cu.o                                            
[ 34%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/SymmetryFunctions.cpp.o                                          
[ 34%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/CFConvNeighbors.cpp.o                                                       
[ 43%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/neighbors/getNeighborPairsCPU.cpp.o
[ 43%] Building CUDA object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/neighbors/getNeighborPairsCUDA.cu.o
[ 52%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/neighbors/neighbors.cpp.o                                                                                
[ 60%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/pme/pmeCPU.cpp.o                                                                                         
[ 60%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/pme/pme.cpp.o                                                                                            
[ 60%] Building CUDA object CMakeFiles/NNPOpsPyTorch.dir/src/schnet/CudaCFConv.cu.o                                                                                          
[ 60%] Building CUDA object CMakeFiles/NNPOpsPyTorch.dir/src/pytorch/pme/pmeCUDA.cu.o                                                                                        
[ 60%] Building CXX object CMakeFiles/NNPOpsPyTorch.dir/src/schnet/CpuCFConv.cpp.o                                                                                           
In file included from /shared/raul/mambaforge/envs/test7/lib/python3.11/site-packages/torch/include/torch/extension.h:4,                                                     
                 from /shared/raul/NNPOps/src/pytorch/pme/pmeCUDA.cu:1:                                                                                                      
/shared/raul/mambaforge/envs/test7/lib/python3.11/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4:2: error: #error C++17 or later compatible compiler is required to use PyTorch.
    4 | #error C++17 or later compatible compiler is required to use PyTorch.
      |  ^~~~~
[ 60%] Built target copy_test   

Simply setting the standard from 14 to 17 in CMakeLists.txt fixes it. CUDA 11 also supports C++17, but CUDA 10.2 does not, so I check for this and leave the standard at C++14 in that case. GCC has supported C++17 since version 7, so I default to C++17.
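The version check described above could look roughly like the sketch below. This is an illustration only and not the actual diff in this PR: the project name is hypothetical, and it assumes CMake 3.18 or newer (needed for `CMAKE_CUDA_STANDARD 17`) and that the CUDA version is read from `CUDAToolkit_VERSION` as provided by `find_package(CUDAToolkit)`.

```cmake
# Hypothetical sketch: pick the C++ standard based on the CUDA toolkit version.
# Assumes CMake >= 3.18; the project name is a placeholder.
cmake_minimum_required(VERSION 3.18)
project(StandardSelectionSketch LANGUAGES CXX CUDA)

find_package(CUDAToolkit REQUIRED)

if(CUDAToolkit_VERSION VERSION_LESS 11)
    # CUDA 10.x does not support C++17, so stay at C++14.
    set(CMAKE_CXX_STANDARD 14)
    set(CMAKE_CUDA_STANDARD 14)
else()
    # CUDA 11 and later support C++17, which recent PyTorch requires.
    set(CMAKE_CXX_STANDARD 17)
    set(CMAKE_CUDA_STANDARD 17)
endif()

# Fail at configure time instead of silently falling back to an older standard.
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)
```

These variables only initialize the properties of targets created afterwards, so a block like this would need to appear before any `add_library` or `add_executable` call.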

RaulPPelaez commented 1 year ago

This is ready to merge.

RaulPPelaez commented 1 year ago

The CUDA 11.8 build tends to fail due to some form of disk access error when installing CUDA. It must be a bug in the Jimver action. There is a new version, let's try with that...

raimis commented 1 year ago

I have purged the GA cache. If it fails, try to rerun.

RaulPPelaez commented 1 year ago

I am not sure if I do not have rights to do so or just do not know how, but I cannot rerun the CI. I will just make a spurious commit.

RaulPPelaez commented 1 year ago

11.8 still refuses to download, it seems.

raimis commented 1 year ago

[Linux (CUDA 11.8, Python 3.10, PyTorch 2.0)](https://github.com/openmm/NNPOps/actions/runs/5892449251/job/15981745203#step:1:39)

You are running out of disk space. The runner will stop working when the machine runs out of disk space. Free space left: 0 MB

RaulPPelaez commented 1 year ago

Do you know whether this disk limit is per action or per individual check? If it is the former, maybe we can do something. If it is the latter, I do not really know why CUDA 11.2 takes more space than 11.8, to the point of going over the threshold.

RaulPPelaez commented 10 months ago

This is ready for review. With the changes in conda-forge regarding CUDA, from version 12 onwards there is no need to install CUDA at the OS level in the CI (so no Jimver/cuda GitHub action). This is good news here because the current CI is constantly running out of space. However, the workflow is different enough that I decided to move it to a separate CI. The idea is that the old one will eventually be dropped (when CUDA 12 is the oldest supported version, I guess).

I had to deal with a couple of quirks in the compilation process for PyTorch 2.1 and CUDA 12. In particular:

RaulPPelaez commented 10 months ago

I am using the changes to CMakeLists.txt as a patch to build this: https://github.com/conda-forge/nnpops-feedstock/pull/29

RaulPPelaez commented 7 months ago

@mikemhenry I would like to merge this, but I believe the self-hosted runner is not working.