Open RaulPPelaez opened 1 year ago
This is ready to merge.
The CUDA 11.8 build tends to fail with some form of disk access error when installing CUDA. It must be a bug in the Jimver action. There is a new version, so let's try that.
I have purged the GA cache. If it fails, try to rerun.
I am not sure if I do not have rights to do so or just do not know how, but I cannot rerun the CI. I will just make a spurious commit.
CUDA 11.8 still refuses to download, it seems.
[Linux (CUDA 11.8, Python 3.10, PyTorch 2.0)](https://github.com/openmm/NNPOps/actions/runs/5892449251/job/15981745203#step:1:39)
You are running out of disk space. The runner will stop working when the machine runs out of disk space. Free space left: 0 MB
Do you know if this disk limit is per action or per individual check? If it is the former, maybe we can do something. If it is the latter, I do not really know why CUDA 11.2 would take so much more space than 11.8 as to go over the threshold.
This is ready for review. With the changes in conda-forge regarding CUDA, from version 12 onwards there is no need to install CUDA at the OS level in the CI (so no Jimver/cuda GitHub action). This is good news here because the current CI is constantly running out of disk space. However, the workflow is different enough that I decided to move it to a separate CI workflow. The idea is that the old one will eventually be dropped (when CUDA 12 is the oldest supported version, I guess).
I had to deal with a couple of quirks in the compilation process for PyTorch 2.1 and CUDA 12. In particular:
I am using the changes to CMakeLists.txt as a patch to build this https://github.com/conda-forge/nnpops-feedstock/pull/29
@mikemhenry I would like to merge this, but I believe the self-hosted runner is not working.
Compiling with CUDA 12 and a very recent PyTorch version (such as v2.1.0 from the nightly) fails because C++17 is required to compile PyTorch:
Simply setting the standard from 14 to 17 in CMakeLists.txt fixes it. CUDA 11 also supports C++17, but CUDA 10.2 does not. I check for this and leave the standard at C++14 in that case. GCC has supported C++17 since version 7, so I default to C++17.
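For reference, the version check described above could look roughly like this in CMake. This is only a minimal sketch, not the actual patch in this PR: the project name `example` is hypothetical, and it assumes CUDA is enabled as a CMake language so that `CMAKE_CUDA_COMPILER_VERSION` is populated.

```cmake
# Sketch only: "example" is a placeholder project name, not the NNPOps project.
# CMake 3.18 or newer is assumed, since CMAKE_CUDA_STANDARD 17 needs it.
cmake_minimum_required(VERSION 3.18)
project(example LANGUAGES CXX CUDA)

# nvcc supports C++17 from CUDA 11.0 onwards; CUDA 10.2 tops out at C++14.
if(CMAKE_CUDA_COMPILER_VERSION VERSION_LESS 11.0)
  set(CMAKE_CXX_STANDARD 14)
  set(CMAKE_CUDA_STANDARD 14)
else()
  # Default to C++17, which recent PyTorch headers require.
  set(CMAKE_CXX_STANDARD 17)
  set(CMAKE_CUDA_STANDARD 17)
endif()

# Fail at configure time instead of silently falling back to an older standard.
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)
```

Setting both the CXX and CUDA standards keeps host and device code on the same standard, which avoids mismatches when the PyTorch headers are included from `.cu` files.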