spinsphotonics / fdtdz

Fast, scalable, accessible photonic simulation
MIT License

CUDA_ERROR_UNSUPPORTED_PTX_VERSION and re-compiling for different CUDA version #14

Closed johnrollinson closed 1 year ago

johnrollinson commented 1 year ago

First, thank you very much for creating this package. I have been looking for something like this for a long time and am very excited to experiment with fdtdz.

I just wanted to post this here in case anyone else runs into the same issue.

I'm using fdtdz on an HPC cluster with Tesla V100s, CUDA driver version 470.57.02, and CUDA runtime version 11.4.

I initially installed fdtdz from pip with no installation issues, but when I ran the demo notebook I got the following error during the simulation step:

CUDA_ERROR_UNSUPPORTED_PTX_VERSION (the provided PTX was compiled with an unsupported toolchain.) in /tmp/pip-install-jpn5veal/fdtdz_eba4edc6d1564f1bb2469f3b1ec6c305/cuda/kernel_precompiled.h:119

I did some looking around online, and it seems this error is usually related to the CUDA driver version (e.g. using an older driver with software compiled for a newer CUDA version). Since I'm on an HPC cluster, updating the CUDA driver is not really an option for me (at least not an easy one ...). I dug around some more and saw that fdtdz uses some pre-compiled PTX kernels, so I figured I'd try recompiling them with my CUDA toolchain version.
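To see whether a driver/toolkit mismatch like this is in play, it can help to print both versions side by side. The following is only a minimal sketch: it assumes `nvidia-smi` and `nvcc` are on the PATH (it prints a fallback message when they are not), and the function name `report_cuda_versions` is made up for illustration.

```shell
#!/usr/bin/env bash
# Print the NVIDIA driver version and the CUDA toolkit (nvcc) release
# so the two can be compared by eye.

report_cuda_versions() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    # driver_version is a standard nvidia-smi query field; keep the first GPU only.
    echo "driver: $(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
  else
    echo "driver: nvidia-smi not found"
  fi

  if command -v nvcc >/dev/null 2>&1; then
    # nvcc --version includes a "release X.Y" token; extract just that part.
    echo "toolkit: $(nvcc --version | grep -o 'release [0-9.]*')"
  else
    echo "toolkit: nvcc not found"
  fi
}

report_cuda_versions
```

If the toolkit that produced the PTX is newer than what the installed driver supports, an error like the one above is a plausible outcome.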

Here are the steps I used:

  1. Download the source code: git clone https://github.com/spinsphotonics/fdtdz.git
  2. Remove the pre-compiled ptx files: cd fdtdz && rm -r src/fdtdz_jax/ptx
  3. Re-compile the ptx files from source: cd cuda && mkdir build && cd build && cmake .. && make -j && ctest --verbose

    Some of the tests here failed for me: some complained about CUDA_ERROR_INVALID_PTX, and one failed because it could not find compute-sanitizer even though it is installed on my system and in my PATH. I didn't look into the errors too much, but some of the tests seemed to be looking for files in /usr/local/cuda-11.8, so I guessed it was just a hardcoded path somewhere and decided to try the compiled ptx files anyway.

  4. Copy the compiled ptx files back into src/fdtdz_jax/ptx: cd .. && cp -r ptx ../src/fdtdz_jax/ptx
  5. Install from source using pip: cd .. && pip install -e .
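For convenience, the five steps above can be collected into a single script. This is a rough sketch rather than a tested installer: the `run` helper, the `rebuild_fdtdz_ptx` function name, and the `DRY_RUN` switch are illustrative additions. The script only prints the commands by default, so nothing is cloned or deleted until `DRY_RUN=0` is set. Because some tests failed in the steps above, a `ctest` failure is reported but does not stop the script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# DRY_RUN=1 (the default in this sketch) only prints each command.
# Set DRY_RUN=0 to actually execute the steps.
DRY_RUN="${DRY_RUN:-1}"

# Echo a command, then run it in the current shell unless in dry-run mode.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

rebuild_fdtdz_ptx() {
  # 1. Download the source code.
  run git clone https://github.com/spinsphotonics/fdtdz.git
  run cd fdtdz
  # 2. Remove the pre-compiled ptx files.
  run rm -r src/fdtdz_jax/ptx
  # 3. Re-compile the ptx files from source.
  run cd cuda
  run mkdir build
  run cd build
  run cmake ..
  run make -j
  # Some tests may fail (see the note in step 3), so do not abort on failure.
  run ctest --verbose || echo "ctest reported failures; continuing"
  # 4. Copy the compiled ptx files back into src/fdtdz_jax/ptx.
  run cd ..
  run cp -r ptx ../src/fdtdz_jax/ptx
  # 5. Install from source using pip.
  run cd ..
  run pip install -e .
}

rebuild_fdtdz_ptx
```

Running it once in dry-run mode first gives a chance to check the printed commands against the repository layout before executing anything.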

After this I was able to run the demo notebook without any issues. Total wall time was 47.3s running the simulation on a Tesla V100, so it seems like performance has not been affected by using 11.4 :+1:

jlu-spins commented 1 year ago

(Sorry, closed by accident!)

Thanks so much for posting this and great job digging around @johnrollinson!

The hardcoded path is indeed at https://github.com/spinsphotonics/fdtdz/blob/d7a174ea179039f83839d432cd4758adf41cea5e/cuda/CMakeLists.txt#L137C30-L137C38 and really should be fixed. I'll keep this open until that happens.
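For anyone who wants to locate versioned CUDA paths like this in their own checkout, a plain `grep` is enough. The snippet below is a small sketch: the helper name `find_cuda_paths` is hypothetical, and the pattern assumes the path has the form `/usr/local/cuda-X.Y`, as in the test failures reported above.

```shell
#!/usr/bin/env bash
# List lines in a file that mention a versioned CUDA install path
# such as /usr/local/cuda-11.8.

find_cuda_paths() {
  # -n prints line numbers; -E enables the extended regex for the version suffix.
  grep -nE '/usr/local/cuda-[0-9]+\.[0-9]+' "$1" || echo "no versioned CUDA path found in $1"
}

# Example: run from the repository root, if the file is present.
if [ -f cuda/CMakeLists.txt ]; then
  find_cuda_paths cuda/CMakeLists.txt
fi
```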

Also, I'll look to add a link to your super-helpful post on the README.md.

Thanks again!!