warpem / warp

https://warpem.github.io/warp/
GNU General Public License v3.0

Warp 2.0 cuFFT issues on compute capability>8.8 GPUs (e.g. L4) #178

Closed. alisterburt closed this issue 4 months ago.

alisterburt commented 4 months ago

@mpm896 found that there seems to be a deeper incompatibility between Warp(Tools) and NVIDIA L4 GPUs:

Hi Alister (and all), I've run into a bit of success! I started searching the NVIDIA forums specifically for this part of the error:

terminate called after throwing an instance of 'std::runtime_error'
what(): cuFFT error: CUFFT_INTERNAL_ERROR

It seems like people are getting this for a variety of reasons. I saw this post about cuFFT not working on L4 GPUs while it works on T4 GPUs. We have access to instances with A10G GPUs, so I rebooted an instance with this GPU and I'm no longer facing this problem!

I know almost nothing about GPU computing so I have no clue why this would make a difference, but perhaps something needs to be reconfigured for newer GPUs like the L4? I'm also using the same environment that I discussed at the top of this post, with some MATLAB runtimes (for PEET) in my LD_LIBRARY_PATH, etc. Anyways, I hope this helps troubleshoot the issue for others!

alisterburt commented 4 months ago

The L4 is an Ada Lovelace card with compute capability 8.9. CC 8.9 was not supported until CUDA 11.8.

We use CUDA 11.7, so this makes sense. Will tentatively try to upgrade to 11.8.
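
For anyone wanting to sanity-check their own card against a toolkit version, here is a minimal sketch of the reasoning above. It is not part of Warp: the table `MIN_CUDA_FOR_CC` and the function `toolkit_supports` are hypothetical names made up for illustration, and the table only covers the three GPUs mentioned in this thread (T4 is CC 7.5, A10G is CC 8.6, L4 is CC 8.9).

```python
# Illustrative only: not part of Warp or any NVIDIA tool.
# Minimum CUDA toolkit release able to target each compute capability (CC).
# Only the GPUs mentioned in this thread are listed.
MIN_CUDA_FOR_CC = {
    (7, 5): (10, 0),  # Turing, e.g. T4
    (8, 6): (11, 1),  # Ampere, e.g. A10G
    (8, 9): (11, 8),  # Ada Lovelace, e.g. L4
}


def toolkit_supports(cc, cuda_version):
    """Return True if a CUDA toolkit (major, minor) can target the given CC."""
    if cc not in MIN_CUDA_FOR_CC:
        raise ValueError(f"compute capability {cc} is not in the lookup table")
    # Tuples compare element by element, so (11, 7) < (11, 8).
    return cuda_version >= MIN_CUDA_FOR_CC[cc]


if __name__ == "__main__":
    gpus = [("T4", (7, 5)), ("A10G", (8, 6)), ("L4", (8, 9))]
    for name, cc in gpus:
        for cuda in [(11, 7), (11, 8)]:
            status = "ok" if toolkit_supports(cc, cuda) else "unsupported"
            print(f"{name} (CC {cc[0]}.{cc[1]}) with CUDA {cuda[0]}.{cuda[1]}: {status}")
```

With these entries, only the L4 on CUDA 11.7 comes out as unsupported, which matches what was reported: T4 and A10G work, L4 does not.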

alisterburt commented 4 months ago

Package builds perfectly on 11.8, testing now.

Edit: also need to investigate the upgrade path for existing installs.

alisterburt commented 4 months ago

Appears to run as expected: tested a few programs and saw no loud errors.