inducer / pycuda

CUDA integration for Python, plus shiny features
http://mathema.tician.de/software/pycuda

pycuda issue when using arrays bigger than 17 GB #375

Closed pierrepaleo closed 2 years ago

pierrepaleo commented 2 years ago

Describe the bug

It seems that pycuda is not able to use arrays bigger than 17 GB. Allocating (with gpuarray.empty or gpuarray.zeros) works, but any subsequent operation on the array will hang (no crash).

To Reproduce

import pycuda.autoinit
import pycuda.gpuarray as garray
n_frames = 1024
dz = garray.zeros((n_frames, 2048, 2048), "f") # allocation works 
dz += 1 # hangs forever

The same happens with a custom ElementwiseKernel applied to this array: the operation hangs but does not crash.

The limit seems to be 2**34 bytes, meaning that n_frames = 1023 should work in the above example.
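The arithmetic behind that limit can be checked without a GPU. The sketch below is only an illustration: the helper name `describe` and the constants are made up here, and it assumes 4-byte float32 elements as in the "f" dtype above. It shows that n_frames = 1024 gives exactly 2**32 elements (2**34 bytes), one more than a 32-bit unsigned integer can represent, while n_frames = 1023 still fits.

```python
# Size arithmetic for the array in the report (float32, 4 bytes per element).
# `describe`, ITEMSIZE and UINT32_MAX are illustrative names, not pycuda API.
ITEMSIZE = 4              # bytes per float32 element
UINT32_MAX = 2**32 - 1    # largest value a 32-bit unsigned int can hold


def describe(n_frames):
    """Return element count, byte count, and whether the count fits in 32 bits."""
    n_elements = n_frames * 2048 * 2048
    return {
        "n_elements": n_elements,
        "n_bytes": n_elements * ITEMSIZE,
        "fits_in_uint32": n_elements <= UINT32_MAX,
    }


failing = describe(1024)  # the shape that hangs in the report
working = describe(1023)  # the shape expected to work
print(failing)
print(working)
```

So the threshold is probably better described in elements than in bytes: 2**32 elements, which happens to be 2**34 bytes for float32.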

Doing the same with a C/CUDA program works (I can provide the source code if needed).

Tried with the following configurations

Perhaps it has to do with the use of int instead of unsigned int or size_t, but it looks like pycuda already uses an unsigned type, at least in get_elwise_module.

inducer commented 2 years ago

I suspect it's this code snippet you're alluding to:

https://github.com/inducer/pycuda/blob/6f60fe4eccde4ec1d7d1a50719222024d1034876/pycuda/elementwise.py#L56-L59

Have you tried changing those types to something bigger, say unsigned long?

Also here:

https://github.com/inducer/pycuda/blob/6f60fe4eccde4ec1d7d1a50719222024d1034876/pycuda/elementwise.py#L106-L109
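To illustrate why a wider index type could turn a hang into a normal exit, here is a small pure-Python model of a strided loop of the form `for (i = start; i < n; i += stride)`. It is a simplification under assumptions that are not confirmed in this thread: that the loop index is 32 bits wide while the bound `n` is held in a wider type, and that the stride is 2**20. The function name and parameters are hypothetical and this is not the code at the linked lines.

```python
def strided_loop_terminates(n, index_bits, start=0, stride=2**20, max_steps=10_000):
    """Model `for (i = start; i < n; i += stride)` with an unsigned index
    that is `index_bits` wide. Returns True if the loop exits within
    `max_steps` iterations, False if it is still running after that.
    Hypothetical helper for illustration only."""
    mask = (1 << index_bits) - 1
    i = start & mask
    for _ in range(max_steps):
        if not (i < n):
            return True               # loop condition failed: normal exit
        i = (i + stride) & mask       # unsigned wraparound on overflow
    return False                      # still looping: models the hang


n_hang = 1024 * 2048 * 2048           # 2**32 elements, as in the report
n_ok = 1023 * 2048 * 2048             # just under 2**32 elements

print(strided_loop_terminates(n_hang, index_bits=32))  # 32-bit index, bound 2**32
print(strided_loop_terminates(n_hang, index_bits=64))  # 64-bit index, same bound
print(strided_loop_terminates(n_ok, index_bits=32))    # 32-bit index, smaller bound
```

In this model a 32-bit index wraps back to 0 before it can ever reach 2**32, so the exit condition is never met, whereas a 64-bit index exits normally. Note that `unsigned long` is 64 bits on LP64 platforms such as Linux but 32 bits on 64-bit Windows, so a fixed-width 64-bit type may be the safer choice if portability matters.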

pierrepaleo commented 2 years ago

Thanks @inducer, that seems to solve the problem. Should I open a PR?

Do you think changing these lines is enough to fix this class of problems, i.e., are there other files I should be looking at?

inducer commented 2 years ago

Yes, I'd be happy to consider a PR. Thanks for offering!

If you're up for it, please look over reduction.py and scan.py for related issues.