rapidsai / cusignal

cuSignal - RAPIDS Signal Processing Library

[BUG] OutOfMemoryError in spectrogram #434

Open Bruyant opened 2 years ago

Bruyant commented 2 years ago

I get a memory error in spectrogram if the number of samples is greater than ~110M.

Steps/Code to reproduce bug

```python
import cupy as cp
import cusignal

S = cp.ones(20001000)  # 20000000 is OK
cusignal.spectrogram(S, nperseg=1024, noverlap=512, nfft=4096)

S = cp.ones(64000000)  # 63000000 is OK
cusignal.spectrogram(S, nperseg=1024, noverlap=512)

S = cp.ones(110000000)  # 110000000 is OK
cusignal.spectrogram(S, nperseg=1024)
```

I get an error depending on the signal length:

OutOfMemoryError: Out of memory allocating 1,280,016,384 bytes (allocated so far: 5,068,731,392 bytes).

Expected behavior: I would expect the spectrogram function or _fft_helper to cut the data into chunks if it does not fit in GPU memory, or to provide a keyword for chunking.

Environment details:

Full traceback:

```
File "C:\UsersPrograms\Anaconda3\envs\rapids-21.10\lib\site-packages\cusignal\spectral_analysis\spectral.py", line 824, in spectrogram
  freqs, time, Sxx = _spectral_helper(
File "C:\UsersPrograms\Anaconda3\envs\rapids-21.10\lib\site-packages\cusignal\spectral_analysis\spectral.py", line 1844, in _spectral_helper
  result = _fft_helper(x, win, detrend_func, nperseg, noverlap, nfft, sides)
File "C:\UsersPrograms\Anaconda3\envs\rapids-21.10\lib\site-packages\cusignal\spectral_analysis\spectral.py", line 1930, in _fft_helper
  result = func(result, n=nfft)
File "C:\UsersPrograms\Anaconda3\envs\rapids-21.10\lib\site-packages\cupy\fft\_fft.py", line 823, in rfft
  return _fft(a, (n,), (axis,), norm, cufft.CUFFT_FORWARD, 'R2C')
File "C:\UsersPrograms\Anaconda3\envs\rapids-21.10\lib\site-packages\cupy\fft\_fft.py", line 242, in _fft
  a = _cook_shape(a, s, axes, value_type)
File "C:\UsersPrograms\Anaconda3\envs\rapids-21.10\lib\site-packages\cupy\fft\_fft.py", line 58, in _cook_shape
  z = cupy.zeros(shape, a.dtype.char, order=order)
File "C:\UsersPrograms\Anaconda3\envs\rapids-21.10\lib\site-packages\cupy\_creation\basic.py", line 209, in zeros
  a = cupy.ndarray(shape, dtype, order=order)
File "cupy\_core\core.pyx", line 167, in cupy._core.core.ndarray.__init__
File "cupy\cuda\memory.pyx", line 718, in cupy.cuda.memory.alloc
File "cupy\cuda\memory.pyx", line 1395, in cupy.cuda.memory.MemoryPool.malloc
File "cupy\cuda\memory.pyx", line 1416, in cupy.cuda.memory.MemoryPool.malloc
File "cupy\cuda\memory.pyx", line 1096, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
File "cupy\cuda\memory.pyx", line 1117, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
File "cupy\cuda\memory.pyx", line 1355, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc
OutOfMemoryError: Out of memory allocating 1,280,016,384 bytes (allocated so far: 5,068,731,392 bytes).
```
awthomp commented 2 years ago

Hey @Bruyant. Thanks for using cuSignal!

It looks like you're allocating more data on the GPU than you have physical space for. You mentioned that smaller sample sizes work as expected.

> I would expect the spectrogram function or _fft_helper to cut the data into chunks if it does not fit in GPU memory, or to provide a keyword for chunking.

Unfortunately, this is not how cuSignal works, particularly if you're allocating memory directly on the GPU with cupy.ones, cupy.random.randn, or similar. CUDA Managed Memory allows data movement via page migration, so you can allocate as much managed memory as you have system memory, but we don't natively support that here yet.
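For context (not from the thread, just standard CuPy calls plus back-of-the-envelope arithmetic), this is one way to see how much device memory is actually free and where the 1,280,016,384-byte request in the traceback appears to come from:

```python
import cupy as cp

# (free, total) device memory in bytes
free_bytes, total_bytes = cp.cuda.Device(0).mem_info
print(f"{free_bytes / 1e9:.2f} GB free of {total_bytes / 1e9:.2f} GB")

# The failing allocation appears to be the zero-padded segment matrix built in
# _cook_shape before the FFT: with 20,001,000 samples, nperseg=1024, noverlap=512,
# nfft=4096 there are (20001000 - 1024) // 512 + 1 = 39063 segments, and
# 39063 * 4096 * 8 bytes (float64) = 1,280,016,384 bytes -- the number in the error.
```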

Bruyant commented 2 years ago

Is there a better place to put my input data to save memory?

awthomp commented 2 years ago

What are you trying to do? A spectrogram on streaming data coming into the GPU?

Bruyant commented 2 years ago

Thanks for your answer. I just want to compute a spectrogram of a long acquisition coming from a file on disk. I'm naively thinking about two solutions:
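A minimal sketch of the chunking idea, not cuSignal API: it assumes the acquisition can be memory-mapped on the host as a 1-D NumPy array, moves one overlapping block to the GPU at a time, and stitches the per-block spectrograms together. chunked_spectrogram, chunk_windows, and the chunk size are illustrative names, and the per-chunk time axis is dropped.

```python
import cupy as cp
import numpy as np
import cusignal

def chunked_spectrogram(x_host, fs=1.0, nperseg=1024, noverlap=512,
                        chunk_windows=50_000):
    """Compute a spectrogram block by block; x_host is a 1-D host array
    (e.g. an np.memmap of the acquisition file)."""
    step = nperseg - noverlap
    # Chunk length holding a whole number of windows, so windows in consecutive
    # chunks stay on the same grid as a single-pass spectrogram.
    chunk_len = chunk_windows * step + noverlap
    segments, freqs = [], None
    start = 0
    while start + nperseg <= len(x_host):
        stop = min(start + chunk_len, len(x_host))
        x_gpu = cp.asarray(x_host[start:stop])          # one chunk on the GPU at a time
        freqs, _, Sxx = cusignal.spectrogram(x_gpu, fs=fs,
                                             nperseg=nperseg, noverlap=noverlap)
        segments.append(cp.asnumpy(Sxx))                # copy result back to the host
        del x_gpu, Sxx
        cp.get_default_memory_pool().free_all_blocks()  # release GPU memory for the next chunk
        start = stop - noverlap                         # overlap keeps the window grid aligned
    return freqs, np.concatenate(segments, axis=-1)

# e.g. x = np.memmap("acquisition.dat", dtype=np.float32, mode="r")
# f, Sxx = chunked_spectrogram(x, fs=1e6)
```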

mnicely commented 2 years ago

A much easier workaround would be to allocate with CuPy's Managed Memory allocator (https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.ManagedMemory.html#cupy.cuda.ManagedMemory & https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.malloc_managed.html). This will allow the driver to migrate data back and forth between system and device memory using page faults. It won't be performant, but it will be functional.

To better understand what's going on, please read https://developer.nvidia.com/blog/unified-memory-cuda-beginners/
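A minimal sketch of that managed-memory route, using CuPy's documented allocator hooks and the signal size from the report above; expect paging overhead, but the OutOfMemoryError should go away:

```python
import cupy as cp
import cusignal

# Route all subsequent CuPy allocations through CUDA managed (unified) memory so the
# driver can page data between system and device memory instead of raising an OOM.
cp.cuda.set_allocator(cp.cuda.MemoryPool(cp.cuda.malloc_managed).malloc)

S = cp.ones(110_000_000)  # now backed by managed memory
f, t, Sxx = cusignal.spectrogram(S, nperseg=1024)
```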

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.