maweigert / gputools

GPU accelerated image/volume processing in Python
BSD 3-Clause "New" or "Revised" License

pyopencl._cl.MemoryError: clEnqueueNDRangeKernel failed: MEM_OBJECT_ALLOCATION_FAILURE #14

Open btguilherme opened 5 years ago

btguilherme commented 5 years ago

Hi, I am using the gputools library in my project and I have a problem. When I try to process images with the nlm3 filter and the GPU cannot handle the number of images, I get the error shown in the attached image. However, the GPU continues to process the images. Is there any solution to this problem (for example, killing the process)? Thanks!

[Attached image: untitled]

maweigert commented 5 years ago

Hi,

Thanks for the feedback!

the GPU continues to process the images.

I don't fully understand that. You mean the GPU memory is still allocated? Sometimes a memory allocation error such as this does indeed lead to strange behaviour afterwards. So how large was the image, and what does nvidia-smi (given you have an Nvidia card) show before and after the nlm3 call?

btguilherme commented 5 years ago

Hello! Thanks for listening!

I don't fully understand that. You mean the GPU memory is still allocated?

Exactly, the memory is still allocated and the video card continues to process (the Windows Task Manager shows that the images are still being computed even after the memory error).

[Attached image: initial]

how large was the image

I am using an AMD Radeon™ R750 4GB, and I am trying to process TIF images of 2000x2000 pixels (in fact, the error also happens with RAW files).

In the attached images you can see that the algorithm breaks but continues to process the images. The prompt only becomes available again once processing has ended (this test processed 156 slices of a RAW file, 980x1008 pixels each).

[Attached image: final]

I know the problem is memory allocation (it happens specifically at line 97 of the nlm3.py file, where the accBuf /= weightBuf division happens). The strange thing is that the memory is not deallocated and the processing does not stop.

maweigert commented 5 years ago

That's weird, given that accBuf /= weightBuf is an inplace division, and doesn't even allocate extra memory... So the shape of the input image was (156,980,1008)? Sadly, all our Windows machines only have Nvidia cards, so that's gonna be a hard one to debug...

btguilherme commented 5 years ago

That's weird, given that accBuf /= weightBuf is an inplace division, and doesn't even allocate extra memory

In fact, on the machine I am using there is extra memory allocation: there is a peak of video memory consumption shortly after the loops at lines 67-69 of the nlm3 function, which is exactly where the accBuf /= weightBuf call is. Maybe it's a "problem" with AMD cards (?), or maybe with pyopencl (the version installed here is 2018.2.2 + cl12).

So the shape of the input image was (156,980,1008)?

Yes.

For now I'm working around the problem by dividing the set of images into subsets. I compute each subset and store the result in another array.

Thank you again for your time! By the way, good work in developing this project.
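The workaround described above (splitting the stack into subsets and storing each result in another array) could look roughly like the sketch below. It is only an illustration: `process_in_chunks` and `placeholder_filter` are hypothetical names, and the placeholder stands in for whatever wraps the real nlm3 call. One caveat is that a 3D filter applied to independent subsets may give slightly different values near the subset boundaries than a run over the whole volume.

```python
import numpy as np

def process_in_chunks(volume, filter_fn, chunk_size):
    """Apply filter_fn to consecutive blocks of slices along axis 0."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    out = np.empty_like(volume)
    for start in range(0, volume.shape[0], chunk_size):
        stop = min(start + chunk_size, volume.shape[0])
        # Only this block has to be handed to the filter at a time.
        out[start:stop] = filter_fn(volume[start:stop])
    return out

# Hypothetical stand-in for the real filter; in practice this would
# wrap the nlm3 call with the parameters used in the project.
def placeholder_filter(block):
    return block * 0.5

volume = np.ones((156, 98, 100), dtype=np.float32)
result = process_in_chunks(volume, placeholder_filter, chunk_size=32)
```

The chunk size would have to be chosen so that one subset, plus the filter's working buffers, fits in the card's memory.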

maweigert commented 5 years ago

So if you run something like this, do you see peak memory usage greater than 2.1 GB (which I do not see on my machine)?

import numpy as np 
from gputools import OCLArray

x = np.ones((1024,1024,256), np.float32)

x_g = OCLArray.from_array(x)
y_g = OCLArray.from_array(x)
x_g /= y_g

And thanks for the feedback! :)

btguilherme commented 5 years ago

I ran the code 10 times in a row and got this result:

[Attached image: presentation1]

In fact, there are peaks, as shown in the attached image, at indexes 4, 5, 6 and 9.

maweigert commented 5 years ago

So if you change the shape to (1024,1024,400) it will crash?

btguilherme commented 5 years ago

Yes.

[Attached image: error]

maweigert commented 5 years ago

Is it the same if you replace the elementwise divide with an elementwise multiply?

import numpy as np 
from gputools import OCLArray
x = np.ones((1024,1024,400), np.float32)
x_g = OCLArray.from_array(x)
y_g = OCLArray.from_array(x)
x_g *= y_g

btguilherme commented 5 years ago

With x = np.ones((1024,1024,400), np.float32) the code doesn't crash. But if I set x = np.ones((1024,1024,450), np.float32), then it will crash.

maweigert commented 5 years ago

What happens if you run the following code in the same way? Does it still crash?

import numpy as np 
from gputools import OCLArray, get_device, OCLElementwiseKernel

x = np.empty((1024,1024,400), np.float32)
x_g = OCLArray.from_array(x)
y_g = OCLArray.from_array(x)

k = OCLElementwiseKernel(
"float *a, float *b",
"a[i] = a[i]/b[i]",
"divide_inplace")

k(x_g, y_g)

btguilherme commented 5 years ago

It exhibits the same behavior, running smoothly when x = np.ones((1024,1024,400), np.float32) and crashing when x = np.ones((1024,1024,450), np.float32).

maweigert commented 5 years ago

Interesting. So it seems that pyopencl's "/=" operator is not in-place, whereas the "*=" operator is (i.e. it does not allocate additional memory). I would guess that (1024,1024,450) fails because it might be slightly above the available memory of your card, so I would not be too worried about that.

So you could use the in-place divide kernel from above as a workaround:

k = OCLElementwiseKernel(
"float *a, float *b",
"a[i] = a[i]/b[i]",
"divide_inplace")

...

# was: accBuf /= weightBuf
k(accBuf,weightBuf)

That should rid you of the original allocation error. Why the kernel keeps on running, however, I still have no idea about ;)
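The memory argument in this thread can be checked with a back-of-the-envelope estimate. The sketch below only does arithmetic on array sizes; `buffers_gib` is a hypothetical helper, and the idea that a non-in-place "/=" needs a third, temporary buffer of the same size is an assumption drawn from the observations above, not something confirmed from pyopencl's source.

```python
import numpy as np

def buffers_gib(shape, dtype=np.float32, n_buffers=2):
    """Size in GiB of n_buffers arrays of the given shape and dtype."""
    n_elements = int(np.prod(shape, dtype=np.int64))
    n_bytes = n_elements * np.dtype(dtype).itemsize * n_buffers
    return n_bytes / 2**30

for depth in (256, 400, 450):
    shape = (1024, 1024, depth)
    # Two buffers: x_g and y_g only (in-place operation).
    two = buffers_gib(shape, n_buffers=2)
    # Three buffers: assumes one extra temporary for the result.
    three = buffers_gib(shape, n_buffers=3)
    print(f"{shape}: 2 buffers = {two:.2f} GiB, 3 buffers = {three:.2f} GiB")
```

For depth 256 this gives 2 GiB for two buffers (the roughly 2.1 GB mentioned earlier) and 3 GiB for three, which would still fit on a 4GB card. For depth 400, two buffers need about 3.1 GiB but three need about 4.7 GiB, which matches "*=" working and "/=" failing at that size. For depth 450, two buffers already need about 3.5 GiB, leaving little headroom once the driver and display take their share.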

btguilherme commented 5 years ago

The proposed modifications really worked! I no longer receive the memory error, so the program is now limited only by RAM. Many thanks for the support, @maweigert!

Why the kernel keeps on running, however, I still have no idea about ;)

In fact, it is very strange behavior.

jingpengw commented 1 year ago

I am processing a float32 array of shape (1024, 1024, 512) and I got the same error.