Open btguilherme opened 5 years ago
Hi,
Thanks for the feedback!
the GPU continues to process the images.
I don't fully understand that. You mean the GPU memory is still allocated? Sometimes a memory allocation error such as this does lead to strange behaviour afterwards. How large was the image, and what does nvidia-smi (given you have an Nvidia card) show before and after the nlm3 call?
Hello! Thanks for listening!
I don't fully understand that. You mean the GPU memory is still allocated?
Exactly, the memory is still allocated and the video card continues to process (the Windows Task Manager shows that the images are still being computed even after the memory error).
how large was the image
I am using an AMD RadeonT R750 4GB, and I am trying to process TIF images 2000x2000 pixels (in fact, the error also happens with RAW files)
In the attached images it is possible to observe that the algorithm breaks, but continues to process the images. The prompt is only available after the end of processing (the result of this test was obtained by processing 156 slices of a RAW file, 980x1008 pixels)
I know the problem is memory allocation (it happens specifically at line 97 of the nlm3.py file, where the accBuf /= weightBuf division happens). The strange thing is that the memory is not deallocated and the processing does not stop.
That's weird, given that accBuf /= weightBuf is an inplace division, and doesn't even allocate extra memory... So the shape of the input image was (156,980,1008)?
Sadly, all our Windows machines only have Nvidia cards, so that's gonna be a hard one to debug...
That's weird, given that accBuf /= weightBuf is an inplace division, and doesn't even allocate extra memory
In fact, on the machine I am using there is extra memory allocation: there is a peak of video memory consumption shortly after the loops at lines 67-69 of the nlm3 function, which is exactly where the accBuf /= weightBuf call happens. Maybe it's a "problem" with AMD cards (?), or maybe with pyopencl (the version installed here is 2018.2.2 + cl12).
So the shape of the input image was (156,980,1008)?
Yes.
For now I'm working around the problem dividing the set of images. I compute each subset and store it in another array.
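The workaround described above could look roughly like the following. This is only a sketch: `process_in_chunks` and `fake_filter` are hypothetical names, the stand-in filter replaces the real GPU call, and it assumes the filter returns a block of the same shape and dtype as its input.

```python
import numpy as np

def process_in_chunks(volume, func, chunk_size):
    """Apply func to consecutive slabs along axis 0 and collect the results.

    Hypothetical helper: assumes func returns an array with the same
    shape and dtype as the slab it receives.
    """
    out = np.empty_like(volume)
    for start in range(0, volume.shape[0], chunk_size):
        stop = min(start + chunk_size, volume.shape[0])
        # Only one slab is handed to func at a time.
        out[start:stop] = func(volume[start:stop])
    return out

def fake_filter(block):
    # Stand-in for the real GPU filter call; it just halves the values.
    return block * 0.5

# Small test volume; a real stack would be e.g. (156, 980, 1008).
volume = np.ones((156, 98, 100), np.float32)
result = process_in_chunks(volume, fake_filter, chunk_size=40)
print(result.shape)
```

Note that each slab is filtered independently here, so a 3D filter will not see neighbours across a slab boundary. Overlapping the slabs and discarding the overlap would likely be needed to get results closer to filtering the whole stack at once.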
Thank you again for your time! By the way, good work in developing this project.
So if you run something like this, do you see peak memory usage greater than 2.1 GB (which I do not see on my machine)?
import numpy as np
from gputools import OCLArray
x = np.ones((1024,1024,256), np.float32)
x_g = OCLArray.from_array(x)
y_g = OCLArray.from_array(x)
x_g /= y_g
And thanks for the feedback! :)
I ran the code 10 times in a row and got this result.
There are indeed peaks, as shown in the attached image, at indexes 4, 5, 6 and 9.
So if you change the shape to (1024,1024,400), it will crash?
Yes
Is it the same if you replace the elementwise divide with an elementwise multiply?
import numpy as np
from gputools import OCLArray
x = np.ones((1024,1024,400), np.float32)
x_g = OCLArray.from_array(x)
y_g = OCLArray.from_array(x)
x_g *= y_g
With x = np.ones((1024,1024,400), np.float32) the code doesn't crash. But if I set x = np.ones((1024,1024,450), np.float32), then it crashes.
What happens if you run the following code in the same way? Does it still crash?
import numpy as np
from gputools import OCLArray, get_device, OCLElementwiseKernel
x = np.empty((1024,1024,400), np.float32)
x_g = OCLArray.from_array(x)
y_g = OCLArray.from_array(x)
k = OCLElementwiseKernel(
"float *a, float *b",
"a[i] = a[i]/b[i]",
"divide_inplace")
k(x_g, y_g)
It exhibits the same behavior, running smoothly when x = np.ones((1024,1024,400), np.float32) and crashing when x = np.ones((1024,1024,450), np.float32)
Interesting. So it seems that pyopencl's "/=" operator is not inplace, whereas the "*=" operator is (i.e. it does not allocate additional memory). I would guess that (1024,1024,450) fails because it might be slightly above the available memory of your card, so I would not be too worried about that.
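The memory guess above can be sanity-checked with some quick arithmetic. This is a rough sketch that only counts the raw float32 buffers (4 bytes per element); `buffer_bytes` is a hypothetical helper, and the share of the card's 4 GB that is actually usable is not known from this thread.

```python
import math

GIB = 1024 ** 3
FLOAT32_BYTES = 4  # size of one float32 element

def buffer_bytes(shape, itemsize=FLOAT32_BYTES):
    """Bytes needed for one dense array of the given shape."""
    return math.prod(shape) * itemsize

for depth in (256, 400, 450):
    one = buffer_bytes((1024, 1024, depth))
    # Two buffers: x_g and y_g. Three: the same plus one temporary,
    # which is what a non-inplace division would need (an assumption).
    print(depth, one / GIB, 2 * one / GIB, 3 * one / GIB)
```

Under these assumptions, at depth 400 two buffers come to about 3.1 GiB, while a third temporary would push the total to about 4.7 GiB, which is more than a 4 GB card can hold. At depth 450 the two buffers alone already need about 3.5 GiB.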
So you could use the inplace-divide kernel from above as a workaround:
k = OCLElementwiseKernel(
"float *a, float *b",
"a[i] = a[i]/b[i]",
"divide_inplace")
...
# was: accBuf /= weightBuf
k(accBuf,weightBuf)
That should rid you of the original allocation error. Why the kernel keeps on running, however, I still have no idea about ;)
The proposed modifications really worked! I no longer get the memory error, so the program is now limited only by RAM. Many thanks for the support @maweigert !
Why the kernel keeps on running, however, I still have no idea about ;)
In fact, it is very strange behavior.
I am processing an array of shape (1024, 1024, 512) with data type float32, and I got the same error.
Hi, I am using the gputools library in my project and I have a problem. When I try to process images with the nlm3 filter and the GPU cannot handle the number of images, I get the error shown in the attached image. However, the GPU continues to process the images. Is there any solution to this problem (e.g. killing the process)? Thanks!
![untitled](https://user-images.githubusercontent.com/8527432/52002181-78205000-24a8-11e9-84b7-ce8f147a93d6.png)