inducer / pyopencl

OpenCL integration for Python, plus shiny features
http://mathema.tician.de/software/pyopencl
Other
1.06k stars 241 forks source link

Python crashes when using property to get a cl.Buffer #487

Closed yves-surrel closed 3 years ago

yves-surrel commented 3 years ago

This simple program (where attribute a is defined in __init__()) works:

import pyopencl as cl
import pyopencl.array as cla

ctx = cl.create_some_context()

queue = cl.CommandQueue(ctx)

h, w = 500, 100

class Test():

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.a = cla.empty(queue, (h, w), 'float32')

    def test(self):
        self.prg = prg = cl.Program(ctx, """
            __kernel void test(__global float *a)
            {int id = get_global_id(0); }
            """).build()

        knl = prg.test

        knl.set_args(self.a.data)

        cl.enqueue_nd_range_kernel(queue, knl, (h * w,), None)

Test().test()

This one, where a is got from a property, crashes when use_empty is set to True, and works when it is set to False. AFAIU the difference between cla.empty and cla.zeros, it should work the same (the previous program works, so...):

import pyopencl as cl
import pyopencl.array as cla

ctx = cl.create_some_context()

queue = cl.CommandQueue(ctx)

h, w = 500, 100

use_empty = True

class Test():

    @property
    def a(self):
        if use_empty:
            return cla.empty(queue, (h, w), 'float32')
        return cla.zeros(queue, (h, w), 'float32')

    def test(self):
        self.prg = prg = cl.Program(ctx, """
            __kernel void test(__global float *a)
            {int id = get_global_id(0); }
            """).build()

        knl = prg.test

        knl.set_args(self.a.data)

        cl.enqueue_nd_range_kernel(queue, knl, (h * w,), None)

Test().test()

And finally, I could not make this last one to work (the same when using cla.to_device within the property):

import numpy as np
import pyopencl as cl
import pyopencl.array as cla

ctx = cl.create_some_context()

queue = cl.CommandQueue(ctx)

h, w = 500, 100

class Test():

    @property
    def a(self):
        a = cla.zeros(queue, (h, w), np.float32)
        npa = np.ones((h, w), np.float32)

        # Be sure to keep a reference to the returned NannyEvent
        self.ev = cl.enqueue_copy(queue, a.data, npa, is_blocking=True)

        return a

    def test(self):
        self.prg = prg = cl.Program(ctx, """
            __kernel void test(__global float *a)
            {int id = get_global_id(0); }
            """).build()

        knl = prg.test

        knl.set_args(self.a.data)

        cl.enqueue_nd_range_kernel(queue, knl, (h * w,), None)

Test().test()

(Tested on MacBook Pro under MacOSX 11.4, and Ubuntu 20.04.2 via VirtualBox)

inducer commented 3 years ago

The main issue in your code is the lifetime of the a returned by the property. Since you write

knl.set_args(self.a.data)

what the Python runtime does is:

As a result, by the time you get to the kernel enqueue, the buffer is (at least allowed to be) long gone. And even if the object lived to the end of

knl.set_args(self.a.data)

being set as an argument on a kernel will not keep a buffer alive. (However, being an argument to a submitted kernel will. This is perhaps a bit unintuitive, but it is covered by the CL spec. I had this question, too, not too long ago, see here:https://github.com/KhronosGroup/OpenCL-Docs/issues/619).

To avoid issues of this nature, the safe thing to do is to make sure that all your buffers survive at least to the point of the kernel enqueue. In your case, changing your code to

a = self.a
knl.set_args(a.data)

(The local variable a keeps the buffer alive.)

The difference between zeros and empty is spurious IMO. Because of the events returned, there is a chance that there is a reference cycle in the object graph, which means deallocation is not immediate, but upon the next GC run, which may happen at an indeterminate time.

yves-surrel commented 3 years ago

I suspected something like that (that's why I set the nanny event to be referred to in self, which is finally unnecessary), but could not figure out exactly what was happening. Also, I think the way I create the property is not the best one. Usually, a property refers to a hidden attribute in self. If I had done so, the problem would not have happened. Thanks for the explanation and fix

yves-surrel commented 3 years ago

We had the opportunity go come back to this bug ;-)

About your comment line: Decrement the refcount on the result of self.a, which in many of the cases you show leads to its deallocation.

shouldn't using .data increase the refcount of the clArray object so that this problem does not happen? At first sight, it would be logical...? After all, we are referencing the array through its attribute data... Maybe I am silly on that point, but its really tweaky.

inducer commented 3 years ago

The point isn't that the deallocation of the array is an issue (it's fine: all reference to it are gone, nobody will ever look at it again). But the array being gone entails that it (the array) won't keep the buffer alive.

Holding a reference to the buffer will keep the buffer alive, however also only for as long as the Python object lives. The Python Buffer object dies after the call to set_args completes.

The big question then is: who is keeping the buffer alive at this point. IIUC, the answer is nobody. As I explained with the spec reference, being set as a kernel argument does not keep a buffer alive. PyOpenCL could go against that, but it could lead to very unintuitive memory consumption behaviors, with buffers being kept alive long after they're expected to. I'm much more comfortable with the precedent set by the spec.

So, in short, you do need to keep some Python object around long enough to hold a reference to the buffer, otherwise it has every right to be gone.