inducer / pyopencl

OpenCL integration for Python, plus shiny features
http://mathema.tician.de/software/pyopencl

Array allocation failure #659

Closed pierrepaleo closed 1 year ago

pierrepaleo commented 1 year ago

I have code that performs a series of allocations using pyopencl.array.
One of the allocations fails with the following error:

/mnt/multipath-shares/scisoft/tomotools_env/integrator/ubuntu20.04/ppc64le/dev/lib/python3.8/site-packages/silx/opencl/processing.py in allocate_buffers(self, buffers, use_array)
    215                     for buf in buffers:
    216                         print("####", buf.name, buf.size, buf.dtype)
--> 217                         mem[buf.name] = pyopencl.array.empty(self.queue, buf.size, buf.dtype)
    218                 else:
    219                     for buf in buffers:

/mnt/multipath-shares/scisoft/tomotools_env/integrator/ubuntu20.04/ppc64le/dev/lib/python3.8/site-packages/pyopencl/array.py in __init__(***failed resolving arguments***)
    684                     print("alloc_nbytes", alloc_nbytes) # DEBUG
    685 
--> 686                     self.base_data = cl.Buffer(
    687                             context, cl.mem_flags.READ_WRITE, alloc_nbytes)
    688                 else:

TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. pyopencl._cl.Buffer(context: pyopencl._cl.Context, flags: int, size: int = 0, hostbuf: object = None)

Invoked with: <pyopencl.Context at 0x2ec44690 on <pyopencl.Device 'Tesla V100-SXM2-32GB' on 'Portable Computing Language' at 0x2e32ac38>>, 1, 4471.0

~~Unfortunately, it's difficult to create an MWE that reproduces the crash; I'm not sure why (caching?).~~ See https://github.com/inducer/pyopencl/issues/659#issuecomment-1318177386 for an MWE that reproduces the crash.

After investigating, it seems the cause lies in pyopencl.array, which tries to allocate a buffer with its size passed as a float.
Consider the following part of pyopencl/array.py:

            size = 1
            for dim in shape:
                size *= dim
                if dim < 0:
                    raise ValueError(
                        f"negative dimensions are not allowed: {shape}")

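Importantly, when size is a plain Python int and dim is a numpy.uint64 (as in the loop above), the *= operation promotes size to a float (!), because NumPy promotes a mix of uint64 and a signed integer to float64.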
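A minimal sketch of this promotion that needs no OpenCL device. It assumes NumPy 1.x rules for the first result: NumPy 2.0 changed how Python ints are promoted (NEP 50), so on newer NumPy the loop keeps uint64. Mixing two NumPy scalars of types int64 and uint64 gives float64 on every version.

```python
import numpy

shape = (numpy.uint64(10),)

# Same loop as in pyopencl/array.py: size starts as a plain Python int.
size = 1
for dim in shape:
    size *= dim
# NumPy 1.x treats the Python int as a signed integer here, giving float64;
# NumPy >= 2.0 (NEP 50) treats Python ints as "weak" and keeps uint64.
print(type(size).__name__, size)

# int64 combined with uint64 promotes to float64 on all NumPy versions,
# since no integer dtype can represent both value ranges.
mixed = numpy.int64(1) * numpy.uint64(10)
print(mixed.dtype)  # float64

# Converting each dimension to a Python int keeps the product an exact int.
safe_size = 1
for dim in shape:
    safe_size *= int(dim)
print(type(safe_size).__name__, safe_size)  # int 10
```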

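I'd suggest using the following instead:

    # Check each dimension: the product of two negative dimensions is positive.
    if any(dim < 0 for dim in shape):
        raise ValueError(f"negative dimensions are not allowed: {shape}")
    # int() also covers shape == (), for which numpy.prod returns the float 1.0.
    size = int(numpy.prod(shape))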

The above code snippet fixes the problem in my case.
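An alternative sketch, assuming the goal is to normalize every dimension to a Python int before any arithmetic happens. The helper name normalize_shape is hypothetical and not part of pyopencl. It relies on operator.index, which converts Python and NumPy integers to Python ints but raises TypeError for floats, so a float dimension is rejected early instead of reaching cl.Buffer.

```python
import math
import operator

import numpy


def normalize_shape(shape):
    """Hypothetical helper (not pyopencl API): return (dims, size) as Python ints."""
    # Treat a bare integer (Python or NumPy) as a 1-D shape.
    if isinstance(shape, (int, numpy.integer)):
        shape = (shape,)
    # operator.index turns NumPy integers into Python ints and rejects floats.
    dims = tuple(operator.index(dim) for dim in shape)
    # Validate each dimension rather than the product.
    if any(dim < 0 for dim in dims):
        raise ValueError(f"negative dimensions are not allowed: {shape}")
    # math.prod over Python ints is exact and returns 1 for an empty shape.
    return dims, math.prod(dims)


dims, size = normalize_shape((numpy.uint64(10),))
print(dims, size, type(size).__name__)  # (10,) 10 int
```

Because size comes back as a Python int, a byte count derived from it would also be an int, which matches the `size: int` parameter shown in the cl.Buffer signature from the traceback.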

Tested on Python 3.8 with the PoCL driver, on a POWER9 machine with an Nvidia V100 GPU.

inducer commented 1 year ago

cc @alexfikl (since this came from a change you made)

kif commented 1 year ago

To reproduce the bug:

In [1]: import pyopencl, pyopencl.array, numpy
In [2]: ctx = pyopencl.create_some_context()
In [3]: queue = pyopencl.CommandQueue(ctx)
In [4]: ary = pyopencl.array.empty(queue, numpy.uint64(10), dtype="float32")
In [5]: ary = pyopencl.array.empty(queue, (numpy.uint64(10),), dtype="float32")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-1e5620a87523> in <module>
----> 1 ary = pyopencl.array.empty(queue, (numpy.uint64(10),), dtype="float32")

/usr/lib/python3/dist-packages/pyopencl/array.py in __init__(***failed resolving arguments***)
    532                         context = queue.context
    533 
--> 534                     self.base_data = cl.Buffer(
    535                             context, cl.mem_flags.READ_WRITE, alloc_nbytes)
    536                 else:

TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. pyopencl._cl.Buffer(context: pyopencl._cl.Context, flags: int, size: int = 0, hostbuf: object = None)

Invoked with: <pyopencl.Context at 0x1040dd0 on <pyopencl.Device 'gfx900:xnack-' on 'AMD Accelerated Parallel Processing' at 0x1d34a20>>, 1, 40.0