`"cuda"` target in numba.vectorize not working correctly?

lgray commented 1 month ago

Version of Awkward Array

2.6.6

Description and code to reproduce

The following code fails:

import awkward as ak
import cupy as cp
import numba as nb

ak.numba.register_and_check()

@nb.vectorize(
    [
        nb.float32(nb.float32),
        nb.float64(nb.float64),
    ]
)
def _square(x):
    return x * x

@nb.vectorize(
    [
        nb.float32(nb.float32),
        nb.float64(nb.float64),
    ],
    target="cuda",
)
def _square_cuda(x):
    return x * x

counts = cp.random.poisson(lam=3, size=50)
flat_values = cp.random.normal(size=int(counts.sum()))

values = ak.unflatten(flat_values, counts)

values2_cpu = _square(ak.to_backend(values, "cpu"))

print(values2_cpu)

values2 = _square_cuda(values)

print(values2)

resulting in the output:

[[0.045, 1.83], [1.55, 0.224, 0.621, ..., 1.24, 3.87], ..., [0.153, 0.0017]]

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[12], line 33
     29 values2_cpu = _square(ak.to_backend(values, "cpu"))
     31 print(values2_cpu)
---> 33 values2 = _square_cuda(values)
     35 print(values2)

File ~/.conda/envs/coffea-gpu/lib/python3.11/site-packages/numba/cuda/vectorizers.py:28, in CUDAUFuncDispatcher.__call__(self, *args, **kws)
     17 def __call__(self, *args, **kws):
     18     """
     19     *args: numpy arrays or DeviceArrayBase (created by cuda.to_device).
     20            Cannot mix the two types in one call.
   (...)
     26                   the input arguments.
     27     """
---> 28     return CUDAUFuncMechanism.call(self.functions, args, kws)

File ~/.conda/envs/coffea-gpu/lib/python3.11/site-packages/numba/np/ufunc/deviceufunc.py:254, in UFuncMechanism.call(cls, typemap, args, kws)
    252 # Begin call resolution
    253 cr = cls(typemap, args)
--> 254 args = cr.get_arguments()
    255 resty, func = cr.get_function()
    257 outshape = args[0].shape

File ~/.conda/envs/coffea-gpu/lib/python3.11/site-packages/numba/np/ufunc/deviceufunc.py:202, in UFuncMechanism.get_arguments(self)
    198 def get_arguments(self):
    199     """Prepare and return the arguments for the ufunc.
    200     Does not call to_device().
    201     """
--> 202     self._fill_arrays()
    203     self._fill_argtypes()
    204     self._resolve_signature()

File ~/.conda/envs/coffea-gpu/lib/python3.11/site-packages/numba/np/ufunc/deviceufunc.py:100, in UFuncMechanism._fill_arrays(self)
     98     self.scalarpos.append(i)
     99 else:
--> 100     self.arrays[i] = np.asarray(arg)

File [~/coffea-gpu/awkward/src/awkward/highlevel.py:1434](https://analytics-hub.fnal.gov/user/lagray/lab/tree/coffea-gpu/coffea-gpu/awkward/src/awkward/highlevel.py#line=1433), in Array.__array__(self, dtype)
   1429 with ak._errors.OperationErrorContext(
   1430     "numpy.asarray", (self,), {"dtype": dtype}
   1431 ):
   1432     from awkward._connect.numpy import convert_to_array
-> 1434     return convert_to_array(self._layout, dtype=dtype)

File [~/coffea-gpu/awkward/src/awkward/_connect/numpy.py:481](https://analytics-hub.fnal.gov/user/lagray/lab/tree/coffea-gpu/coffea-gpu/awkward/src/awkward/_connect/numpy.py#line=480), in convert_to_array(layout, dtype)
    480 def convert_to_array(layout, dtype=None):
--> 481     out = ak.operations.to_numpy(layout, allow_missing=False)
    482     if dtype is None:
    483         return out

File [~/coffea-gpu/awkward/src/awkward/_dispatch.py:64](https://analytics-hub.fnal.gov/user/lagray/lab/tree/coffea-gpu/coffea-gpu/awkward/src/awkward/_dispatch.py#line=63), in named_high_level_function.<locals>.dispatch(*args, **kwargs)
     62 # Failed to find a custom overload, so resume the original function
     63 try:
---> 64     next(gen_or_result)
     65 except StopIteration as err:
     66     return err.value

File [~/coffea-gpu/awkward/src/awkward/operations/ak_to_numpy.py:48](https://analytics-hub.fnal.gov/user/lagray/lab/tree/coffea-gpu/coffea-gpu/awkward/src/awkward/operations/ak_to_numpy.py#line=47), in to_numpy(array, allow_missing)
     45 yield (array,)
     47 # Implementation
---> 48 return _impl(array, allow_missing)

File [~/coffea-gpu/awkward/src/awkward/operations/ak_to_numpy.py:60](https://analytics-hub.fnal.gov/user/lagray/lab/tree/coffea-gpu/coffea-gpu/awkward/src/awkward/operations/ak_to_numpy.py#line=59), in _impl(array, allow_missing)
     57 backend = NumpyBackend.instance()
     58 numpy_layout = layout.to_backend(backend)
---> 60 return numpy_layout.to_backend_array(allow_missing=allow_missing)

File [~/coffea-gpu/awkward/src/awkward/contents/content.py:1020](https://analytics-hub.fnal.gov/user/lagray/lab/tree/coffea-gpu/coffea-gpu/awkward/src/awkward/contents/content.py#line=1019), in Content.to_backend_array(self, allow_missing, backend)
   1018 else:
   1019     backend = regularize_backend(backend)
-> 1020 return self._to_backend_array(allow_missing, backend)

File [~/coffea-gpu/awkward/src/awkward/contents/listoffsetarray.py:2072](https://analytics-hub.fnal.gov/user/lagray/lab/tree/coffea-gpu/coffea-gpu/awkward/src/awkward/contents/listoffsetarray.py#line=2071), in ListOffsetArray._to_backend_array(self, allow_missing, backend)
   2070     return buffer.view(np.dtype(("S", max_count)))
   2071 else:
-> 2072     return self.to_RegularArray()._to_backend_array(allow_missing, backend)

File [~/coffea-gpu/awkward/src/awkward/contents/listoffsetarray.py:283](https://analytics-hub.fnal.gov/user/lagray/lab/tree/coffea-gpu/coffea-gpu/awkward/src/awkward/contents/listoffsetarray.py#line=282), in ListOffsetArray.to_RegularArray(self)
    278 _size = Index64.empty(1, self._backend.index_nplike)
    279 assert (
    280     _size.nplike is self._backend.index_nplike
    281     and self._offsets.nplike is self._backend.index_nplike
    282 )
--> 283 self._backend.maybe_kernel_error(
    284     self._backend[
    285         "awkward_ListOffsetArray_toRegularArray",
    286         _size.dtype.type,
    287         self._offsets.dtype.type,
    288     ](
    289         _size.data,
    290         self._offsets.data,
    291         self._offsets.length,
    292     )
    293 )
    294 size = self._backend.index_nplike.index_as_shape_item(_size[0])
    295 length = self._offsets.length - 1

File [~/coffea-gpu/awkward/src/awkward/_backends/backend.py:67](https://analytics-hub.fnal.gov/user/lagray/lab/tree/coffea-gpu/coffea-gpu/awkward/src/awkward/_backends/backend.py#line=66), in Backend.maybe_kernel_error(self, error)
     65     return
     66 else:
---> 67     raise ValueError(self.format_kernel_error(error))

ValueError: cannot convert to RegularArray because subarray lengths are not regular (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-35/awkward-cpp/src/cpu-kernels/awkward_ListOffsetArray_toRegularArray.cpp#L22)

This error occurred while calling

    numpy.asarray(
        <Array [[0.2120250193289826, ...], ...] type='50 * var * float64'>
        dtype = None
    )

lgray commented 1 month ago

ah - I was missing a awkward.numba.register_and_check() call! that was it. Seems to work fine now.

lgray commented 1 month ago

Ah - no that was for a flattened array! I still get the error above even after doing the register_and_check().

lgray commented 1 month ago

So if I wrap in a flatten/unflatten I'm able to get this to work, which is a bit clunky. It seems like some aspect of awkward array are lost when dealing with the "cuda" target of numba.vectorize?

Here's the code that functions:

import awkward as ak
import cupy as cp
import numba as nb

ak.numba.register_and_check()

@nb.vectorize(
    [
        nb.float32(nb.float32),
        nb.float64(nb.float64),
    ]
)
def _square(x):
    return x * x

@nb.vectorize(
    [
        nb.float32(nb.float32),
        nb.float64(nb.float64),
    ],
    target="cuda",
)
def _square_cuda(x):
    return x * x

def square_cuda_wrapped(x):
    counts = x.layout.offsets.data[1:] - x.layout.offsets.data[:-1]
    return ak.unflatten(cp.array(_square_cuda(ak.flatten(x))), counts)

counts = cp.random.poisson(lam=3, size=5000000)
flat_values = cp.random.normal(size=int(counts.sum()))

values = ak.unflatten(flat_values, counts)

values2_cpu = _square(ak.to_backend(values, "cpu"))

print(values2_cpu)

values2 = square_cuda_wrapped(values)

print(values2)

lgray commented 1 month ago

@ianna Have you been able to gain any understanding as to what is going on here? Mostly just curious, really. It's holding up progress with getting coffea going on GPUs.

ianna commented 1 month ago

@lgray and @jpivarski - it looks like there could be a simple solution to this issue. However, it needs to be implemented in Numba. I'm checking with the developers and keep you posted.

jpivarski commented 1 month ago

(I'm not too surprised that the fix needs to go into Numba itself, since Awkward doesn't get much control over how functions that call an Awkward Array get dispatched. If the calling function doesn't call __array_ufunc__ or some equivalent, we have no entry into the code that eventually fails.)

scikit-hep / awkward

`"cuda"` target in numba.vectorize not working correctly? #3179

Version of Awkward Array

Description and code to reproduce