scikit-hep / awkward

Manipulate JSON-like data with NumPy-like idioms.
https://awkward-array.org
BSD 3-Clause "New" or "Revised" License
Slicing ak.Array in cuda backend breaks #3133

Open essoca opened 5 months ago

essoca commented 5 months ago

Version of Awkward Array

2.6.4

Description and code to reproduce

Hey guys,

While playing with simple slices of ak.Arrays, I noticed the following:

>>> import awkward as ak
>>> a = ak.Array([[[0, 1], [2, 3]], [[4, 5], [6, 7]]], backend='cpu')

>>> print(a[:, :, 0])
[[0, 2], [4, 6]]

>>> print(a[:, ::-1])
[[[2, 3], [0, 1]], [[6, 7], [4, 5]]]
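
For reference, here is a minimal sketch of what those two slices are expected to compute, written with plain nested Python lists rather than Awkward itself. The names `data`, `first_items` and `reversed_second` are only illustrative and do not come from the library.

```python
# Plain nested lists standing in for the Awkward Array above.
data = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]

# a[:, :, 0]: keep the first two dimensions, take item 0 of each innermost list.
first_items = [[inner[0] for inner in outer] for outer in data]

# a[:, ::-1]: keep the first dimension, reverse the order of the second.
reversed_second = [outer[::-1] for outer in data]

print(first_items)      # [[0, 2], [4, 6]]
print(reversed_second)  # [[[2, 3], [0, 1]], [[6, 7], [4, 5]]]
```

Both results match the output printed by the cpu backend, so any backend is expected to return the same values for these slices.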

So, on the cpu backend, this works as expected. Now the same with the cuda backend:

>>> a = ak.Array([[[0, 1], [2, 3]], [[4, 5], [6, 7]]], backend='cuda')

>>> print(a)
[[[0, 1], [2, 3]], [[4, 5], [6, 7]]]

>>> print(a[:, :, 0])

Traceback (most recent call last):
  File ".../miniforge3/envs/qb/lib/python3.10/site-packages/awkward/highlevel.py", line 1065, in __getitem__
    prepare_layout(self._layout[where]),
  File ".../miniforge3/envs/qb/lib/python3.10/site-packages/awkward/contents/content.py", line 512, in __getitem__
    return self._getitem(where)
  File ".../miniforge3/envs/qb/lib/python3.10/site-packages/awkward/contents/content.py", line 557, in _getitem
    out = next._getitem_next(nextwhere[0], nextwhere[1:], None)
  File ".../miniforge3/envs/qb/lib/python3.10/site-packages/awkward/contents/regulararray.py", line 498, in _getitem_next
    self._backend[
  File ".../miniforge3/envs/qb/lib/python3.10/site-packages/awkward/_backends/cupy.py", line 38, in __getitem__
    _cuda_kernels = cuda.initialize_cuda_kernels(cupy)
  File ".../miniforge3/envs/qb/lib/python3.10/site-packages/awkward/_connect/cuda/__init__.py", line 187, in initialize_cuda_kernels
    import awkward._connect.cuda._kernel_signatures
ModuleNotFoundError: No module named 'awkward._connect.cuda._kernel_signatures'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File ".../miniforge3/envs/qb/lib/python3.10/site-packages/awkward/highlevel.py", line 1063, in __getitem__
    with ak._errors.SlicingErrorContext(self, where):
  File ".../miniforge3/envs/qb/lib/python3.10/site-packages/awkward/_errors.py", line 85, in __exit__
    self.handle_exception(exception_type, exception_value)
  File ".../miniforge3/envs/qb/lib/python3.10/site-packages/awkward/_errors.py", line 95, in handle_exception
    raise self.decorate_exception(cls, exception)
ModuleNotFoundError: No module named 'awkward._connect.cuda._kernel_signatures'

This error occurred while attempting to slice

    <Array [[[0, 1], [2, 3]], [[...], ...]] type='2 * var * var * int64'>

with

    (:, :, 0)

Am I missing something here?

agoose77 commented 5 months ago

That's an interesting error. How did you install awkward? Did you pip install from the Git repo perchance?

jpivarski commented 5 months ago

This slice is expected to fail on the CUDA backend because it needs a kernel that has not been ported yet, awkward_ListArray_getitem_next_range_carrylength. However, your error message says that awkward._connect.cuda._kernel_signatures is missing, which implies that Awkward was installed incorrectly (as @agoose77 said).

If you've ever installed Awkward with the developer instructions, remove it

pip uninstall awkward awkward-cpp

get a new clone of Awkward's git repo, and reinstall, starting from the nox step. As @ManasviGoyal has been adding (and changing) kernels, you need all of the generated header files (including awkward._connect.cuda._kernel_signatures) to be completely up-to-date and consistent.
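
A quick way to tell whether the generated module made it into an installation is to ask Python's import machinery for it. This is only a sketch: `module_is_available` is a hypothetical helper, not part of Awkward, and the module name is the one from the traceback above.

```python
import importlib.util


def module_is_available(name: str) -> bool:
    """Return True if the import system can locate `name`.

    Note that locating a dotted name imports its parent packages.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (for example `awkward` itself) is missing.
        return False


# Module name taken from the ModuleNotFoundError in the traceback.
generated = "awkward._connect.cuda._kernel_signatures"

if module_is_available(generated):
    print(f"{generated} is present")
else:
    print(f"{generated} is missing; rerun `nox -s prepare` and reinstall")
```

If the module is reported missing, the install was made without the files generated by the nox step.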

After that (I just did it), you should see

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/irishep/awkward/src/awkward/highlevel.py", line 1065, in __getitem__
    prepare_layout(self._layout[where]),
                   ~~~~~~~~~~~~^^^^^^^
  File "/home/jpivarski/irishep/awkward/src/awkward/contents/content.py", line 512, in __getitem__
    return self._getitem(where)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/jpivarski/irishep/awkward/src/awkward/contents/content.py", line 557, in _getitem
    out = next._getitem_next(nextwhere[0], nextwhere[1:], None)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jpivarski/irishep/awkward/src/awkward/contents/regulararray.py", line 518, in _getitem_next
    nextcontent._getitem_next(nexthead, nexttail, advanced),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jpivarski/irishep/awkward/src/awkward/contents/listarray.py", line 757, in _getitem_next
    self._backend[
  File "/home/jpivarski/irishep/awkward/src/awkward/_backends/cupy.py", line 43, in __getitem__
    raise AssertionError(f"CuPyKernel not found: {index!r}")
AssertionError: CuPyKernel not found: ('awkward_ListArray_getitem_next_range_carrylength', <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.int64'>)

because the awkward_ListArray_getitem_next_range_carrylength kernel still needs to be ported to CUDA. It will be, soon.
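
To illustrate why a missing kernel surfaces as an AssertionError at slice time, here is a toy lookup table keyed by kernel name and argument types, loosely modeled on the `self._backend[...]` call in the traceback. It is not Awkward's actual implementation: `ToyKernelRegistry`, `toy_ported_kernel` and the use of `int` in place of NumPy dtypes are all assumptions made for the sketch.

```python
class ToyKernelRegistry:
    """Toy stand-in for a backend's kernel table (hypothetical, for illustration)."""

    def __init__(self):
        self._kernels = {}

    def register(self, name, *dtypes):
        # Store the function under the key (name, dtype1, dtype2, ...).
        def decorator(func):
            self._kernels[(name, *dtypes)] = func
            return func

        return decorator

    def __getitem__(self, index):
        try:
            return self._kernels[index]
        except KeyError:
            # Same style of failure as in the traceback: the key is simply absent.
            raise AssertionError(f"ToyKernel not found: {index!r}") from None


registry = ToyKernelRegistry()


@registry.register("toy_ported_kernel", int)
def toy_ported_kernel(x):
    return 2 * x


# A registered kernel is found by its (name, types) key.
print(registry[("toy_ported_kernel", int)](3))  # 6

# A kernel that was never registered fails at lookup, before any computation runs.
try:
    registry[("awkward_ListArray_getitem_next_range_carrylength", int, int, int)]
except AssertionError as err:
    print(err)
```

In this picture, porting a kernel amounts to registering an entry under the missing key, after which the same slice finds it.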

ManasviGoyal commented 5 months ago

awkward_ListArray_getitem_next_range_carrylength has already been implemented in #3130. It will be merged in a few days.

essoca commented 5 months ago

> That's an interesting error. How did you install awkward? Did you pip install from the Git repo perchance?

Yes @agoose77, I pip-installed it a couple of weeks ago from the commit a096f3d, since I wanted to test the fix #3115.

@jpivarski: when will 2.6.5 be released? Will @ManasviGoyal's PR #3130 be included in that release?

jpivarski commented 5 months ago

> awkward_ListArray_getitem_next_range_carrylength has already been implemented in #3130. It will be merged in a few days.

So in principle, doing

git checkout ManasviGoyal/improve-variable-length-loop-kernels

before

nox -s prepare
python -m pip install -v ./awkward-cpp
python -m pip install -e .

(in a freshly cloned directory, so that all of the headers made with nox -s prepare are new) should let you slice in CUDA.

> @jpivarski: when will 2.6.5 be released? Will @ManasviGoyal's PR #3130 be included in that release?

Awkward 2.6.5 was released yesterday, and it includes #3115.

When @ManasviGoyal is done with #3130, I'll review it and merge it if there are no issues, and if it's useful to put that in a release, I'll do it. #3130 will need a new awkward-cpp, which takes more time and PyPI quota than a regular release, so they tend to be spaced out more, but if you need it, I'll make a release.

essoca commented 5 months ago

@jpivarski: many thanks for the instructions 👍 Being able to slice large data in CUDA (following NumPy syntax) is a very important operation, in my opinion. It would be great if you could release it as soon as it is ready!