inducer / pyopencl

OpenCL integration for Python, plus shiny features
http://mathema.tician.de/software/pyopencl

Kernel evaluation throws exception for SVM objects #762

Closed kalocsaibotond closed 1 month ago

kalocsaibotond commented 1 month ago

Describe the bug If I pass an SVM object as a kernel argument, the kernel call raises an INVALID_KERNEL_ARGS exception instead of running.

To Reproduce Run this code:

import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prg = cl.Program(
    ctx,
    r"""
    __kernel void twice(__global float *a) {a[get_global_id(0)] *= 2;}
    """,
).build()

ary = np.zeros(1000, np.float32)
prg.twice(queue, ary.shape, None, cl.SVM(ary))
queue.finish()  # synchronize
print(ary)  # access from host

Expected behavior I get the following error message for the Nvidia GPU:

Choose platform:
[0] <pyopencl.Platform 'NVIDIA CUDA' at 0x1b5529075d0>
[1] <pyopencl.Platform 'Intel(R) OpenCL HD Graphics' at 0x1b55292ba40>
Choice [0]:0
Set the environment variable PYOPENCL_CTX='0' to avoid being asked again.
---------------------------------------------------------------------------   
LogicError                                Traceback (most recent call last)   
Cell In[3], line 14
      6 prg = cl.Program(
      7     ctx,
      8     r"""
      9     __kernel void twice(__global float *a) {a[get_global_id(0)] *= 2;}
     10     """,
     11 ).build()
     13 ary = np.zeros(1000, np.float32)
---> 14 prg.twice(queue, ary.shape, None, cl.SVM(ary))
     15 queue.finish()  # synchronize
     16 print(ary)  # access from host

File ~\scoop\apps\anaconda3\current\App\envs\ctc_env\Lib\site-packages\pyopencl\__init__.py:901, in _add_functionality.<locals>.kernel_call(self, queue, global_size, local_size, *args, **kwargs)
    895 def kernel_call(self, queue, global_size, local_size, *args, **kwargs):
    896     # __call__ can't be overridden directly, so we need this
    897     # trampoline hack.
    898
    899     # Note: This is only used for the generic __call__, before
    900     # kernel_set_scalar_arg_dtypes is called.
--> 901     return self._enqueue(self, queue, global_size, local_size, *args, **kwargs)

File <pyopencl invoker for 'twice'>:8, in enqueue_knl_twice(self, queue, global_size, local_size, arg0, global_offset, g_times_l, allow_empty_ndrange, wait_for)

LogicError: clEnqueueNDRangeKernel failed: INVALID_KERNEL_ARGS

and this for the Intel integrated GPU:

Choose platform:
[0] <pyopencl.Platform 'NVIDIA CUDA' at 0x23787d266b0>
[1] <pyopencl.Platform 'Intel(R) OpenCL HD Graphics' at 0x23787bd1170>
Choice [0]:1
Set the environment variable PYOPENCL_CTX='1' to avoid being asked again.
---------------------------------------------------------------------------   
LogicError                                Traceback (most recent call last)   
Cell In[1], line 14
      6 prg = cl.Program(
      7     ctx,
      8     r"""
      9     __kernel void twice(__global float *a) {a[get_global_id(0)] *= 2;}
     10     """,
     11 ).build()
     13 ary = np.zeros(1000, np.float32)
---> 14 prg.twice(queue, ary.shape, None, cl.SVM(ary))
     15 queue.finish()  # synchronize
     16 print(ary)  # access from host

File ~\scoop\apps\anaconda3\current\App\envs\ctc_env\Lib\site-packages\pyopencl\__init__.py:901, in _add_functionality.<locals>.kernel_call(self, queue, global_size, local_size, *args, **kwargs)
    895 def kernel_call(self, queue, global_size, local_size, *args, **kwargs):
    896     # __call__ can't be overridden directly, so we need this
    897     # trampoline hack.
    898
    899     # Note: This is only used for the generic __call__, before
    900     # kernel_set_scalar_arg_dtypes is called.
--> 901     return self._enqueue(self, queue, global_size, local_size, *args, **kwargs)

File <pyopencl invoker for 'twice'>:7, in enqueue_knl_twice(self, queue, global_size, local_size, arg0, global_offset, g_times_l, allow_empty_ndrange, wait_for)

LogicError: clSetKernelArgSVMPointer failed: INVALID_ARG_VALUE - when processing arg#1 (1-based):

Environment (please complete the following information):

Additional context I was trying to reproduce the example in the docstring of SVM. I used Anaconda to set up the environment.

inducer commented 1 month ago

I can't reproduce this, your reproducer works flawlessly for me. IMO, the most likely reason for this is that your ICD does not support fine-grained system SVM. (In fact, I'm not aware of any GPU ones that currently do.) You may have better luck with buffer-style SVM. For Nvidia, you may want to try out Pocl's CUDA->OpenCL "wrapping" functionality that offers (buffer-style) SVM even when the Nvidia ICD does not. (Though I don't know about Windows support there.)
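One way to check the hypothesis above is to ask each device which SVM flavors it reports. The following is a minimal sketch, not part of the original thread. The bit values are those of `cl_device_svm_capabilities` in the OpenCL 2.0 headers. The helper names `describe_svm_caps` and `report_devices` are made up for illustration, and the sketch assumes the installed pyopencl exposes the device attribute `svm_capabilities` (devices or builds without OpenCL 2.0 support may raise instead, which is treated here as "no SVM reported").

```python
# Bit values of cl_device_svm_capabilities from the OpenCL 2.0 headers.
SVM_CAPABILITY_BITS = {
    "coarse-grain buffer": 1 << 0,  # CL_DEVICE_SVM_COARSE_GRAIN_BUFFER
    "fine-grain buffer": 1 << 1,    # CL_DEVICE_SVM_FINE_GRAIN_BUFFER
    "fine-grain system": 1 << 2,    # CL_DEVICE_SVM_FINE_GRAIN_SYSTEM
    "atomics": 1 << 3,              # CL_DEVICE_SVM_ATOMICS
}


def describe_svm_caps(bits):
    """Return the names of the SVM capabilities set in a bitfield."""
    return [name for name, bit in SVM_CAPABILITY_BITS.items() if bits & bit]


def report_devices():
    """Print the SVM capabilities of every visible OpenCL device."""
    try:
        import pyopencl as cl
    except ImportError:
        print("pyopencl is not installed")
        return
    try:
        platforms = cl.get_platforms()
    except cl.Error as exc:
        print(f"could not list platforms: {exc}")
        return
    for platform in platforms:
        try:
            devices = platform.get_devices()
        except cl.Error:
            devices = []
        for dev in devices:
            try:
                # Assumption: this attribute is available in this build.
                caps = describe_svm_caps(int(dev.svm_capabilities))
            except (cl.Error, AttributeError):
                caps = []
            print(f"{platform.name} / {dev.name}: {caps or 'no SVM reported'}")


if __name__ == "__main__":
    report_devices()
```

Wrapping a plain numpy array in `cl.SVM(ary)`, as the reproducer does, relies on "fine-grain system" showing up in this list. If only "coarse-grain buffer" is reported, the buffer-style variant is the one to try; in pyopencl that would presumably mean allocating the array with `cl.csvm_empty` and mapping it for host access, as described in the `SVM` documentation.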