UHD Graphics 600 | Calling kernel + enqueue_copy more than once, results in OUT_OF_RESOURCES error or freeze

GiorgosXou commented 5 months ago

🦠 Describe the bug

Every time I call a kernel and enqueue_copy in a loop more than once, it results in either an OUT_OF_RESOURCES error or a freeze [1]. This only happens with the GPU option [2], not with the CPU pocl option [3] or when running on another computer.

💥 To Reproduce

Steps to reproduce the behavior:

Have a UHD Graphics 600
Have intel-compute-runtime
Run a basic example like this one:


```python
import numpy as np
import pyopencl as cl

# I can have 50000 instead of 5 and it will work just fine, but the moment I loop more than once ... error
a_np = np.random.rand(5).astype(np.float32)
b_np = np.random.rand(5).astype(np.float32)

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

mf = cl.mem_flags
a_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a_np)
b_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b_np)

prg = cl.Program(ctx, """
__kernel void sum(
    __global const float *a_g, __global const float *b_g, __global float *res_g)
{
  int gid = get_global_id(0);
  res_g[gid] = a_g[gid] + b_g[gid];
}
""").build()

res_g = cl.Buffer(ctx, mf.WRITE_ONLY, a_np.nbytes)
knl = prg.sum  # Use this Kernel object for repeated calls

for i in range(0,1):  # <--------- if range more than 2... issues
    res_np = np.empty_like(a_np)
    prg.sum(queue, a_np.shape, None, a_g, b_g, res_g)  #.wait()
    cl.enqueue_copy(queue, res_np, res_g)  #is_blocking=False) #.wait()
    # res_g.release()

print(res_np - (a_np + b_np))
print(np.linalg.norm(res_np - (a_np + b_np)))
assert np.allclose(res_np, a_np + b_np)
```
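A variant of the same loop that waits on every event and reports the first failing iteration might help narrow down where things go wrong. This is only a sketch: `first_failing_iteration` and `make_opencl_step` are helper names made up for illustration, the kernel is the same one as in the example above, and there is no guarantee that the extra synchronization changes the behaviour on this GPU.

```python
KERNEL_SRC = """
__kernel void sum(
    __global const float *a_g, __global const float *b_g, __global float *res_g)
{
  int gid = get_global_id(0);
  res_g[gid] = a_g[gid] + b_g[gid];
}
"""


def first_failing_iteration(step, n):
    """Call step(i) for i in range(n).

    Returns (i, exception) for the first iteration that raises, or None
    if all n iterations complete.
    """
    for i in range(n):
        try:
            step(i)
        except Exception as exc:  # PyOpenCL errors derive from pyopencl.Error
            return i, exc
    return None


def make_opencl_step():
    """Build a step(i) that launches the kernel, reads back, and synchronizes."""
    # Imported lazily so first_failing_iteration is usable without OpenCL.
    import numpy as np
    import pyopencl as cl

    a_np = np.random.rand(5).astype(np.float32)
    b_np = np.random.rand(5).astype(np.float32)

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    mf = cl.mem_flags
    a_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a_np)
    b_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b_np)
    res_g = cl.Buffer(ctx, mf.WRITE_ONLY, a_np.nbytes)

    knl = cl.Program(ctx, KERNEL_SRC).build().sum  # one Kernel object, reused

    def step(i):
        res_np = np.empty_like(a_np)
        knl(queue, a_np.shape, None, a_g, b_g, res_g).wait()  # wait for the kernel
        cl.enqueue_copy(queue, res_np, res_g).wait()          # wait for the readback
        queue.finish()                                        # drain the queue
        assert np.allclose(res_np, a_np + b_np)

    return step
```

Calling `first_failing_iteration(make_opencl_step(), 10)` should return `None` if all ten iterations pass, or the index and exception of the first one that fails.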

☝️ Expected behavior

Based on another, more advanced test on another computer with Intel graphics, it worked and didn't result in this error or freeze. So I guess it should work on this GPU too?

💻 Environment

OS: Arch Linux
ICD Loader and version: ocl-icd-2.3.2-1
ICD and version: pocl-4.0-2
CPU/GPU: Intel GeminiLake [UHD Graphics 600]
Python version: Python 3.11.6
PyOpenCL version: python-pyopencl 1:2023.1-2

➕ Additional context

I also think I found a related issue here: https://github.com/JPaulMora/Pyrit/issues/641

[1] (in some rare cases it might run more than once but max 2)
[2] [0] <pyopencl.Platform 'Intel(R) OpenCL Graphics' at 0x5597a2db72a0>
[3] [1] <pyopencl.Platform 'Portable Computing Language' at 0x7f6d966595d8>
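To switch between the two platforms without going through the interactive prompt of `create_some_context()`, something like the following could be used. It is a rough sketch: `pick_platform` and `make_context` are hypothetical helper names, and the name substrings are simply taken from the platform names listed above.

```python
def pick_platform(platforms, name_substring):
    """Return the first platform whose name contains name_substring (case-insensitive)."""
    wanted = name_substring.lower()
    for platform in platforms:
        if wanted in platform.name.lower():
            return platform
    raise LookupError(f"no OpenCL platform matching {name_substring!r}")


def make_context(name_substring):
    """Create a context on all devices of the matching platform."""
    import pyopencl as cl  # imported lazily so pick_platform works without OpenCL

    platform = pick_platform(cl.get_platforms(), name_substring)
    return cl.Context(platform.get_devices())
```

With this, `make_context("Intel")` would target the GPU platform and `make_context("Portable Computing Language")` the pocl one, so the same script can be run against both.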

Any ideas?

GiorgosXou commented 5 months ago

My guesses are: (1) I completely misunderstood how something works, (2) a PyOpenCL issue, or (3) an issue with the drivers themselves. I lean more toward 3 or 1, but decided to post the issue here.

inducer commented 5 months ago

Could you check if things work OK with OpenCL called from C, e.g. vec-demo from this repository? If so, then I'd be inclined to suspect a PyOpenCL bug; otherwise I would guess a driver issue as well. Upgrading the kernel may help too.

GiorgosXou commented 5 months ago

Runs just fine with pocl (CPU), but again fails on the GPU with the same type of error. Here's the output:

➜  tools git:(master) ./cl-demo
Choose platform:
[0] Intel(R) Corporation
[1] The pocl project
Enter choice: 0
Choose device:
[0] Intel(R) UHD Graphics 600
Enter choice: 0
---------------------------------------------------------------------
NAME: Intel(R) UHD Graphics 600
VENDOR: Intel(R) Corporation
PROFILE: FULL_PROFILE
VERSION: OpenCL 3.0 NEO
EXTENSIONS: cl_khr_byte_addressable_store cl_khr_device_uuid cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_khr_fp64 cl_ext_float_atomics cl_khr_external_memory cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info
DRIVER_VERSION: 23.35.27191

Type: GPU
EXECUTION_CAPABILITIES: Kernel
GLOBAL_MEM_CACHE_TYPE: Read-Write (2)
CL_DEVICE_LOCAL_MEM_TYPE: Local (1)
SINGLE_FP_CONFIG: 0xbf
QUEUE_PROPERTIES: 0x3

VENDOR_ID: 32902
MAX_COMPUTE_UNITS: 12
MAX_WORK_ITEM_DIMENSIONS: 3
MAX_WORK_GROUP_SIZE: 256
PREFERRED_VECTOR_WIDTH_CHAR: 16
PREFERRED_VECTOR_WIDTH_SHORT: 8
PREFERRED_VECTOR_WIDTH_INT: 4
PREFERRED_VECTOR_WIDTH_LONG: 1
PREFERRED_VECTOR_WIDTH_FLOAT: 1
PREFERRED_VECTOR_WIDTH_DOUBLE: 1
MAX_CLOCK_FREQUENCY: 700
ADDRESS_BITS: 32
MAX_MEM_ALLOC_SIZE: 1717985280
IMAGE_SUPPORT: 1
MAX_READ_IMAGE_ARGS: 128
MAX_WRITE_IMAGE_ARGS: 128
IMAGE2D_MAX_WIDTH: 16384
IMAGE2D_MAX_HEIGHT: 16384
IMAGE3D_MAX_WIDTH: 16384
IMAGE3D_MAX_HEIGHT: 16384
IMAGE3D_MAX_DEPTH: 2048
MAX_SAMPLERS: 16
MAX_PARAMETER_SIZE: 2048
MEM_BASE_ADDR_ALIGN: 1024
MIN_DATA_TYPE_ALIGN_SIZE: 128
GLOBAL_MEM_CACHELINE_SIZE: 64
GLOBAL_MEM_CACHE_SIZE: 393216
GLOBAL_MEM_SIZE: 3435970560
MAX_CONSTANT_BUFFER_SIZE: 1717985280
MAX_CONSTANT_ARGS: 8
LOCAL_MEM_SIZE: 65536
ERROR_CORRECTION_SUPPORT: 0
PROFILING_TIMER_RESOLUTION: 52
ENDIAN_LITTLE: 1
AVAILABLE: 1
COMPILER_AVAILABLE: 1
MAX_WORK_GROUP_SIZES: 256 256 256
---------------------------------------------------------------------
*** 'clFinish' in 'cl-demo.c' on line 101 failed with error 'out of resources'.
[1]    50685 IOT instruction (core dumped)  ./cl-demo
➜  tools git:(master)

Now, where do I have to report this issue :P ... at least I'm feeling slightly better knowing that my original code was fine (not the basic example) lol

inducer commented 5 months ago

https://github.com/intel/compute-runtime looks like a good starting point to report.

At any rate, not looking like a pyopencl issue.
