intel / compute-runtime

Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
MIT License

Need cl_khr_gl_sharing #166

Closed zhoub closed 1 year ago

zhoub commented 5 years ago

Hi

I found that this driver doesn't support cl_khr_gl_sharing. Is there any plan for this? Thanks!

PiotrRozenfeld commented 5 years ago

At this point this extension is supported only on Windows. There are currently no plans to implement this extension on Linux.

zhoub commented 5 years ago

Hi

This is actually as important on Linux as it is on the Windows platform. Many video applications require OpenCL and OpenGL interop, and our work is currently based on the Intel Up board. Please consider this. Thanks!

zhoub commented 5 years ago
Number of platforms                               3
  Platform Name                                   Intel(R) OpenCL HD Graphics
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 1.2 
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_khr_fp64 cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation cl_intel_va_api_media_sharing 
  Platform Extensions function suffix             INTEL

  Platform Name                                   Intel Gen OCL Driver
  Platform Vendor                                 Intel
  Platform Version                                OpenCL 1.2 beignet 1.3 (git-5aba95a)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_gl_sharing
  Platform Extensions function suffix             Intel

  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 18.0.5
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Intel(R) OpenCL HD Graphics
Number of devices                                 1
  Device Name                                     Intel(R) Gen9 HD Graphics NEO
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 1.2 NEO 
  Driver Version                                  19.14.12751
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               18
  Max clock frequency                             750MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              32
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 1 / 1       
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    32, Little-Endian
  Global memory size                              3435970560 (3.2GiB)
  Error Correction support                        No
  Max memory allocation                           1717985280 (1.6GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        131072
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            107374080 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   4 bytes
    Pitch alignment for 2D image buffers          4 bytes
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             16384x16384x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Max constant buffer size                        1717985280 (1.6GiB)
  Max number of constant args                     8
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      52ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    SPIR versions                                 1.2 
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel;
  Motion Estimation accelerator version (Intel)   2
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_khr_fp64 cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation cl_intel_va_api_media_sharing 

  Platform Name                                   Intel Gen OCL Driver
Number of devices                                 1
  Device Name                                     Intel(R) HD Graphics Broxton 0
  Device Vendor                                   Intel
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 1.2 beignet 1.3 (git-5aba95a)
  Driver Version                                  1.3
  Device OpenCL C Version                         OpenCL C 1.2 beignet 1.3 (git-5aba95a)
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               18
  Max clock frequency                             1000MHz
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None, None, None
  Max work item dimensions                        3
  Max work item sizes                             512x512x512
  Max work group size                             512
  Preferred work group size multiple              16
  Preferred / native vector sizes                 
    char                                                16 / 8       
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 0 / 8        (cl_khr_fp16)
    float                                                4 / 4       
    double                                               0 / 2        (n/a)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    32, Little-Endian
  Global memory size                              4102029312 (3.82GiB)
  Error Correction support                        No
  Max memory allocation                           3076521984 (2.865GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        8192
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   4096 bytes
    Pitch alignment for 2D image buffers          1 bytes
    Max 2D image size                             8192x8192 pixels
    Max 3D image size                             8192x8192x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Max constant buffer size                        134217728 (128MiB)
  Max number of constant args                     8
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      80ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    SPIR versions                                 1.2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                __cl_copy_region_align4;__cl_copy_region_align16;__cl_cpy_region_unalign_same_offset;__cl_copy_region_unalign_dst_offset;__cl_copy_region_unalign_src_offset;__cl_copy_buffer_rect;__cl_copy_image_1d_to_1d;__cl_copy_image_2d_to_2d;__cl_copy_image_3d_to_2d;__cl_copy_image_2d_to_3d;__cl_copy_image_3d_to_3d;__cl_copy_image_2d_to_buffer;__cl_copy_image_3d_to_buffer;__cl_copy_buffer_to_image_2d;__cl_copy_buffer_to_image_3d;__cl_fill_region_unalign;__cl_fill_region_align2;__cl_fill_region_align4;__cl_fill_region_align8_2;__cl_fill_region_align8_4;__cl_fill_region_align8_8;__cl_fill_region_align8_16;__cl_fill_region_align128;__cl_fill_image_1d;__cl_fill_image_1d_array;__cl_fill_image_2d;__cl_fill_image_2d_array;__cl_fill_image_3d;
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_gl_sharing cl_khr_fp16

  Platform Name                                   Clover
Number of devices                                 0

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Intel(R) OpenCL HD Graphics
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [INTEL]
  clCreateContext(NULL, ...) [default]            Success [INTEL]
  clCreateContext(NULL, ...) [other]              Success [Intel]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Intel(R) OpenCL HD Graphics
    Device Name                                   Intel(R) Gen9 HD Graphics NEO
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Intel(R) OpenCL HD Graphics
    Device Name                                   Intel(R) Gen9 HD Graphics NEO

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.8
  ICD loader Profile                              OpenCL 1.2
    NOTE:   your OpenCL library declares to support OpenCL 1.2,
        but it seems to support up to OpenCL 2.1 too.

The contrast is rather funny.

The device Intel(R) Gen9 HD Graphics NEO has all the video-related extensions but no OpenGL sharing, while the other device, Intel(R) HD Graphics Broxton 0, has all the graphics extensions but no VAAPI sharing. That means I can use either one or the other, and it is not possible to use OpenGL to visualize the output of the first. Really sad.
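The comparison above can be automated. The following is a minimal sketch, assuming clinfo's human-readable layout of a key, a run of two or more spaces, then the value. The helper names `parse_device_extensions` and `support_table` are hypothetical, and `SAMPLE` is an abridged excerpt of the dump above (the extension lists are shortened for readability).

```python
import re

# Abridged excerpt of the clinfo dump in this thread (extension lists shortened).
SAMPLE = """\
  Device Name                                     Intel(R) Gen9 HD Graphics NEO
  Device Extensions                               cl_khr_icd cl_khr_fp64 cl_intel_va_api_media_sharing
  Device Name                                     Intel(R) HD Graphics Broxton 0
  Device Extensions                               cl_khr_icd cl_khr_gl_sharing cl_khr_fp16
"""

# The two sharing extensions being compared in the thread.
WANTED = ("cl_khr_gl_sharing", "cl_intel_va_api_media_sharing")


def parse_device_extensions(clinfo_text):
    """Map each device name to the set of names on its 'Device Extensions' line."""
    devices = {}
    current = None
    for line in clinfo_text.splitlines():
        # Split "<key><2+ spaces><value>" on the first wide gap.
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        if len(parts) != 2:
            continue
        key, value = parts
        if key == "Device Name":
            current = value
        elif key == "Device Extensions" and current is not None:
            # Whole-token set, so one name can never match as a substring of another.
            devices[current] = set(value.split())
    return devices


def support_table(clinfo_text, wanted=WANTED):
    """Report, per device, whether each wanted extension is listed."""
    parsed = parse_device_extensions(clinfo_text)
    return {name: {ext: ext in exts for ext in wanted} for name, exts in parsed.items()}


if __name__ == "__main__":
    for name, row in support_table(SAMPLE).items():
        print(name, row)
```

Piping the full `clinfo` output into `support_table` instead of `SAMPLE` should give the same kind of per-device summary, provided the installed clinfo uses the same layout.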

juliusmh commented 5 years ago

I'd really like to see CL/GL context sharing as well :+1:.

lilly-lizard commented 5 years ago

Same here. I'm really disappointed that I can't port my application to Linux with Intel CPUs because of this. cl_khr_gl_sharing seems like such a key extension to have that I'm surprised you're not supporting it.

lilly-lizard commented 5 years ago

I also tried using the beignet implementation, and while it does support the cl_khr_gl_sharing extension, one of the functions I required, cl_mem_new_gl_buffer, was empty except for a single line of code: FATAL ("Not implemented"). Tears all round.

PiotrRozenfeld commented 5 years ago

Your arguments are convincing. We will fit this effort into our development schedule for the remainder of this year.

lilly-lizard commented 5 years ago

wow that's great to hear! your hard work is very much appreciated. best of luck to your team!

ormf commented 5 years ago

Hi. Yesterday my app ceased working, as it used cl_khr_gl_sharing with the beignet driver on Linux and that seems to be dysfunctional after my most recent update. For my compositions it is absolutely crucial that this works, so I'm quite devastated, as it is more or less impossible to implement my system on other platforms/OSes (see: https://www.youtube.com/watch?v=2rWha1HTfFE&t=360s). I'd be extremely grateful if you implemented it, and I am absolutely willing to support funding if you let me know how!

ormf commented 4 years ago

@PiotrRozenfeld

We will fit this effort into our development schedule for the remainder of this year.

Is there any news about whether work has been done on this and whether/when it will be implemented?

AdamCetnerowski commented 4 years ago

This is being actively worked on. We don't have a clear trend date, but Q1 is very likely.

ormf commented 4 years ago

This is being actively worked on. We don't have a clear trend date, but Q1 is very likely.

@AdamCetnerowski: Thanks, very much appreciated!

pioto1225 commented 4 years ago

Has there been any progress on this issue?

smistad commented 4 years ago

I would also very much like to see this feature implemented

AdamCetnerowski commented 4 years ago

Some foundational work has been done to enable the extension. At this time the Q1 trend is no longer valid. We intend to provide a partial implementation early in Q2 for evaluation. We’re looking forward to your feedback when it’s available.

ormf commented 4 years ago

We’re looking forward to your feedback when it’s available.

Most definitely! Right now I'm holding back any upgrades on my system as I have to keep llvm 8.0.1 for the beignet drivers.

AdamCetnerowski commented 4 years ago

We have made progress towards this feature, but it was slower than initially expected. We’ll update this issue when we have something that is user-testable.

Apahdos commented 4 years ago

Hi. Any news on when this will be released? Could use it for my thesis. Otherwise I need to find a workaround ^^

PiotrRozenfeld commented 4 years ago

While we did lay some groundwork for cl_khr_gl_sharing with MESA on Linux, we also had to increase our initial effort estimates, as additional work needs to be done in MESA and Compute before we can proceed with the implementation. This does not mean we are abandoning the effort, but we can no longer accommodate it in the near future.

We'll post updates when the work resumes.

ormf commented 4 years ago

@PiotrRozenfeld thanks a lot for the info! Although this is very bad news for me, I now know that I'll either have to get beignet compiled with llvm 10, or resort to installing a dual boot until this is resolved. I sincerely hope this is not the final blow to this in compute-runtime.

Randrianasulu commented 4 years ago

Probably not the most useful comment (I'm not a developer, just a user who likes to test something new), but I was looking at AMD's ROCm sources and found this:

https://github.com/ROCm-Developer-Tools/ROCclr/blob/master/device/rocm/mesa_glinterop.h

/* Mesa OpenGL inter-driver interoperability interface designed for but not
 * limited to OpenCL.

Of course, this is only the smallest part of the whole work.

mathucub commented 3 years ago

@PiotrRozenfeld any ETA on getting a cl_khr_gl_sharing solution going? The lack of this extension causes a major problem, since most of our Linux machines use Intel integrated GPUs.

chamois94 commented 3 years ago

While we did lay some groundwork for cl_khr_gl_sharing with MESA on Linux, we also had to increase our initial effort estimates, as additional work needs to be done in MESA and Compute before we can proceed with the implementation. This does not mean we are abandoning the effort, but we can no longer accommodate it in the near future.

We'll post updates when the work resumes.

Any news on that? This extension is really a must-have for many applications.

smlehbleh commented 3 years ago

Also curious how this is going? We are building and deploying Intel NUC based video acquisition/processing solutions and the lack of this extension is the only reason we're still on Beignet drivers

PiotrRozenfeld commented 3 years ago

The work on this extension has stalled due to other work that our team is facing, unfortunately.

@smlehbleh - could you share a sample workload that you are looking to enable? What is your platform of interest? You are indicating this already works on Beignet - implying no additional work in MESA is needed.

smlehbleh commented 3 years ago

Hi Piotr, our use case is an Intel NUC powered cinema camera/recording device. The RAW image processing (debayering, colour transformation, etc.) is performed in OpenCL kernels and the user interface is rendered with OpenGL ES. We use the Beignet OpenCL drivers, which support the cl_khr_gl_sharing extension, to share the processed RAW image from an OpenCL buffer with OpenGL ES and draw it to the monitor screen. We are using Ubuntu 18.04 with an Intel NUC8v7PNB and (soon) an NUC11TNBv7. The cl_khr_gl_sharing implementation in Beignet has been present for quite a while (maybe a couple of years). As an experiment we tried the NEO CL driver and replaced our 'cl_khr_gl_sharing' GL/CL interop code with a manual additional copy from a CL buffer into a GL texture. The total CPU usage appeared to go from about 10% to 15-20%, and a noticeable frame delay was introduced. The buffer was a UHD RGBA32 image (~33 MB).
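The ~33 MB figure can be checked with a little arithmetic. This is a minimal sketch, assuming "RGBA32" means 32 bits (4 bytes) per pixel and UHD means 3840x2160. The frame rate is a hypothetical value chosen only to illustrate the cost of the extra copy; it is not stated in this thread.

```python
def frame_bytes(width, height, bytes_per_pixel):
    """Size in bytes of one tightly packed frame."""
    return width * height * bytes_per_pixel


def copy_rate_mb_per_s(frame_size_bytes, fps):
    """Data moved per second, in MB (10**6 bytes), if every frame is copied once."""
    return frame_size_bytes * fps / 1e6


UHD_WIDTH, UHD_HEIGHT = 3840, 2160  # UHD resolution
RGBA32_BYTES_PER_PIXEL = 4          # assumption: 8 bits per channel, 4 channels
ASSUMED_FPS = 30                    # hypothetical frame rate, not from the thread

if __name__ == "__main__":
    size = frame_bytes(UHD_WIDTH, UHD_HEIGHT, RGBA32_BYTES_PER_PIXEL)
    print(f"{size} bytes per frame (~{size / 1e6:.1f} MB)")
    print(f"~{copy_rate_mb_per_s(size, ASSUMED_FPS):.0f} MB/s of extra copying at {ASSUMED_FPS} fps")
```

Under these assumptions one frame is 33,177,600 bytes, which matches the size quoted above, and each additional per-frame copy adds traffic proportional to the frame rate.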

smlehbleh commented 3 years ago

Hi @PiotrRozenfeld , I was just wondering if Beignet's existing cl_khr_gl_sharing support/implementation could be ported or copied over into this driver - or if it could generally reduce the amount of work as you mentioned?

JacekDanecki commented 3 years ago

As far as I can see in the Beignet documentation, cl_khr_gl_sharing is partially supported and allows creating memory objects from OpenGL buffers or 2D textures. If I understand your use case correctly, you're using CL/GL sharing differently: you share OpenCL buffers with OpenGL, so the memory for the buffers is allocated by OpenCL, not by OpenGL. Could you clarify?

smlehbleh commented 3 years ago

Hi Jacek, We are using the Beignet implementation of cl_khr_gl_sharing in the standard intended use case - we create an OpenGL texture and then create an OpenCL buffer from it using 'clCreateFromGLTexture'. I realise in the previous comment it seems like we're directly sharing an OpenCL buffer with OpenGL, but our currently active implementation is an OpenCL buffer created from an OpenGL texture (clCreateFromGLTexture). This allows us to draw the GL texture to the screen with OpenGL after an OpenCL kernel writes to the underlying buffer without any additional 'copy'.

ormf commented 3 years ago

Hi Jacek,

On Wednesday, 1 September 2021 at 05:26:00 (-0700), Jacek Danecki wrote:

As I can see in Beignet documentation cl_khr_gl_sharing is partially supported, and allow to create memory objects from OpenGL buffers or 2D textures.

If I understand your use case correctly you're using cl/gl sharing differently, and you share OpenCL buffers with OpenGL, so memory for buffers is allocated by OpenCL not by OpenGL. Could you clarify?

It's unclear whether you were addressing me in your mail.

The way I used it with the Beignet driver was to first allocate the buffers in OpenGL (using "gen-buffer") and then "recreating" them in OpenCL (within the opengl context with the allocated OpenGL buffer mapped) by calling the OpenCL routine "create-buffer" with the USE-HOST-PTR flag.

Is that what you were asking?

Best, Orm

mathucub commented 3 years ago

I'm going to chime in, even though it has taken so long to get any traction that I've changed jobs and no longer care. GL and CL are supposed to be able to interop with calls like clCreateFromGLTexture and stay in the GPU address space.

This allows work to be done in CL but presented in GL in real time, which any copy-to-host technique will quash. This has been a baseline feature in NVIDIA's and AMD's implementations for years.

The Intel stack on Linux does not match what the Intel stack on Windows can do; that is the root of the problem. They have to be at parity, otherwise software is trapped on Windows because it can't be ported.

JacekDanecki commented 3 years ago

In the Beignet code there is an example: https://github.com/intel/beignet/blob/master/examples/gl_buffer_sharing/gl_buffer_sharing.cpp, and it looks similar to your usage. After small changes in the Beignet code I was able to run this example on Ubuntu 18.04, and it worked correctly. Looking at the sharing implementation in Beignet, I can see it uses https://www.khronos.org/registry/EGL/extensions/MESA/EGL_MESA_image_dma_buf_export.txt, so I suppose that should be enough on the Mesa side to implement basic sharing in Neo.

smlehbleh commented 3 years ago

Hi @JacekDanecki, that's great and sounds promising! Would you happen to have a rough estimate for when it could get into a Neo release?

JacekDanecki commented 3 years ago

I've created a new repository, cl-gl-tests, where I've pushed a modified example from Beignet. I'll use this repo to test the CL/GL implementation in Neo, so if you see any potential issues with this test in comparison to your usage scenario, please let me know. As a first step I'll prepare some changes in Neo to run only the tests from this repo. When I have any working Neo code I'll push it to the clgl branch of my fork: https://github.com/JacekDanecki/compute-runtime.

JacekDanecki commented 3 years ago

With commit https://github.com/JacekDanecki/compute-runtime/commit/b95bc31fd7547cf5da7560bea822dca159c9e07d test gl_buffer_sharing from: https://github.com/JacekDanecki/cl-gl-tests is working now. This is still very experimental code, under development, there are some hard coded values, and some parts of code are stubbed. I've only executed gl_buffer_sharing test so far.

If you have any tests you can share, I can focus on enabling them.

JacekDanecki commented 3 years ago

@smlehbleh @ormf any feedback on experimental code/test I've provided?

smlehbleh commented 2 years ago

Hi @JacekDanecki, sorry was pulled onto other things, but I'm back on this now. I'm not too familiar with building the necessary .deb packages from the compute-runtime repo. Are you able to enable CI on your forked repo to make some .deb packages show up in the releases pane? Then I'll give them a go with some gl_cl interop samples I've got. Cheers, Russell

JacekDanecki commented 2 years ago

What Ubuntu version are you using?

smlehbleh commented 2 years ago

Ubuntu 20.04.3 (kernel 5.13) on an 11th Gen Intel NUC

JacekDanecki commented 2 years ago

I'll rebase my current code to the latest Neo release and push to my fork, so you can use igc and gmmlib packages from release.

smlehbleh commented 2 years ago

Sounds great, thanks, let me know when to grab the packages

Abdob commented 2 years ago

"Right now, Intel does not support surface sharing on Linux* for example. Customer requests can drive changes to this decision." Last Updated: 12/15/2014

By Adam T Lake, Robert M Ioffe

https://www.intel.com/content/www/us/en/developer/articles/technical/sharing-surfaces-between-opencl-and-opengl-43-on-intel-processor-graphics-using-implicit.html

Is this still the case today?

smlehbleh commented 2 years ago

@Abdob, this is my understanding/experience:

It depends on the driver. The Beignet OCL driver (1.3.2-6) supports the GL/CL sharing extension. The NEO Compute runtime OCL driver does not. However, @JacekDanecki has an experimental/WIP branch with the feature that uses, I think, an approach similar to the Beignet driver.

Unfortunately the Beignet driver doesn't appear to support 11th Gen Intel GPUs (Xe) so for the latest Intel GPU hardware, cl/gl sharing support is not in any official driver release.

Abdob commented 2 years ago

Hi @smlehbleh

My Intel GPU is:

glxinfo | grep Mesa

client glx vendor string: Mesa Project and SGI
    Device: Mesa Intel(R) UHD Graphics 630 (CML GT2) (0x9bc5)
OpenGL renderer string: Mesa Intel(R) UHD Graphics 630 (CML GT2)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 21.0.3
OpenGL version string: 4.6 (Compatibility Profile) Mesa 21.0.3
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 21.0.3

I installed beignet-opencl-icd and clinfo, and I get this:

wave@wave:~$ clinfo
Number of platforms                               1
  Platform Name                                   Intel Gen OCL Driver
  Platform Vendor                                 Intel
  Platform Version                                OpenCL 2.0 beignet 1.3
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short
  Platform Extensions function suffix             Intel
beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware (If you have multiple ICDs installed and OpenCL works, you can ignore this message)

  Platform Name                                   Intel Gen OCL Driver
Number of devices                                 0

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              No platform
beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware (If you have multiple ICDs installed and OpenCL works, you can ignore this message)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware (If you have multiple ICDs installed and OpenCL works, you can ignore this message)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware (If you have multiple ICDs installed and OpenCL works, you can ignore this message)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No devices found in platform

Is cl_khr_gl_sharing supposed to be listed as a device extension?

Thanks, Abdo

JacekDanecki commented 2 years ago

@Abdob cl_khr_gl_sharing is listed on my setup with Beignet. For CML support in Beignet, see: https://bugs.launchpad.net/ubuntu/+source/beignet/+bug/1905340

glxinfo | grep Mesa

client glx vendor string: Mesa Project and SGI
    Device: Mesa Intel(R) Iris(R) Pro Graphics 580 (SKL GT4) (0x193b)
OpenGL renderer string: Mesa Intel(R) Iris(R) Pro Graphics 580 (SKL GT4)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 21.0.3
OpenGL version string: 4.6 (Compatibility Profile) Mesa 21.0.3
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 21.0.3

clinfo | grep cl_khr_gl

  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_intel_required_subgroup_size cl_intel_media_block_io cl_intel_planar_yuv cl_khr_gl_sharing
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_intel_required_subgroup_size cl_intel_media_block_io cl_intel_planar_yuv cl_khr_gl_sharing cl_khr_fp16 cl_intel_device_side_avc_motion_estimation
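
Note that the pattern cl_khr_gl also matches names such as cl_khr_global_int32_base_atomics, so the grep above prints the extension lines whether or not cl_khr_gl_sharing is present. A minimal sketch of an exact-name check follows, assuming the "Device Extensions" line format shown above. has_cl_extension is a hypothetical helper name, not a clinfo feature.

```shell
#!/bin/sh
# Sketch: exact-name check for an OpenCL device extension.
# has_cl_extension is a hypothetical helper, not part of clinfo.
# It reads clinfo text on stdin and succeeds only if NAME appears as a
# whole whitespace-separated word on a "Device Extensions" line.
has_cl_extension() {
    awk -v ext="$1" '
        /Device Extensions/ {
            for (i = 1; i <= NF; i++)
                if ($i == ext) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Usage (assumes clinfo is installed):
#   clinfo | has_cl_extension cl_khr_gl_sharing && echo "listed"
```

Matching whole fields rather than substrings is the point of the sketch: a device that reports only the cl_khr_global_* atomics extensions is not reported as supporting sharing.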

clinfo

  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_intel_required_subgroup_size cl_intel_media_block_io cl_intel_planar_yuv cl_khr_gl_sharing
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_intel_required_subgroup_size cl_intel_media_block_io cl_intel_planar_yuv cl_khr_gl_sharing cl_khr_fp16 cl_intel_device_side_avc_motion_estimation
root@gubuntuPC:/home/jdanecki# cat clinfo.txt
Number of platforms                               1
  Platform Name                                   Intel Gen OCL Driver
  Platform Vendor                                 Intel
  Platform Version                                OpenCL 2.0 beignet 1.4 (git-419c0417)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_intel_required_subgroup_size cl_intel_media_block_io cl_intel_planar_yuv cl_khr_gl_sharing
  Platform Extensions function suffix             Intel

  Platform Name                                   Intel Gen OCL Driver
Number of devices                                 1
  Device Name                                     Intel(R) HD Graphics Skylake Halo GT4
  Device Vendor                                   Intel
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 2.0 beignet 1.4 (git-419c0417)
  Driver Version                                  1.4
  Device OpenCL C Version                         OpenCL C 2.0 beignet 1.4 (git-419c0417)
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               72
  Max clock frequency                             1000MHz
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None, None, None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             512x512x512
  Max work group size                             256
  Preferred work group size multiple              16
  Sub-group sizes (Intel)                         8, 16
  Preferred / native vector sizes                 
    char                                                16 / 8       
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 0 / 8        (cl_khr_fp16)
    float                                                4 / 4       
    double                                               0 / 2        (n/a)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    32, Little-Endian
  Global memory size                              4294967296 (4GiB)
  Error Correction support                        No
  Max memory allocation                           3221225472 (3GiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             65536 (64KiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        8192 (8KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   4096 bytes
    Pitch alignment for 2D image buffers          1 pixels
    Max 2D image size                             8192x8192 pixels
    Max planar YUV image size                     8192x8192 pixels
    Max 3D image size                             8192x8192x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
    Max number of read/write image args           8
  Max number of pipe args                         16
  Max active pipe reservations                    1
  Max pipe packet size                            1024
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Max number of constant args                     8
  Max constant buffer size                        134217728 (128MiB)
  Max size of kernel argument                     1024
  Queue properties (on host)                      
    Out-of-order execution                        No
    Profiling                                     Yes
  Queue properties (on device)                    
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                16384 (16KiB)
    Max size                                      262144 (256KiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      80ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    SPIR versions                                 1.2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                __cl_copy_region_align4;__cl_copy_region_align16;__cl_copy_region_unalign_same_offset;__cl_copy_region_unalign_dst_offset;__cl_copy_region_unalign_src_offset;__cl_copy_buffer_rect;__cl_copy_buffer_rect_align4;__cl_copy_image_1d_to_1d;__cl_copy_image_2d_to_2d;__cl_copy_image_3d_to_2d;__cl_copy_image_2d_to_3d;__cl_copy_image_3d_to_3d;__cl_copy_image_2d_to_buffer;__cl_copy_image_2d_to_buffer_align4;__cl_copy_image_2d_to_buffer_align16;__cl_copy_image_3d_to_buffer;__cl_copy_image_3d_to_buffer_align4;__cl_copy_image_3d_to_buffer_align16;__cl_copy_buffer_to_image_2d;__cl_copy_buffer_to_image_2d_align4;__cl_copy_buffer_to_image_2d_align16;__cl_copy_buffer_to_image_3d;__cl_copy_buffer_to_image_3d_align4;__cl_copy_buffer_to_image_3d_align16;__cl_copy_image_1d_array_to_1d_array;__cl_copy_image_2d_array_to_2d_array;__cl_copy_image_2d_array_to_2d;__cl_copy_image_2d_array_to_3d;__cl_copy_image_2d_to_2d_array;__cl_copy_image_3d_to_2d_array;__cl_fill_region_unalign;__cl_fill_region_align2;__cl_fill_region_align4;__cl_fill_region_align8_2;__cl_fill_region_align8_4;__cl_fill_region_align8_8;__cl_fill_region_align8_16;__cl_fill_region_align128;__cl_fill_image_1d;__cl_fill_image_1d_array;__cl_fill_image_2d;__cl_fill_image_2d_array;__cl_fill_image_3d;
    Device-side AVC Motion Estimation version     <printDeviceInfo:165: get CL_DEVICE_AVC_ME_VERSION_INTEL : error -30>
      Supports texture sampler use                <printDeviceInfo:166: get CL_DEVICE_AVC_ME_SUPPORTS_TEXTURE_SAMPLER_USE_INTEL : error -30>
      Supports preemption                         <printDeviceInfo:167: get CL_DEVICE_AVC_ME_SUPPORTS_PREEMPTION_INTEL : error -30>
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_intel_required_subgroup_size cl_intel_media_block_io cl_intel_planar_yuv cl_khr_gl_sharing cl_khr_fp16 cl_intel_device_side_avc_motion_estimation

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Intel Gen OCL Driver
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [Intel]
  clCreateContext(NULL, ...) [default]            Success [Intel]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Intel Gen OCL Driver
    Device Name                                   Intel(R) HD Graphics Skylake Halo GT4
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Intel Gen OCL Driver
    Device Name                                   Intel(R) HD Graphics Skylake Halo GT4
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Intel Gen OCL Driver
    Device Name                                   Intel(R) HD Graphics Skylake Halo GT4

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.11
  ICD loader Profile                              OpenCL 2.1

Abdob commented 2 years ago

Hi @JacekDanecki

Thank you for the follow up.

According to the link you provided beignet 1.3.2-8 fixed the issue. I installed 1.4 from source and it still doesn't show. We decided to move forward with another API and will no longer attempt to get the OpenCL interoperability to work.

wave@wave:~/Documents/ocl/beignet/build$ clinfo
Number of platforms                               1
  Platform Name                                   Intel Gen OCL Driver
  Platform Vendor                                 Intel
  Platform Version                                OpenCL 2.0 beignet 1.4 (git-419c0417)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_intel_required_subgroup_size cl_intel_media_block_io cl_intel_planar_yuv
  Platform Extensions function suffix             Intel
cl_get_gt_device(): error, unknown device: 9bc5

  Platform Name                                   Intel Gen OCL Driver
Number of devices                                 0

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              No platform
cl_get_gt_device(): error, unknown device: 9bc5
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
cl_get_gt_device(): error, unknown device: 9bc5
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
cl_get_gt_device(): error, unknown device: 9bc5
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No devices found in platform

eero-t commented 2 years ago

According to the link you provided beignet 1.3.2-8 fixed the issue.

@Abdob That link was to the Ubuntu bug tracker, and that fix went into the Ubuntu (and Debian) 1.3.2-8 package of beignet, not into the upstream beignet project.

I installed 1.4 from source and it still doesn't show. ... Platform Version OpenCL 2.0 beignet 1.4 (git-419c0417) cl_get_gt_device(): error, unknown device: 9bc5

If you built it from upstream, it's missing CML support: https://github.com/intel/beignet/commits/master/src/cl_device_data.h

That is because the PR for it has not been merged: https://github.com/intel/beignet/pull/20

You would need to build e.g. the Debian version (with Debian tools, so that the CML support patch gets applied on top of the upstream sources): https://salsa.debian.org/opencl-team/beignet

Or just use the already built beignet packages from Ubuntu 21.04 or Debian testing (or newer distro version):

EDIT: added links to packages.
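
To see whether a given beignet source tree recognizes the device ID from the error message (9bc5 above), one can search the device header for it. This is a rough sketch, assuming the header spells IDs as hex literals such as 0x9BC5. knows_device_id is a hypothetical helper name, and the sketch relies on grep's widely available -w option.

```shell
#!/bin/sh
# Sketch: check whether a source file mentions a PCI device ID.
# knows_device_id is a hypothetical helper.
#   $1 = file to search (e.g. src/cl_device_data.h in a beignet checkout)
#   $2 = device ID without the 0x prefix (e.g. 9bc5)
# Assumes IDs appear as hex literals like 0x9BC5; the match ignores case
# and requires a whole word, so 0x9BC50 does not count as 0x9BC5.
knows_device_id() {
    grep -qiw -- "0x$2" "$1"
}

# Usage, from the top of a beignet source tree:
#   knows_device_id src/cl_device_data.h 9bc5 || echo "9bc5 not known here"
```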

smlehbleh commented 2 years ago

Hi @JacekDanecki, I was wondering if you were able to build and generate the NEO compute runtime .deb files for Ubuntu 20.04 from your fork with the experimental/WIP cl/gl sharing?

JacekDanecki commented 2 years ago

@smlehbleh I've pushed the current experimental implementation, rebased on release https://github.com/intel/compute-runtime/releases/tag/22.06.22433, to the clgl-fork branch in this repository: https://github.com/JacekDanecki/compute-runtime/tree/clgl-fork. You can use the gmmlib and igc packages from the Neo release.

Neo can be compiled with commands:

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Debug -DDISABLE_WDDM_LINUX=1 -DBUILD_WITH_L0=1 -DNEO_DISABLE_LD_GOLD=1 ..
make igdrcl_dll

Once compiled, you can create a file named neo.icd in the /etc/OpenCL/vendors directory containing the full path to the libigdrcl.so library.
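
The registration step can be sketched as below. An .icd file is a plain text file holding the path of the driver library, which the ICD loader reads at startup. write_icd is a hypothetical helper name, and the bin/libigdrcl.so location in the usage comment is an assumption about the build tree, so check where your build actually placed the library.

```shell
#!/bin/sh
# Sketch: register a locally built OpenCL driver with the ICD loader.
# write_icd is a hypothetical helper.
#   $1 = absolute path to libigdrcl.so
#   $2 = vendors directory (default /etc/OpenCL/vendors, which needs root)
write_icd() {
    lib=$1
    dir=${2:-/etc/OpenCL/vendors}
    case $lib in
        /*) ;;
        *) echo "error: library path must be absolute: $lib" >&2
           return 1 ;;
    esac
    if [ ! -f "$lib" ]; then
        echo "error: no such file: $lib" >&2
        return 1
    fi
    # The .icd file contains just the library path on one line.
    mkdir -p "$dir" && printf '%s\n' "$lib" > "$dir/neo.icd"
}

# Usage, run as root from the build directory (library location assumed):
#   write_icd "$PWD/bin/libigdrcl.so"
```

Running clinfo afterwards should show whether the loader picked up the new platform.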

JacekDanecki commented 2 years ago

I'm leaving the compute team, so I will not work on cl/gl sharing anymore.