Open perrymacmurray opened 3 years ago
One day <3
Any update here?
Check for any update
Yes, OpenCL is a crucial feature. We're putting together a native Linux box for testing next week due to this.
This would be wonderful for my team. We have considered rewriting everything in CUDA, but that has major downsides. Until OpenCL support is released, we are stuck dual-booting.
YEP
hope for any update
In theory OpenCL/WSL2 may now work for Intel Integrated Graphics GPUs: https://devblogs.microsoft.com/commandline/oneapi-l0-openvino-and-opencl-coming-to-the-windows-subsystem-for-linux-for-intel-gpus/
When I tried a few days ago, no CPU platform was registered (my CPU is AMD), nor any GPU platform (my GPU is Nvidia).
Is there any info on whether OpenCL support for NVIDIA GPU computing is planned?
Any new info a year later?
Better late than never, right?
Still waiting
Same issue when trying to run a Boost example program:
terminate called after throwing an instance of 'boost::wrapexcept<boost::compute::no_device_found>'
what(): No OpenCL device found
I should have checked this before wasting a whole day trying to get it to work ... still waiting
same
I found the solution, as will be usual from now on, by asking ChatGPT. To set up OpenCL on WSL, you can follow these general steps:
sudo apt-get install ocl-icd-opencl-dev
sudo apt-get install pocl-opencl-icd
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH
After completing these steps, you should be able to use OpenCL on WSL. Note that the specific steps and packages required may vary depending on the Linux distribution, GPU hardware, and OpenCL implementation you are using.
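One way to check what the steps above actually registered is to look at the ICD files the loader reads. This is only a minimal sketch: it assumes the conventional Debian/Ubuntu location /etc/OpenCL/vendors, and `list_icd_files` is a hypothetical helper name, not part of any package.

```python
from pathlib import Path


def list_icd_files(vendors_dir="/etc/OpenCL/vendors"):
    """Map each *.icd file name to the library path or name it contains."""
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return {}
    registrations = {}
    for icd_file in sorted(directory.glob("*.icd")):
        lines = icd_file.read_text().splitlines()
        # An .icd file holds the name or path of one vendor library.
        registrations[icd_file.name] = lines[0].strip() if lines else ""
    return registrations


if __name__ == "__main__":
    found = list_icd_files()
    if not found:
        print("No ICD files found; the loader likely has no platforms to report.")
    for name, library in found.items():
        print(f"{name}: {library}")
```

An entry here only means a platform is registered. Whether that platform exposes any devices is a separate question, which clinfo answers.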
@jorgevazquezperez is that proven working or a hallucination?
Proven working. Note that I have only managed it with the CPU so far, but I am working on getting it to run on the GPU. I am attaching a picture with the results and will keep you updated on the GPU version (which I imagine is the one you are all looking forward to). If you need more info, just tell me!
PS: I am using Python with the pyopencl library.
Yes, afaict CPU and integrated Intel GPU should work, but unclear if/how Nvidia
I have tried but failed. My guess is that drivers and versions (of Windows, WSL and packages) have to be aligned in order to make it work. And even so, I am not sure it would 100% work, so I am going to leave that tbd.
Hallucination ;-) Sorry to hear.. the saga continues
Pray for update
Tried on a machine with both an Intel CPU/GPU and an NVIDIA GPU. Works well if you only want to use the Intel CPU/GPU. It kills CUDA support, even if CUDA is installed/reinstalled properly. Can be useful on a machine that has nothing but an Intel CPU/GPU. Waiting for a solution where both OpenCL and CUDA work.
Also hoping for an update here. Many companies in the 3D animation industry would benefit a lot from running OpenCL-dependent programs in WSL on Nvidia/AMD hardware
Date: Nov 4, 2021 https://devblogs.microsoft.com/commandline/oneapi-l0-openvino-and-opencl-coming-to-the-windows-subsystem-for-linux-for-intel-gpus/
extra helpful info:
user@WSL2:~$ sudo clinfo
Number of platforms 3
Platform Name Intel(R) OpenCL HD Graphics
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 3.0
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory_preview cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info
Platform Extensions with Version cl_khr_byte_addressable_store 0x400000 (1.0.0)
cl_khr_fp16 0x400000 (1.0.0)
cl_khr_global_int32_base_atomics 0x400000 (1.0.0)
cl_khr_global_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_icd 0x400000 (1.0.0)
cl_khr_local_int32_base_atomics 0x400000 (1.0.0)
cl_khr_local_int32_extended_atomics 0x400000 (1.0.0)
cl_intel_command_queue_families 0x400000 (1.0.0)
cl_intel_subgroups 0x400000 (1.0.0)
cl_intel_required_subgroup_size 0x400000 (1.0.0)
cl_intel_subgroups_short 0x400000 (1.0.0)
cl_khr_spir 0x400000 (1.0.0)
cl_intel_accelerator 0x400000 (1.0.0)
cl_intel_driver_diagnostics 0x400000 (1.0.0)
cl_khr_priority_hints 0x400000 (1.0.0)
cl_khr_throttle_hints 0x400000 (1.0.0)
cl_khr_create_command_queue 0x400000 (1.0.0)
cl_intel_subgroups_char 0x400000 (1.0.0)
cl_intel_subgroups_long 0x400000 (1.0.0)
cl_khr_il_program 0x400000 (1.0.0)
cl_intel_mem_force_host_memory 0x400000 (1.0.0)
cl_khr_subgroup_extended_types 0x400000 (1.0.0)
cl_khr_subgroup_non_uniform_vote 0x400000 (1.0.0)
cl_khr_subgroup_ballot 0x400000 (1.0.0)
cl_khr_subgroup_non_uniform_arithmetic 0x400000 (1.0.0)
cl_khr_subgroup_shuffle 0x400000 (1.0.0)
cl_khr_subgroup_shuffle_relative 0x400000 (1.0.0)
cl_khr_subgroup_clustered_reduce 0x400000 (1.0.0)
cl_intel_device_attribute_query 0x400000 (1.0.0)
cl_khr_fp64 0x400000 (1.0.0)
cl_khr_subgroups 0x400000 (1.0.0)
cl_intel_spirv_device_side_avc_motion_estimation 0x400000 (1.0.0)
cl_intel_spirv_media_block_io 0x400000 (1.0.0)
cl_intel_spirv_subgroups 0x400000 (1.0.0)
cl_khr_spirv_no_integer_wrap_decoration 0x400000 (1.0.0)
cl_intel_unified_shared_memory_preview 0x400000 (1.0.0)
cl_khr_mipmap_image 0x400000 (1.0.0)
cl_khr_mipmap_image_writes 0x400000 (1.0.0)
cl_intel_planar_yuv 0x400000 (1.0.0)
cl_intel_packed_yuv 0x400000 (1.0.0)
cl_intel_motion_estimation 0x400000 (1.0.0)
cl_intel_device_side_avc_motion_estimation 0x400000 (1.0.0)
cl_intel_advanced_motion_estimation 0x400000 (1.0.0)
cl_khr_int64_base_atomics 0x400000 (1.0.0)
cl_khr_int64_extended_atomics 0x400000 (1.0.0)
cl_khr_image2d_from_buffer 0x400000 (1.0.0)
cl_khr_depth_images 0x400000 (1.0.0)
cl_khr_3d_image_writes 0x400000 (1.0.0)
cl_intel_media_block_io 0x400000 (1.0.0)
cl_intel_va_api_media_sharing 0x400000 (1.0.0)
cl_intel_sharing_format_query 0x400000 (1.0.0)
cl_khr_pci_bus_info 0x400000 (1.0.0)
Platform Numeric Version 0xc00000 (3.0.0)
Platform Extensions function suffix INTEL
Platform Host timer resolution 1ns
Platform Name Intel(R) OpenCL HD Graphics
Number of devices 1
Device Name Intel(R) Graphics [0x5917]
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 3.0 NEO
Device Numeric Version 0xc00000 (3.0.0)
Driver Version 21.35.20826
Device OpenCL C Version OpenCL C 3.0
Device OpenCL C all versions OpenCL C 0x400000 (1.0.0)
OpenCL C 0x401000 (1.1.0)
OpenCL C 0x402000 (1.2.0)
OpenCL C 0x800000 (2.0.0)
OpenCL C 0xc00000 (3.0.0)
Device OpenCL C features __opencl_c_int64 0xc00000 (3.0.0)
__opencl_c_3d_image_writes 0xc00000 (3.0.0)
__opencl_c_images 0xc00000 (3.0.0)
__opencl_c_read_write_images 0xc00000 (3.0.0)
__opencl_c_atomic_order_acq_rel 0xc00000 (3.0.0)
__opencl_c_atomic_order_seq_cst 0xc00000 (3.0.0)
__opencl_c_atomic_scope_all_devices 0xc00000 (3.0.0)
__opencl_c_atomic_scope_device 0xc00000 (3.0.0)
__opencl_c_generic_address_space 0xc00000 (3.0.0)
__opencl_c_program_scope_global_variables 0xc00000 (3.0.0)
__opencl_c_work_group_collective_functions 0xc00000 (3.0.0)
__opencl_c_subgroups 0xc00000 (3.0.0)
__opencl_c_device_enqueue 0xc00000 (3.0.0)
__opencl_c_pipes 0xc00000 (3.0.0)
__opencl_c_fp64 0xc00000 (3.0.0)
Latest comfornace test passed v2021-06-16-00
Device Type GPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 24
Max clock frequency 1150MHz
Device Partition (core)
Max number of sub-devices 0
Supported partition types None
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple (device) 32
Preferred work group size multiple (kernel) 32
Max sub-groups per work group 32
Sub-group sizes (Intel) 8, 16, 32
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 1 / 1
half 8 / 8 (cl_khr_fp16)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 5101215744 (4.751GiB)
Error Correction support No
Max memory allocation 1073741824 (1024MiB)
Unified memory for Host and Device Yes
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing Yes
Fine-grained system sharing No
Atomics Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Preferred alignment for atomics
SVM 64 bytes
Global 64 bytes
Local 64 bytes
Atomic memory capabilities relaxed, acquire/release, sequentially-consistent, work-group scope, device scope, all-devices scope
Atomic fence capabilities relaxed, acquire/release, sequentially-consistent, work-item scope, work-group scope, device scope, all-devices scope
Max size for global variable 65536 (64KiB)
Preferred total size of global vars 1073741824 (1024MiB)
Global Memory cache type Read/Write
Global Memory cache size 524288 (512KiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 67108864 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 4 bytes
Pitch alignment for 2D image buffers 4 pixels
Max 2D image size 16384x16384 pixels
Max planar YUV image size 16384x16352 pixels
Max 3D image size 16384x16384x2048 pixels
Max number of read image args 128
Max number of write image args 128
Max number of read/write image args 128
Pipe support Yes
Max number of pipe args 16
Max active pipe reservations 1
Max pipe packet size 1024
Local memory type Local
Local memory size 65536 (64KiB)
Max number of constant args 8
Max constant buffer size 1073741824 (1024MiB)
Generic address space support Yes
Max size of kernel argument 2048 (2KiB)
Queue properties (on host)
Out-of-order execution Yes
Profiling Yes
Device enqueue capabilities supported, replaceable default queue
Queue properties (on device)
Out-of-order execution Yes
Profiling Yes
Preferred size 131072 (128KiB)
Max size 67108864 (64MiB)
Max queues on device 1
Max events on device 1024
Prefer user sync for interop Yes
Profiling timer resolution 83ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Non-uniform work-groups Yes
Work-group collective functions Yes
Sub-group independent forward progress Yes
IL version SPIR-V_1.2
ILs with version SPIR-V 0x402000 (1.2.0)
SPIR versions 1.2
printf() buffer size 4194304 (4MiB)
Built-in kernels block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel;
Built-in kernels with version block_motion_estimate_intel 0x400000 (1.0.0)
block_advanced_motion_estimate_check_intel 0x400000 (1.0.0)
block_advanced_motion_estimate_bidirectional_check_intel 0x400000 (1.0.0)
Motion Estimation accelerator version (Intel) 2
Device-side AVC Motion Estimation version 1
Supports texture sampler use Yes
Supports preemption No
Device Extensions cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory_preview cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info
Device Extensions with Version cl_khr_byte_addressable_store 0x400000 (1.0.0)
cl_khr_fp16 0x400000 (1.0.0)
cl_khr_global_int32_base_atomics 0x400000 (1.0.0)
cl_khr_global_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_icd 0x400000 (1.0.0)
cl_khr_local_int32_base_atomics 0x400000 (1.0.0)
cl_khr_local_int32_extended_atomics 0x400000 (1.0.0)
cl_intel_command_queue_families 0x400000 (1.0.0)
cl_intel_subgroups 0x400000 (1.0.0)
cl_intel_required_subgroup_size 0x400000 (1.0.0)
cl_intel_subgroups_short 0x400000 (1.0.0)
cl_khr_spir 0x400000 (1.0.0)
cl_intel_accelerator 0x400000 (1.0.0)
cl_intel_driver_diagnostics 0x400000 (1.0.0)
cl_khr_priority_hints 0x400000 (1.0.0)
cl_khr_throttle_hints 0x400000 (1.0.0)
cl_khr_create_command_queue 0x400000 (1.0.0)
cl_intel_subgroups_char 0x400000 (1.0.0)
cl_intel_subgroups_long 0x400000 (1.0.0)
cl_khr_il_program 0x400000 (1.0.0)
cl_intel_mem_force_host_memory 0x400000 (1.0.0)
cl_khr_subgroup_extended_types 0x400000 (1.0.0)
cl_khr_subgroup_non_uniform_vote 0x400000 (1.0.0)
cl_khr_subgroup_ballot 0x400000 (1.0.0)
cl_khr_subgroup_non_uniform_arithmetic 0x400000 (1.0.0)
cl_khr_subgroup_shuffle 0x400000 (1.0.0)
cl_khr_subgroup_shuffle_relative 0x400000 (1.0.0)
cl_khr_subgroup_clustered_reduce 0x400000 (1.0.0)
cl_intel_device_attribute_query 0x400000 (1.0.0)
cl_khr_fp64 0x400000 (1.0.0)
cl_khr_subgroups 0x400000 (1.0.0)
cl_intel_spirv_device_side_avc_motion_estimation 0x400000 (1.0.0)
cl_intel_spirv_media_block_io 0x400000 (1.0.0)
cl_intel_spirv_subgroups 0x400000 (1.0.0)
cl_khr_spirv_no_integer_wrap_decoration 0x400000 (1.0.0)
cl_intel_unified_shared_memory_preview 0x400000 (1.0.0)
cl_khr_mipmap_image 0x400000 (1.0.0)
cl_khr_mipmap_image_writes 0x400000 (1.0.0)
cl_intel_planar_yuv 0x400000 (1.0.0)
cl_intel_packed_yuv 0x400000 (1.0.0)
cl_intel_motion_estimation 0x400000 (1.0.0)
cl_intel_device_side_avc_motion_estimation 0x400000 (1.0.0)
cl_intel_advanced_motion_estimation 0x400000 (1.0.0)
cl_khr_int64_base_atomics 0x400000 (1.0.0)
cl_khr_int64_extended_atomics 0x400000 (1.0.0)
cl_khr_image2d_from_buffer 0x400000 (1.0.0)
cl_khr_depth_images 0x400000 (1.0.0)
cl_khr_3d_image_writes 0x400000 (1.0.0)
cl_intel_media_block_io 0x400000 (1.0.0)
cl_intel_va_api_media_sharing 0x400000 (1.0.0)
cl_intel_sharing_format_query 0x400000 (1.0.0)
cl_khr_pci_bus_info 0x400000 (1.0.0)
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Intel(R) OpenCL HD Graphics
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [INTEL]
clCreateContext(NULL, ...) [default] Success [INTEL]
clCreateContext(NULL, ...) [other] Success [POCL]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name Intel(R) OpenCL HD Graphics
Device Name Intel(R) Graphics [0x5917]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name Intel(R) OpenCL HD Graphics
Device Name Intel(R) Graphics [0x5917]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name Intel(R) OpenCL HD Graphics
Device Name Intel(R) Graphics [0x5917]
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.3.1
ICD loader Profile OpenCL 3.0
user@WSL2:~$ sudo hashcat -I
hashcat (v6.2.5) starting in backend information mode
clGetDeviceIDs(): CL_DEVICE_NOT_FOUND
clGetDeviceIDs(): CL_DEVICE_NOT_FOUND
OpenCL Info:
============
OpenCL Platform ID #1
Vendor..: Intel(R) Corporation
Name....: Intel(R) OpenCL HD Graphics
Version.: OpenCL 3.0
Backend Device ID #1
Type...........: GPU
Vendor.ID......: 8
Vendor.........: Intel(R) Corporation
Name...........: Intel(R) Graphics [0x5917]
Version........: OpenCL 3.0 NEO
Processor(s)...: 24
Clock..........: 1150
Memory.Total...: 4864 MB (limited to 512 MB allocatable in one block)
Memory.Free....: 2400 MB
OpenCL.Version.: OpenCL C 3.0
Driver.Version.: 21.35.20826
OpenCL Platform ID #2
Vendor..: The pocl project
Name....: Portable Computing Language
Version.: OpenCL 2.0 pocl 1.8 Linux, None+Asserts, RELOC, LLVM 11.1.0, SLEEF, DISTRO, POCL_DEBUG
Backend Device ID #2
Type...........: CPU
Vendor.ID......: 128
Vendor.........: GenuineIntel
Name...........: pthread-Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
Version........: OpenCL 1.2 pocl HSTR: pthread-x86_64-pc-linux-gnu-skylake
Processor(s)...: 8
Clock..........: 2111
Memory.Total...: 4399 MB (limited to 1024 MB allocatable in one block)
Memory.Free....: 2167 MB
OpenCL.Version.: OpenCL C 1.2 pocl
Driver.Version.: 1.8
OpenCL Platform ID #3
Vendor..: Mesa
Name....: Clover
Version.: OpenCL 1.1 Mesa 22.2.5
It is not a myth. Hope this clears things up for everyone.
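For anyone puzzled by the hex values in the clinfo output above, such as 0xc00000 (3.0.0): OpenCL 3.0 packs a version into a 32-bit value with 10 bits for major, 10 bits for minor and 12 bits for patch. A small sketch of the decoding follows; `decode_cl_version` is a hypothetical helper name.

```python
def decode_cl_version(value):
    """Split a packed OpenCL 3.0 numeric version into (major, minor, patch)."""
    major = value >> 22            # top 10 bits
    minor = (value >> 12) & 0x3FF  # middle 10 bits
    patch = value & 0xFFF          # low 12 bits
    return major, minor, patch


# Values taken from the "OpenCL C all versions" lines in the dump above.
for raw in ("0x400000", "0x401000", "0x402000", "0x800000", "0xc00000"):
    major, minor, patch = decode_cl_version(int(raw, 16))
    print(f"{raw} -> {major}.{minor}.{patch}")
```

This reproduces the dotted versions clinfo prints in parentheses next to each hex value.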
Any news?
You can get OpenCL in WSL2 now. https://github.com/intel/compute-runtime Make sure you have the latest Intel driver running on the host Windows PC. https://www.intel.com/content/www/us/en/developer/articles/release-notes/opencl-runtime-release-notes.html
Or you can configure POCL to make it work. https://github.com/gyferlim/pocl
We want a general solution, not support limited to specific Intel iGPUs (funnily enough, Arc does not even seem to be on the list) or a CPU runtime. That is, any OpenCL-compatible GPU on the Windows host should also be visible in WSL.
Just like any operating system, including Windows, you need to configure and install OpenCL in order for it to work. It doesn't come preinstalled. My last two posts clearly illustrate the following points:
OpenCL is available in WSL2. I don't understand the problem you're facing since there are solutions available.
The very reason this issue thread exists is that the CUDA and AMD runtimes for OpenCL do not work on WSL2, either natively or via POCL. We already know the Intel platform has been supported for one and a half years, and it has been discussed multiple times already in this thread, but this does not work for either Nvidia or AMD GPUs.
To be clearer: the Nvidia/AMD hardware works on a native Linux partition, so WSL2 is the missing part.
Quote: "With Microsoft Windows Subsystem for Linux 2 (WSL 2), you can use native Linux distribution of Intel® oneAPI tools and libraries on Windows." https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2023-0/configure-wsl-2-for-gpu-workflows.html
Quote:"The latest NVIDIA Windows GPU Driver will fully support WSL 2. With CUDA support in the driver, existing applications (compiled elsewhere on a Linux system for the same target GPU) can run unmodified within the WSL environment." https://docs.nvidia.com/cuda/wsl-user-guide/index.html
AMD is a little tricky but will also support OpenCL in WSL2 with the ROCm/HIP stack. Unfortunately, I am not able to try it myself to give a confirmed answer, hence I have no reference to show.
And another person's experience with OpenCL in WSL2: https://m-kim.hashnode.dev/opencl-on-a-gpu-with-wsl2
So, you now have CUDA, Intel, and POCL options for OpenCL in WSL2. I hope this helps.
@gyferlim Have you even read the things you're linking?
CUDA has been supported on WSL2 for a long time, but that is not what this issue is about. The Nvidia OpenCL platform does not work, and neither does POCL with CUDA devices. And there is nothing suggesting otherwise in any of your links.
In the linked instructions for ROCm and AMD, the first reply is someone who literally posted a list of the errors that the guide spits out, because:
This is the entire list of failures because ROCm is not supported on WSL2, or WSL in general for this matter.
And the link about someone using OpenCL on WSL2 is using the integrated Intel GPU, which we already know works. So no, I am afraid it does not help.
It's about getting Nvidia OpenCL to work in WSL2, not CUDA Runtime or AMD HIP/ROCM, is that correct? Please accept my apology if I misunderstood earlier.
I remember having CUDA and AMD running as OpenCL in WSL2 in the past. However, due to driver changes, I had to reinstall Nvidia, which resulted in it breaking. Below is my clinfo with the AMD platform running. Unfortunately, both my RX470 and RX580 died, and I don't have another AMD card to try and show the result.
Number of platforms 2
Platform Name AMD Accelerated Parallel Processing
Platform Vendor Advanced Micro Devices, Inc.
Platform Version OpenCL 2.1 AMD-APP (3581.0)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_amd_event_callback
Platform Host timer resolution 1ns
Platform Extensions function suffix AMD
Platform Name Portable Computing Language
Platform Vendor The pocl project
Platform Version OpenCL 3.0 PoCL 4.1-pre main-0-ga3e43d58 Linux, Debug+Asserts, RELOC, SPIR, LLVM 12.0.0, SLEEF, POCL_DEBUG
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_pocl_content_size
Platform Host timer resolution 0ns
Platform Extensions function suffix POCL
Platform Name AMD Accelerated Parallel Processing
Number of devices 0
Platform Name Portable Computing Language
Number of devices 1
Device Name cpu-Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
Device Vendor GenuineIntel
Device Vendor ID 0x6c636f70
Device Version OpenCL 1.2 PoCL HSTR: cpu-x86_64-pc-linux-gnu-haswell
Driver Version 4.1-pre main-0-ga3e43d58
Device OpenCL C Version OpenCL C 1.2 PoCL
Device Type CPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 8
Max clock frequency 3392MHz
Device Partition (core)
Max number of sub-devices 8
Supported partition types equally, by counts
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 4096x4096x4096
Max work group size 4096
Preferred work group size multiple 8
Preferred / native vector sizes
char 16 / 16
short 16 / 16
int 8 / 8
long 4 / 4
half 0 / 0 (n/a)
float 8 / 8
double 4 / 4 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 5802754048 (5.404GiB)
Error Correction support No
Max memory allocation 2147483648 (2GiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size 8388608 (8MiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 8192x8192 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 128
Local memory type Global
Local memory size 262144 (256KiB)
Max number of constant args 8
Max constant buffer size 262144 (256KiB)
Max size of kernel argument 1024
Queue properties
Out-of-order execution Yes
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
SPIR versions (n/a)
printf() buffer size 16777216 (16MiB)
Built-in kernels pocl.add.i8;org.khronos.openvx.scale_image.nn.u8;org.khronos.openvx.scale_image.bl.u8;org.khronos.openvx.tensor_convert_depth.wrap.u8.f32
Device Extensions cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_command_buffer cl_khr_spir cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] Success [POCL]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No devices found in platform
NOTE: your OpenCL library only supports OpenCL 2.2,
but some installed platforms support OpenCL 3.0.
Programs using 3.0 features may crash
or behave unexpectedly
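The detail that matters in the output above is that the AMD platform is registered but reports 0 devices. A rough sketch for pulling the per-platform device counts out of human-readable clinfo text is below. It assumes the "Platform Name" / "Number of devices" line layout shown in this thread, and `devices_per_platform` is a hypothetical helper name.

```python
import re


def devices_per_platform(clinfo_text):
    """Return {platform name: device count} from human-readable clinfo text."""
    counts = {}
    current = None
    for line in clinfo_text.splitlines():
        line = line.strip()
        name = re.match(r"Platform Name\s+(.+)", line)
        if name:
            current = name.group(1).strip()
            continue
        number = re.match(r"Number of devices\s+(\d+)", line)
        if number and current is not None:
            counts[current] = int(number.group(1))
    return counts


# Excerpt of the device section from the clinfo output above.
SAMPLE = """\
Platform Name AMD Accelerated Parallel Processing
Number of devices 0
Platform Name Portable Computing Language
Number of devices 1
"""

for platform, count in devices_per_platform(SAMPLE).items():
    status = "usable" if count else "registered, but exposes no devices"
    print(f"{platform}: {count} device(s), {status}")
```

A platform with a count of 0 shows up in the platform list but cannot run any kernels.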
Yes, that is still not showing Nvidia <> opencl <> wsl2
Since the NVIDIA OpenCL ICD is built on top of CUDA, it's a bit hard to understand why OpenCL/NVidia is not supported under WSL2 when CUDA is functional. Clearly it's not a technical issue but a commercial issue. Please correct me if I'm wrong.
I managed to get Intel OpenCL working in WSL2, I think.
Follow the instructions given here: https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2023-0/configure-wsl-2-for-gpu-workflows.html#UBUNTU-22-04-JAMMY
Create a file named "intel.icd" in /etc/OpenCL/vendors containing:
/usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
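The file-creation step can be sketched in Python as follows. The file name, directory and library path are the ones from this post; `write_icd_file` is a hypothetical helper name, and the demo deliberately writes into a temporary directory so it runs without root.

```python
from pathlib import Path
import tempfile


def write_icd_file(library_path, name="intel.icd",
                   vendors_dir="/etc/OpenCL/vendors"):
    """Create vendors_dir/name containing library_path on a single line."""
    target = Path(vendors_dir) / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(library_path + "\n")
    return target


if __name__ == "__main__":
    # Library path as given in this post.
    lib = "/usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so"
    # Demo only: write to a throwaway directory instead of /etc.
    demo_dir = tempfile.mkdtemp()
    path = write_icd_file(lib, vendors_dir=demo_dir)
    print(path.name, "contains:", path.read_text().strip())
```

To register the ICD for real, call `write_icd_file(lib)` with the default directory, which needs root privileges.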
Number of platforms 2
Platform Name Intel(R) OpenCL Graphics
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 3.0
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_byte_addressable_store cl_khr_device_uuid cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_ext_float_atomics cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_spirv_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info
Platform Extensions with Version cl_khr_byte_addressable_store 0x400000 (1.0.0)
cl_khr_device_uuid 0x400000 (1.0.0)
cl_khr_fp16 0x400000 (1.0.0)
cl_khr_global_int32_base_atomics 0x400000 (1.0.0)
cl_khr_global_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_icd 0x400000 (1.0.0)
cl_khr_local_int32_base_atomics 0x400000 (1.0.0)
cl_khr_local_int32_extended_atomics 0x400000 (1.0.0)
cl_intel_command_queue_families 0x400000 (1.0.0)
cl_intel_subgroups 0x400000 (1.0.0)
cl_intel_required_subgroup_size 0x400000 (1.0.0)
cl_intel_subgroups_short 0x400000 (1.0.0)
cl_khr_spir 0x400000 (1.0.0)
cl_intel_accelerator 0x400000 (1.0.0)
cl_intel_driver_diagnostics 0x400000 (1.0.0)
cl_khr_priority_hints 0x400000 (1.0.0)
cl_khr_throttle_hints 0x400000 (1.0.0)
cl_khr_create_command_queue 0x400000 (1.0.0)
cl_intel_subgroups_char 0x400000 (1.0.0)
cl_intel_subgroups_long 0x400000 (1.0.0)
cl_khr_il_program 0x400000 (1.0.0)
cl_intel_mem_force_host_memory 0x400000 (1.0.0)
cl_khr_subgroup_extended_types 0x400000 (1.0.0)
cl_khr_subgroup_non_uniform_vote 0x400000 (1.0.0)
cl_khr_subgroup_ballot 0x400000 (1.0.0)
cl_khr_subgroup_non_uniform_arithmetic 0x400000 (1.0.0)
cl_khr_subgroup_shuffle 0x400000 (1.0.0)
cl_khr_subgroup_shuffle_relative 0x400000 (1.0.0)
cl_khr_subgroup_clustered_reduce 0x400000 (1.0.0)
cl_intel_device_attribute_query 0x400000 (1.0.0)
cl_khr_suggested_local_work_size 0x400000 (1.0.0)
cl_intel_split_work_group_barrier 0x400000 (1.0.0)
cl_khr_fp64 0x400000 (1.0.0)
cl_khr_subgroups 0x400000 (1.0.0)
cl_intel_spirv_media_block_io 0x400000 (1.0.0)
cl_intel_spirv_subgroups 0x400000 (1.0.0)
cl_khr_spirv_no_integer_wrap_decoration 0x400000 (1.0.0)
cl_intel_unified_shared_memory 0x400000 (1.0.0)
cl_khr_mipmap_image 0x400000 (1.0.0)
cl_khr_mipmap_image_writes 0x400000 (1.0.0)
cl_ext_float_atomics 0x400000 (1.0.0)
cl_intel_planar_yuv 0x400000 (1.0.0)
cl_intel_packed_yuv 0x400000 (1.0.0)
cl_intel_motion_estimation 0x400000 (1.0.0)
cl_intel_device_side_avc_motion_estimation 0x400000 (1.0.0)
cl_intel_spirv_device_side_avc_motion_estimation 0x400000 (1.0.0)
cl_intel_advanced_motion_estimation 0x400000 (1.0.0)
cl_khr_int64_base_atomics 0x400000 (1.0.0)
cl_khr_int64_extended_atomics 0x400000 (1.0.0)
cl_khr_image2d_from_buffer 0x400000 (1.0.0)
cl_khr_depth_images 0x400000 (1.0.0)
cl_khr_3d_image_writes 0x400000 (1.0.0)
cl_intel_media_block_io 0x400000 (1.0.0)
cl_khr_gl_sharing 0x400000 (1.0.0)
cl_khr_gl_depth_images 0x400000 (1.0.0)
cl_khr_gl_event 0x400000 (1.0.0)
cl_khr_gl_msaa_sharing 0x400000 (1.0.0)
cl_intel_va_api_media_sharing 0x400000 (1.0.0)
cl_intel_sharing_format_query 0x400000 (1.0.0)
cl_khr_pci_bus_info 0x400000 (1.0.0)
Platform Numeric Version 0xc00000 (3.0.0)
Platform Extensions function suffix INTEL
Platform Host timer resolution 1ns
Platform Name Portable Computing Language
Platform Vendor The pocl project
Platform Version OpenCL 3.0 PoCL 4.1-pre main-0-ga3e43d58 Linux, Debug+Asserts, RELOC, SPIR, LLVM 14.0.0, SLEEF, POCL_DEBUG
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_pocl_content_size
Platform Extensions with Version cl_khr_icd 0x400000 (1.0.0)
cl_pocl_content_size 0x400000 (1.0.0)
Platform Numeric Version 0xc00000 (3.0.0)
Platform Extensions function suffix POCL
Platform Host timer resolution 0ns
Platform Name Intel(R) OpenCL Graphics
Number of devices 1
Device Name Intel(R) Graphics [0x5917]
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 3.0 NEO
Device UUID 86801759-0700-0000-0002-000000000000
Driver UUID 32332e32-322e-3236-3531-362e31380000
Valid Device LUID No
Device LUID 5017-c9c1fd7f0000
Device Node Mask 0
Device Numeric Version 0xc00000 (3.0.0)
Driver Version 23.22.26516.18
Device OpenCL C Version OpenCL C 1.2
Device OpenCL C all versions OpenCL C 0x400000 (1.0.0)
OpenCL C 0x401000 (1.1.0)
OpenCL C 0x402000 (1.2.0)
OpenCL C 0xc00000 (3.0.0)
Device OpenCL C features __opencl_c_int64 0xc00000 (3.0.0)
__opencl_c_3d_image_writes 0xc00000 (3.0.0)
__opencl_c_images 0xc00000 (3.0.0)
__opencl_c_read_write_images 0xc00000 (3.0.0)
__opencl_c_atomic_order_acq_rel 0xc00000 (3.0.0)
__opencl_c_atomic_order_seq_cst 0xc00000 (3.0.0)
__opencl_c_atomic_scope_all_devices 0xc00000 (3.0.0)
__opencl_c_atomic_scope_device 0xc00000 (3.0.0)
__opencl_c_generic_address_space 0xc00000 (3.0.0)
__opencl_c_program_scope_global_variables 0xc00000 (3.0.0)
__opencl_c_work_group_collective_functions 0xc00000 (3.0.0)
__opencl_c_subgroups 0xc00000 (3.0.0)
__opencl_c_pipes 0xc00000 (3.0.0)
__opencl_c_fp64 0xc00000 (3.0.0)
Latest comfornace test passed v2022-04-22-00
Device Type GPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 24
Max clock frequency 1150MHz
Device Partition (core)
Max number of sub-devices 0
Supported partition types None
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple (device) 32
Preferred work group size multiple (kernel) 32
Max sub-groups per work group 32
Sub-group sizes (Intel) 8, 16, 32
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 1 / 1
half 8 / 8 (cl_khr_fp16)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 5101215744 (4.751GiB)
Error Correction support No
Max memory allocation 1073741824 (1024MiB)
Unified memory for Host and Device Yes
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing Yes
Fine-grained system sharing No
Atomics Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Preferred alignment for atomics
SVM 64 bytes
Global 64 bytes
Local 64 bytes
Atomic memory capabilities relaxed, acquire/release, sequentially-consistent, work-group scope, device scope, all-devices scope
Atomic fence capabilities relaxed, acquire/release, sequentially-consistent, work-item scope, work-group scope, device scope, all-devices scope
Max size for global variable 65536 (64KiB)
Preferred total size of global vars 1073741824 (1024MiB)
Global Memory cache type Read/Write
Global Memory cache size 786432 (768KiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 67108864 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 4 bytes
Pitch alignment for 2D image buffers 4 pixels
Max 2D image size 16384x16384 pixels
Max planar YUV image size 16384x16352 pixels
Max 3D image size 16384x16384x2048 pixels
Max number of read image args 128
Max number of write image args 128
Max number of read/write image args 128
Pipe support Yes
Max number of pipe args 16
Max active pipe reservations 1
Max pipe packet size 1024
Local memory type Local
Local memory size 65536 (64KiB)
Max number of constant args 8
Max constant buffer size 1073741824 (1024MiB)
Generic address space support Yes
Max size of kernel argument 2048 (2KiB)
Queue properties (on host)
Out-of-order execution Yes
Profiling Yes
Device enqueue capabilities (n/a)
Queue properties (on device)
Out-of-order execution No
Profiling No
Preferred size 0
Max size 0
Max queues on device 0
Max events on device 0
Prefer user sync for interop Yes
Profiling timer resolution 83ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Non-uniform work-groups Yes
Work-group collective functions Yes
Sub-group independent forward progress Yes
IL version SPIR-V_1.2
ILs with version SPIR-V 0x402000 (1.2.0)
SPIR versions 1.2
printf() buffer size 4194304 (4MiB)
Built-in kernels block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel;
Built-in kernels with version block_motion_estimate_intel 0x400000 (1.0.0)
block_advanced_motion_estimate_check_intel 0x400000 (1.0.0)
block_advanced_motion_estimate_bidirectional_check_intel 0x400000 (1.0.0)
Motion Estimation accelerator version (Intel) 2
Device-side AVC Motion Estimation version 1
Supports texture sampler use Yes
Supports preemption No
Device Extensions cl_khr_byte_addressable_store cl_khr_device_uuid cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_suggested_local_work_size cl_intel_split_work_group_barrier cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_ext_float_atomics cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_spirv_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info
Device Extensions with Version cl_khr_byte_addressable_store 0x400000 (1.0.0)
cl_khr_device_uuid 0x400000 (1.0.0)
cl_khr_fp16 0x400000 (1.0.0)
cl_khr_global_int32_base_atomics 0x400000 (1.0.0)
cl_khr_global_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_icd 0x400000 (1.0.0)
cl_khr_local_int32_base_atomics 0x400000 (1.0.0)
cl_khr_local_int32_extended_atomics 0x400000 (1.0.0)
cl_intel_command_queue_families 0x400000 (1.0.0)
cl_intel_subgroups 0x400000 (1.0.0)
cl_intel_required_subgroup_size 0x400000 (1.0.0)
cl_intel_subgroups_short 0x400000 (1.0.0)
cl_khr_spir 0x400000 (1.0.0)
cl_intel_accelerator 0x400000 (1.0.0)
cl_intel_driver_diagnostics 0x400000 (1.0.0)
cl_khr_priority_hints 0x400000 (1.0.0)
cl_khr_throttle_hints 0x400000 (1.0.0)
cl_khr_create_command_queue 0x400000 (1.0.0)
cl_intel_subgroups_char 0x400000 (1.0.0)
cl_intel_subgroups_long 0x400000 (1.0.0)
cl_khr_il_program 0x400000 (1.0.0)
cl_intel_mem_force_host_memory 0x400000 (1.0.0)
cl_khr_subgroup_extended_types 0x400000 (1.0.0)
cl_khr_subgroup_non_uniform_vote 0x400000 (1.0.0)
cl_khr_subgroup_ballot 0x400000 (1.0.0)
cl_khr_subgroup_non_uniform_arithmetic 0x400000 (1.0.0)
cl_khr_subgroup_shuffle 0x400000 (1.0.0)
cl_khr_subgroup_shuffle_relative 0x400000 (1.0.0)
cl_khr_subgroup_clustered_reduce 0x400000 (1.0.0)
cl_intel_device_attribute_query 0x400000 (1.0.0)
cl_khr_suggested_local_work_size 0x400000 (1.0.0)
cl_intel_split_work_group_barrier 0x400000 (1.0.0)
cl_khr_fp64 0x400000 (1.0.0)
cl_khr_subgroups 0x400000 (1.0.0)
cl_intel_spirv_media_block_io 0x400000 (1.0.0)
cl_intel_spirv_subgroups 0x400000 (1.0.0)
cl_khr_spirv_no_integer_wrap_decoration 0x400000 (1.0.0)
cl_intel_unified_shared_memory 0x400000 (1.0.0)
cl_khr_mipmap_image 0x400000 (1.0.0)
cl_khr_mipmap_image_writes 0x400000 (1.0.0)
cl_ext_float_atomics 0x400000 (1.0.0)
cl_intel_planar_yuv 0x400000 (1.0.0)
cl_intel_packed_yuv 0x400000 (1.0.0)
cl_intel_motion_estimation 0x400000 (1.0.0)
cl_intel_device_side_avc_motion_estimation 0x400000 (1.0.0)
cl_intel_spirv_device_side_avc_motion_estimation 0x400000 (1.0.0)
cl_intel_advanced_motion_estimation 0x400000 (1.0.0)
cl_khr_int64_base_atomics 0x400000 (1.0.0)
cl_khr_int64_extended_atomics 0x400000 (1.0.0)
cl_khr_image2d_from_buffer 0x400000 (1.0.0)
cl_khr_depth_images 0x400000 (1.0.0)
cl_khr_3d_image_writes 0x400000 (1.0.0)
cl_intel_media_block_io 0x400000 (1.0.0)
cl_khr_gl_sharing 0x400000 (1.0.0)
cl_khr_gl_depth_images 0x400000 (1.0.0)
cl_khr_gl_event 0x400000 (1.0.0)
cl_khr_gl_msaa_sharing 0x400000 (1.0.0)
cl_intel_va_api_media_sharing 0x400000 (1.0.0)
cl_intel_sharing_format_query 0x400000 (1.0.0)
cl_khr_pci_bus_info 0x400000 (1.0.0)
Platform Name Portable Computing Language
Number of devices 1
Device Name cpu-Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
Device Vendor GenuineIntel
Device Vendor ID 0x6c636f70
Device Version OpenCL 3.0 PoCL HSTR: cpu-x86_64-pc-linux-gnu-skylake
Device Numeric Version 0xc00000 (3.0.0)
Driver Version 4.1-pre main-0-ga3e43d58
Device OpenCL C Version OpenCL C 1.2 PoCL
Device OpenCL C all versions OpenCL C 0x400000 (1.0.0)
OpenCL C 0x401000 (1.1.0)
OpenCL C 0x402000 (1.2.0)
OpenCL C 0xc00000 (3.0.0)
Device OpenCL C features __opencl_c_3d_image_writes 0xc00000 (3.0.0)
__opencl_c_images 0xc00000 (3.0.0)
__opencl_c_atomic_order_acq_rel 0xc00000 (3.0.0)
__opencl_c_atomic_order_seq_cst 0xc00000 (3.0.0)
__opencl_c_atomic_scope_device 0xc00000 (3.0.0)
__opencl_c_program_scope_global_variables 0xc00000 (3.0.0)
__opencl_c_generic_address_space 0xc00000 (3.0.0)
__opencl_c_subgroups 0xc00000 (3.0.0)
__opencl_c_atomic_scope_all_devices 0xc00000 (3.0.0)
__opencl_c_read_write_images 0xc00000 (3.0.0)
__opencl_c_fp64 0xc00000 (3.0.0)
__opencl_c_int64 0xc00000 (3.0.0)
Latest comfornace test passed v2022-04-19-01
Device Type CPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 8
Max clock frequency 2111MHz
Device Partition (core)
Max number of sub-devices 8
Supported partition types equally, by counts
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 4096x4096x4096
Max work group size 4096
Preferred work group size multiple (device) 8
Preferred work group size multiple (kernel) 8
Max sub-groups per work group 128
Sub-group sizes (Intel) 1, 2, 4, 8, 16, 32, 64, 128, 256, 512
Preferred / native vector sizes
char 16 / 16
short 16 / 16
int 8 / 8
long 4 / 4
half 0 / 0 (n/a)
float 8 / 8
double 4 / 4 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 4613160960 (4.296GiB)
Error Correction support No
Max memory allocation 2147483648 (2GiB)
Unified memory for Host and Device Yes
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing Yes
Fine-grained system sharing No
Atomics Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Preferred alignment for atomics
SVM 64 bytes
Global 64 bytes
Local 64 bytes
Atomic memory capabilities relaxed, acquire/release, sequentially-consistent, work-group scope, device scope, all-devices scope
Atomic fence capabilities relaxed, acquire/release, sequentially-consistent, work-item scope, work-group scope, device scope
Max size for global variable 64000 (62.5KiB)
Preferred total size of global vars 262144 (256KiB)
Global Memory cache type Read/Write
Global Memory cache size 8388608 (8MiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 8192x8192 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 128
Max number of read/write image args 128
Pipe support No
Max number of pipe args 0
Max active pipe reservations 0
Max pipe packet size 0
Local memory type Global
Local memory size 262144 (256KiB)
Max number of constant args 8
Max constant buffer size 262144 (256KiB)
Generic address space support Yes
Max size of kernel argument 1024
Queue properties (on host)
Out-of-order execution Yes
Profiling Yes
Device enqueue capabilities (n/a)
Queue properties (on device)
Out-of-order execution No
Profiling No
Preferred size 0
Max size 0
Max queues on device 0
Max events on device 0
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
Non-uniform work-groups No
Work-group collective functions No
Sub-group independent forward progress Yes
IL version (n/a)
ILs with version (n/a)
SPIR versions (n/a)
printf() buffer size 16777216 (16MiB)
Built-in kernels pocl.add.i8;org.khronos.openvx.scale_image.nn.u8;org.khronos.openvx.scale_image.bl.u8;org.khronos.openvx.tensor_convert_depth.wrap.u8.f32
Built-in kernels with version pocl.add.i8 0x402000 (1.2.0)
org.khronos.openvx.scale_image.nn.u8 0x402000 (1.2.0)
org.khronos.openvx.scale_image.bl.u8 0x402000 (1.2.0)
org.khronos.openvx.tensor_convert_depth.wrap.u8.f32 0x402000 (1.2.0)
Device Extensions cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_command_buffer cl_khr_subgroups cl_intel_unified_shared_memory cl_khr_subgroup_ballot cl_khr_subgroup_shuffle cl_intel_subgroups cl_intel_required_subgroup_size cl_khr_spir cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics
Device Extensions with Version cl_khr_byte_addressable_store 0x400000 (1.0.0)
cl_khr_global_int32_base_atomics 0x400000 (1.0.0)
cl_khr_global_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_local_int32_base_atomics 0x400000 (1.0.0)
cl_khr_local_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_3d_image_writes 0x400000 (1.0.0)
cl_khr_command_buffer 0x9000 (0.9.0)
cl_khr_subgroups 0x400000 (1.0.0)
cl_intel_unified_shared_memory 0x400000 (1.0.0)
cl_khr_subgroup_ballot 0x400000 (1.0.0)
cl_khr_subgroup_shuffle 0x400000 (1.0.0)
cl_intel_subgroups 0x400000 (1.0.0)
cl_intel_required_subgroup_size 0x400000 (1.0.0)
cl_khr_spir 0x801000 (2.1.0)
cl_khr_fp64 0x400000 (1.0.0)
cl_khr_int64_base_atomics 0x400000 (1.0.0)
cl_khr_int64_extended_atomics 0x400000 (1.0.0)
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Intel(R) OpenCL Graphics
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [INTEL]
clCreateContext(NULL, ...) [default] Success [INTEL]
clCreateContext(NULL, ...) [other] Success [POCL]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name Intel(R) OpenCL Graphics
Device Name Intel(R) Graphics [0x5917]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name Intel(R) OpenCL Graphics
Device Name Intel(R) Graphics [0x5917]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name Intel(R) OpenCL Graphics
Device Name Intel(R) Graphics [0x5917]
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.3.1
ICD loader Profile OpenCL 3.0
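For anyone reading the hex values in the dump above: the numeric versions (`0x400000`, `0xc00000`, `0x9000`, ...) are packed OpenCL 3.0 `cl_version` values, with the major version in the top 10 bits, the minor in the next 10, and the patch in the low 12. A minimal Python sketch of the decoding follows; the helper names `decode_cl_version` and `format_cl_version` are made up for illustration and are not part of clinfo or any OpenCL binding.

```python
# Sketch: decode a packed OpenCL 3.0 cl_version value.
# Layout: 10 bits major | 10 bits minor | 12 bits patch.
# Function names are hypothetical, chosen only for this example.

def decode_cl_version(value):
    """Split a packed cl_version into a (major, minor, patch) tuple."""
    major = value >> 22            # top 10 bits
    minor = (value >> 12) & 0x3FF  # middle 10 bits
    patch = value & 0xFFF          # low 12 bits
    return major, minor, patch


def format_cl_version(value):
    """Render a packed cl_version as 'major.minor.patch'."""
    return "{}.{}.{}".format(*decode_cl_version(value))


if __name__ == "__main__":
    # Values taken from the clinfo output in this thread.
    for raw in (0x400000, 0x402000, 0x801000, 0xC00000, 0x9000):
        print(hex(raw), "->", format_cl_version(raw))
```

This is why `cl_khr_command_buffer 0x9000` shows up as 0.9.0 while almost every other extension reports 1.0.0.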
This issue is about NVIDIA cards not being shown, not Intel/AMD.
The title of the issue is simply "No OpenCL platforms reported" - not "No NVidia OpenCL platforms reported"
The issue is about Nvidia cards not being shown.
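A quick way to settle which vendors a given `clinfo` dump actually covers is to scan it for `Platform Vendor` and `Device Vendor` lines. The Python sketch below assumes the output has been saved as text with one property per line, as clinfo normally prints it; the function names and the `SAMPLE` string are hypothetical, with the sample lines copied from the output posted in this thread.

```python
# Sketch: list the vendors that appear in saved clinfo text output.
# Assumes one property per line; helper names are hypothetical.

def vendors_in_clinfo(text):
    """Return the distinct platform/device vendors, in order of appearance."""
    vendors = []
    for line in text.splitlines():
        line = line.strip()
        # "Device Vendor ID 0x8086" shares the prefix but is not a vendor name.
        if line.startswith("Device Vendor ID"):
            continue
        for key in ("Platform Vendor", "Device Vendor"):
            if line.startswith(key):
                vendor = line[len(key):].strip()
                if vendor and vendor not in vendors:
                    vendors.append(vendor)
    return vendors


def has_nvidia(text):
    """True if any reported vendor name mentions NVIDIA (case-insensitive)."""
    return any("nvidia" in v.lower() for v in vendors_in_clinfo(text))


SAMPLE = """\
Platform Vendor The pocl project
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Vendor GenuineIntel
"""

if __name__ == "__main__":
    print(vendors_in_clinfo(SAMPLE))
    print("NVIDIA platform present:", has_nvidia(SAMPLE))
```

On the sample lines no NVIDIA vendor is found, which matches what the posted dumps show: only the Intel GPU and the PoCL CPU device.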
Date: Nov 4, 2021 https://devblogs.microsoft.com/commandline/oneapi-l0-openvino-and-opencl-coming-to-the-windows-subsystem-for-linux-for-intel-gpus/
extra helpful info:
- https://www.intel.com/content/www/us/en/artificial-intelligence/harness-the-power-of-intel-igpu-on-your-machine.html
- https://github.com/intel/compute-runtime/releases/tag/21.35.20826
user@WSL2:~$ sudo clinfo
Number of platforms 3
Platform Name Intel(R) OpenCL HD Graphics
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 3.0
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory_preview cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info
Platform Extensions with Version cl_khr_byte_addressable_store 0x400000 (1.0.0)
cl_khr_fp16 0x400000 (1.0.0)
cl_khr_global_int32_base_atomics 0x400000 (1.0.0)
cl_khr_global_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_icd 0x400000 (1.0.0)
cl_khr_local_int32_base_atomics 0x400000 (1.0.0)
cl_khr_local_int32_extended_atomics 0x400000 (1.0.0)
cl_intel_command_queue_families 0x400000 (1.0.0)
cl_intel_subgroups 0x400000 (1.0.0)
cl_intel_required_subgroup_size 0x400000 (1.0.0)
cl_intel_subgroups_short 0x400000 (1.0.0)
cl_khr_spir 0x400000 (1.0.0)
cl_intel_accelerator 0x400000 (1.0.0)
cl_intel_driver_diagnostics 0x400000 (1.0.0)
cl_khr_priority_hints 0x400000 (1.0.0)
cl_khr_throttle_hints 0x400000 (1.0.0)
cl_khr_create_command_queue 0x400000 (1.0.0)
cl_intel_subgroups_char 0x400000 (1.0.0)
cl_intel_subgroups_long 0x400000 (1.0.0)
cl_khr_il_program 0x400000 (1.0.0)
cl_intel_mem_force_host_memory 0x400000 (1.0.0)
cl_khr_subgroup_extended_types 0x400000 (1.0.0)
cl_khr_subgroup_non_uniform_vote 0x400000 (1.0.0)
cl_khr_subgroup_ballot 0x400000 (1.0.0)
cl_khr_subgroup_non_uniform_arithmetic 0x400000 (1.0.0)
cl_khr_subgroup_shuffle 0x400000 (1.0.0)
cl_khr_subgroup_shuffle_relative 0x400000 (1.0.0)
cl_khr_subgroup_clustered_reduce 0x400000 (1.0.0)
cl_intel_device_attribute_query 0x400000 (1.0.0)
cl_khr_fp64 0x400000 (1.0.0)
cl_khr_subgroups 0x400000 (1.0.0)
cl_intel_spirv_device_side_avc_motion_estimation 0x400000 (1.0.0)
cl_intel_spirv_media_block_io 0x400000 (1.0.0)
cl_intel_spirv_subgroups 0x400000 (1.0.0)
cl_khr_spirv_no_integer_wrap_decoration 0x400000 (1.0.0)
cl_intel_unified_shared_memory_preview 0x400000 (1.0.0)
cl_khr_mipmap_image 0x400000 (1.0.0)
cl_khr_mipmap_image_writes 0x400000 (1.0.0)
cl_intel_planar_yuv 0x400000 (1.0.0)
cl_intel_packed_yuv 0x400000 (1.0.0)
cl_intel_motion_estimation 0x400000 (1.0.0)
cl_intel_device_side_avc_motion_estimation 0x400000 (1.0.0)
cl_intel_advanced_motion_estimation 0x400000 (1.0.0)
cl_khr_int64_base_atomics 0x400000 (1.0.0)
cl_khr_int64_extended_atomics 0x400000 (1.0.0)
cl_khr_image2d_from_buffer 0x400000 (1.0.0)
cl_khr_depth_images 0x400000 (1.0.0)
cl_khr_3d_image_writes 0x400000 (1.0.0)
cl_intel_media_block_io 0x400000 (1.0.0)
cl_intel_va_api_media_sharing 0x400000 (1.0.0)
cl_intel_sharing_format_query 0x400000 (1.0.0)
cl_khr_pci_bus_info 0x400000 (1.0.0)
Platform Numeric Version 0xc00000 (3.0.0)
Platform Extensions function suffix INTEL
Platform Host timer resolution 1ns
Platform Name Intel(R) OpenCL HD Graphics
Number of devices 1
Device Name Intel(R) Graphics [0x5917]
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 3.0 NEO
Device Numeric Version 0xc00000 (3.0.0)
Driver Version 21.35.20826
Device OpenCL C Version OpenCL C 3.0
Device OpenCL C all versions OpenCL C 0x400000 (1.0.0)
OpenCL C 0x401000 (1.1.0)
OpenCL C 0x402000 (1.2.0)
OpenCL C 0x800000 (2.0.0)
OpenCL C 0xc00000 (3.0.0)
Device OpenCL C features __opencl_c_int64 0xc00000 (3.0.0)
__opencl_c_3d_image_writes 0xc00000 (3.0.0)
__opencl_c_images 0xc00000 (3.0.0)
__opencl_c_read_write_images 0xc00000 (3.0.0)
__opencl_c_atomic_order_acq_rel 0xc00000 (3.0.0)
__opencl_c_atomic_order_seq_cst 0xc00000 (3.0.0)
__opencl_c_atomic_scope_all_devices 0xc00000 (3.0.0)
__opencl_c_atomic_scope_device 0xc00000 (3.0.0)
__opencl_c_generic_address_space 0xc00000 (3.0.0)
__opencl_c_program_scope_global_variables 0xc00000 (3.0.0)
__opencl_c_work_group_collective_functions 0xc00000 (3.0.0)
__opencl_c_subgroups 0xc00000 (3.0.0)
__opencl_c_device_enqueue 0xc00000 (3.0.0)
__opencl_c_pipes 0xc00000 (3.0.0)
__opencl_c_fp64 0xc00000 (3.0.0)
Latest comfornace test passed v2021-06-16-00
Device Type GPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 24
Max clock frequency 1150MHz
Device Partition (core)
Max number of sub-devices 0
Supported partition types None
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple (device) 32
Preferred work group size multiple (kernel) 32
Max sub-groups per work group 32
Sub-group sizes (Intel) 8, 16, 32
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 1 / 1
half 8 / 8 (cl_khr_fp16)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 5101215744 (4.751GiB)
Error Correction support No
Max memory allocation 1073741824 (1024MiB)
Unified memory for Host and Device Yes
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing Yes
Fine-grained system sharing No
Atomics Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Preferred alignment for atomics
SVM 64 bytes
Global 64 bytes
Local 64 bytes
Atomic memory capabilities relaxed, acquire/release, sequentially-consistent, work-group scope, device scope, all-devices scope
Atomic fence capabilities relaxed, acquire/release, sequentially-consistent, work-item scope, work-group scope, device scope, all-devices scope
Max size for global variable 65536 (64KiB)
Preferred total size of global vars 1073741824 (1024MiB)
Global Memory cache type Read/Write
Global Memory cache size 524288 (512KiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 67108864 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 4 bytes
Pitch alignment for 2D image buffers 4 pixels
Max 2D image size 16384x16384 pixels
Max planar YUV image size 16384x16352 pixels
Max 3D image size 16384x16384x2048 pixels
Max number of read image args 128
Max number of write image args 128
Max number of read/write image args 128
Pipe support Yes
Max number of pipe args 16
Max active pipe reservations 1
Max pipe packet size 1024
Local memory type Local
Local memory size 65536 (64KiB)
Max number of constant args 8
Max constant buffer size 1073741824 (1024MiB)
Generic address space support Yes
Max size of kernel argument 2048 (2KiB)
Queue properties (on host)
Out-of-order execution Yes
Profiling Yes
Device enqueue capabilities supported, replaceable default queue
Queue properties (on device)
Out-of-order execution Yes
Profiling Yes
Preferred size 131072 (128KiB)
Max size 67108864 (64MiB)
Max queues on device 1
Max events on device 1024
Prefer user sync for interop Yes
Profiling timer resolution 83ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Non-uniform work-groups Yes
Work-group collective functions Yes
Sub-group independent forward progress Yes
IL version SPIR-V_1.2
ILs with version SPIR-V 0x402000 (1.2.0)
SPIR versions 1.2
printf() buffer size 4194304 (4MiB)
Built-in kernels block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel;
Built-in kernels with version block_motion_estimate_intel 0x400000 (1.0.0)
block_advanced_motion_estimate_check_intel 0x400000 (1.0.0)
block_advanced_motion_estimate_bidirectional_check_intel 0x400000 (1.0.0)
Motion Estimation accelerator version (Intel) 2
Device-side AVC Motion Estimation version 1
Supports texture sampler use Yes
Supports preemption No
Device Extensions cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_device_attribute_query cl_khr_fp64 cl_khr_subgroups cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory_preview cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_intel_va_api_media_sharing cl_intel_sharing_format_query cl_khr_pci_bus_info
Device Extensions with Version cl_khr_byte_addressable_store 0x400000 (1.0.0)
cl_khr_fp16 0x400000 (1.0.0)
cl_khr_global_int32_base_atomics 0x400000 (1.0.0)
cl_khr_global_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_icd 0x400000 (1.0.0)
cl_khr_local_int32_base_atomics 0x400000 (1.0.0)
cl_khr_local_int32_extended_atomics 0x400000 (1.0.0)
cl_intel_command_queue_families 0x400000 (1.0.0)
cl_intel_subgroups 0x400000 (1.0.0)
cl_intel_required_subgroup_size 0x400000 (1.0.0)
cl_intel_subgroups_short 0x400000 (1.0.0)
cl_khr_spir 0x400000 (1.0.0)
cl_intel_accelerator 0x400000 (1.0.0)
cl_intel_driver_diagnostics 0x400000 (1.0.0)
cl_khr_priority_hints 0x400000 (1.0.0)
cl_khr_throttle_hints 0x400000 (1.0.0)
cl_khr_create_command_queue 0x400000 (1.0.0)
cl_intel_subgroups_char 0x400000 (1.0.0)
cl_intel_subgroups_long 0x400000 (1.0.0)
cl_khr_il_program 0x400000 (1.0.0)
cl_intel_mem_force_host_memory 0x400000 (1.0.0)
cl_khr_subgroup_extended_types 0x400000 (1.0.0)
cl_khr_subgroup_non_uniform_vote 0x400000 (1.0.0)
cl_khr_subgroup_ballot 0x400000 (1.0.0)
cl_khr_subgroup_non_uniform_arithmetic 0x400000 (1.0.0)
cl_khr_subgroup_shuffle 0x400000 (1.0.0)
cl_khr_subgroup_shuffle_relative 0x400000 (1.0.0)
cl_khr_subgroup_clustered_reduce 0x400000 (1.0.0)
cl_intel_device_attribute_query 0x400000 (1.0.0)
cl_khr_fp64 0x400000 (1.0.0)
cl_khr_subgroups 0x400000 (1.0.0)
cl_intel_spirv_device_side_avc_motion_estimation 0x400000 (1.0.0)
cl_intel_spirv_media_block_io 0x400000 (1.0.0)
cl_intel_spirv_subgroups 0x400000 (1.0.0)
cl_khr_spirv_no_integer_wrap_decoration 0x400000 (1.0.0)
cl_intel_unified_shared_memory_preview 0x400000 (1.0.0)
cl_khr_mipmap_image 0x400000 (1.0.0)
cl_khr_mipmap_image_writes 0x400000 (1.0.0)
cl_intel_planar_yuv 0x400000 (1.0.0)
cl_intel_packed_yuv 0x400000 (1.0.0)
cl_intel_motion_estimation 0x400000 (1.0.0)
cl_intel_device_side_avc_motion_estimation 0x400000 (1.0.0)
cl_intel_advanced_motion_estimation 0x400000 (1.0.0)
cl_khr_int64_base_atomics 0x400000 (1.0.0)
cl_khr_int64_extended_atomics 0x400000 (1.0.0)
cl_khr_image2d_from_buffer 0x400000 (1.0.0)
cl_khr_depth_images 0x400000 (1.0.0)
cl_khr_3d_image_writes 0x400000 (1.0.0)
cl_intel_media_block_io 0x400000 (1.0.0)
cl_intel_va_api_media_sharing 0x400000 (1.0.0)
cl_intel_sharing_format_query 0x400000 (1.0.0)
cl_khr_pci_bus_info 0x400000 (1.0.0)
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Intel(R) OpenCL HD Graphics
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [INTEL]
clCreateContext(NULL, ...) [default] Success [INTEL]
clCreateContext(NULL, ...) [other] Success [POCL]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name Intel(R) OpenCL HD Graphics
Device Name Intel(R) Graphics [0x5917]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name Intel(R) OpenCL HD Graphics
Device Name Intel(R) Graphics [0x5917]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name Intel(R) OpenCL HD Graphics
Device Name Intel(R) Graphics [0x5917]
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.3.1
ICD loader Profile OpenCL 3.0
- CLINFO is partially truncated to only show Intel HD Platform part
user@WSL2:~$ sudo hashcat -I
hashcat (v6.2.5) starting in backend information mode
clGetDeviceIDs(): CL_DEVICE_NOT_FOUND
clGetDeviceIDs(): CL_DEVICE_NOT_FOUND
OpenCL Info:
============
OpenCL Platform ID #1
Vendor..: Intel(R) Corporation
Name....: Intel(R) OpenCL HD Graphics
Version.: OpenCL 3.0
Backend Device ID #1
Type...........: GPU
Vendor.ID......: 8
Vendor.........: Intel(R) Corporation
Name...........: Intel(R) Graphics [0x5917]
Version........: OpenCL 3.0 NEO
Processor(s)...: 24
Clock..........: 1150
Memory.Total...: 4864 MB (limited to 512 MB allocatable in one block)
Memory.Free....: 2400 MB
OpenCL.Version.: OpenCL C 3.0
Driver.Version.: 21.35.20826
OpenCL Platform ID #2
Vendor..: The pocl project
Name....: Portable Computing Language
Version.: OpenCL 2.0 pocl 1.8 Linux, None+Asserts, RELOC, LLVM 11.1.0, SLEEF, DISTRO, POCL_DEBUG
Backend Device ID #2
Type...........: CPU
Vendor.ID......: 128
Vendor.........: GenuineIntel
Name...........: pthread-Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
Version........: OpenCL 1.2 pocl HSTR: pthread-x86_64-pc-linux-gnu-skylake
Processor(s)...: 8
Clock..........: 2111
Memory.Total...: 4399 MB (limited to 1024 MB allocatable in one block)
Memory.Free....: 2167 MB
OpenCL.Version.: OpenCL C 1.2 pocl
Driver.Version.: 1.8
OpenCL Platform ID #3
Vendor..: Mesa
Name....: Clover
Version.: OpenCL 1.1 Mesa 22.2.5
It is not a myth. Hope this clears things up for everyone.
edmondium@LAPTOP-1Q9H40K6:~$ clinfo
Abort was called at 54 line in file: ./shared/source/os_interface/windows/wddm/create_um_km_data_translator.cpp
Aborted
Looks like an Intel CPU/GPU again ^^^^, so same status.
Has anyone gotten OpenCL working with AMD CPUs (e.g. 2700x, 5600x, 5800x)? At a bare minimum I could do some dev work if that works. Production is a Linux machine running Linux Docker instances, which forwards Nvidia GPUs perfectly fine. I read a few things saying you could "just install the Intel CPU OpenCL driver", and I installed that but still get 0 platforms in clinfo.
Edit: If you use a recent enough version of Ubuntu (I used 24.04, which is bleeding edge), you can just apt install pocl-opencl-icd.
I was using miniconda3, so I manually built my own image, basing it on ubuntu:24.04 and copying the miniconda3 docker commands exactly, then adding apt install pocl-opencl-icd at the end. This successfully showed my 5600x as an OpenCL device in clinfo. This does not work for me in 22.04 because pocl is too old there, and recompiling it requires way too many dependencies. So you could probably get it working in 22.04 too with enough effort.
(base) root@cfbb31c89f97:/# clinfo
Number of platforms 1
Platform Name Portable Computing Language
Platform Vendor The pocl project
Platform Version OpenCL 3.0 PoCL 4.0+debian Linux, None+Asserts, RELOC, SPIR, LLVM 15.0.7, SLEEF, DISTRO, POCL_DEBUG
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_pocl_content_size
Platform Extensions with Version cl_khr_icd 0x400000 (1.0.0)
cl_pocl_content_size 0x400000 (1.0.0)
Platform Numeric Version 0xc00000 (3.0.0)
Platform Extensions function suffix POCL
Platform Host timer resolution 0ns
Platform Name Portable Computing Language
Number of devices 1
Device Name cpu-haswell-AMD Ryzen 5 5600X 6-Core Processor
Note that this is still not a solution for Nvidia/AMD GPU OpenCL passthrough, but it's good enough for my development needs.
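A quick way to confirm the result is to count the platforms that clinfo reports. The helper below is only a minimal sketch and not part of the setup described above: `count_platforms` is a hypothetical name, and it assumes clinfo prints a "Number of platforms" line as in the listing above.

```shell
# Sketch: print the number of OpenCL platforms found in clinfo output.
# Reads clinfo output on stdin; prints 0 if the "Number of platforms" line is absent.
count_platforms() {
    awk '/^Number of platforms/ { n = $NF } END { print n + 0 }'
}

# Usage, inside the container or the WSL shell:
#   apt install pocl-opencl-icd clinfo
#   clinfo | count_platforms
```

A result of 0 corresponds to the "0 platforms" situation described earlier in the thread.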
@alex-ong I have it working in WSL2 Ubuntu 22.04 without pocl on a laptop with a 5800HS, but for the Intel platforms to be detected I have to source the setvars.sh script from the oneAPI installation:
$ source /opt/intel/oneapi/setvars.sh
I installed it a long time ago so I don't recall the details of how I installed it. But I don't remember having much trouble with it.
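If the oneAPI environment should be loaded in every new shell, a guarded helper avoids errors on machines where oneAPI is not installed. This is a sketch under assumptions: `load_oneapi_env` and the `ONEAPI_SETVARS` override variable are hypothetical names, and the default path is simply the one quoted above.

```shell
# Sketch: source the oneAPI environment only when the script is present.
# ONEAPI_SETVARS is a hypothetical override; the default is the path from the comment above.
load_oneapi_env() {
    oneapi_setvars_path="${ONEAPI_SETVARS:-/opt/intel/oneapi/setvars.sh}"
    if [ ! -f "$oneapi_setvars_path" ]; then
        echo "setvars.sh not found at $oneapi_setvars_path" >&2
        return 1
    fi
    # Sourced (not executed) so the exported variables stay in the current shell.
    . "$oneapi_setvars_path"
}
```

Calling `load_oneapi_env` from ~/.bashrc and then running clinfo should show whether the Intel platforms are detected.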
I was able to run OpenCL on NVIDIA on WSL2 via PoCL.
There is an "NVIDIA GeForce RTX 3060 Ti" device in the clinfo output (listing below), and OpenCL apps work.
Windows Task Manager also shows GPU CUDA utilization when CL programs run
(I can say nothing about performance, but there is a benchmark below).
I took the following steps:
- Install the latest Windows Nvidia drivers (I don't know since which version, but recent ones expose the GPU inside WSL).
1.1. Now you can run nvidia-smi in WSL to ensure it works.
$ nvidia-smi listing:
Mon Jan 29 04:05:38 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.40.06 Driver Version: 551.23 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3060 Ti On | 00000000:01:00.0 On | N/A |
| 0% 37C P8 11W / 225W | 607MiB / 8192MiB | 7% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 20 G /Xwayland N/A |
| 0 N/A N/A 42 G /code N/A |
| 0 N/A N/A 79 G /code N/A |
+-----------------------------------------------------------------------------------------+
- DO NOT install any GPU/CUDA drivers into WSL.
- Install the CUDA Toolkit "WSL-Ubuntu" version from here (nvidia page). (I have Debian on my WSL, but there weren't any problems installing it, except the shock of needing to download ~9GB of packages...) Again, make sure that you are not installing Linux drivers (the "WSL-Ubuntu" version is supposed to not contain them).
- AFAIK you need to have llvm/clang installed in order to compile kernels via PoCL, so
$ sudo apt install llvm clang
I think. (There is also the possibility of an "LLVM-less build", but never mind.)
- Install some packages required to build PoCL (I am almost sure I forgot something):
$ sudo apt install ...
libclang-dev (maybe also libclang-{version}-dev)
libclang-common-{version}-dev
libclang-cpp (maybe also libclang-cpp{version})
libclang-cpp-dev (libclang-cpp{ver}-dev)
ocl-icd-libopencl1 (maybe also ocl-icd-opencl-dev) - ICD loader
opencl-headers (opencl-c-headers opencl-clhpp-headers)
valgrind (because some CUDA-related PoCL sources require it)
- Download and build PoCL (GitHub). I built it with these variables (from the pocl directory):
# -L/usr/lib/wsl/lib lets ld find libcuda.so (I don't know which of the two FLAGS variables is necessary, but it works)
# leave ENABLE_HOST_CPU_DEVICES 'ON' if you also want your CPU as an OpenCL device
$ cmake -B {your-build-dir} \
-DCMAKE_C_FLAGS=-L/usr/lib/wsl/lib \
-DCMAKE_CXX_FLAGS=-L/usr/lib/wsl/lib \
-DENABLE_HOST_CPU_DEVICES=OFF \
-DENABLE_CUDA=ON
Then run
$ cmake --build {your-build-dir} -j{num of threads}
and pray, and maybe fix problems that arise.
- On a successful build you can check that it works without installing:
$ export POCL_BUILDING=1
(tells pocl that it will be running from the build directory)
$ export OCL_ICD_VENDORS={full-path-to-your-build-dir}/ocl-vendors/
(tells ocl-icd-loader where to find pocl)
Voilà!
Now you can run 'clinfo' and other OpenCL apps.
Also
$ cmake --install {your-build-dir}
to install it system-wide if you need to (I don't, so I have not tested this).
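The two exports from the last steps can be wrapped in a small helper so that a mistyped build path fails loudly instead of silently hiding the platform. This is a sketch, not something from the PoCL project: `use_pocl_build` is a hypothetical name, and it assumes the build tree contains the ocl-vendors directory mentioned above.

```shell
# Sketch: point the ICD loader at an uninstalled PoCL build tree.
# Takes the build directory as its only argument.
use_pocl_build() {
    # Resolve to an absolute path, since the steps above ask for the full path.
    pocl_build_dir=$(cd "$1" 2>/dev/null && pwd) || {
        echo "build directory not found: $1" >&2
        return 1
    }
    if [ ! -d "$pocl_build_dir/ocl-vendors" ]; then
        echo "no ocl-vendors directory under $pocl_build_dir" >&2
        return 1
    fi
    export POCL_BUILDING=1
    export OCL_ICD_VENDORS="$pocl_build_dir/ocl-vendors/"
}

# Usage:
#   use_pocl_build {your-build-dir} && clinfo
```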
My clinfo listing:
Number of platforms 1
Platform Name Portable Computing Language
Platform Vendor The pocl project
Platform Version OpenCL 3.0 PoCL 5.1-pre main-0-g8053faf0 Linux, Debug+Asserts, RELOC, SPIR, LLVM 15.0.6, SLEEF, CUDA, POCL_DEBUG
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_pocl_content_size
Platform Extensions with Version cl_khr_icd 0x400000 (1.0.0)
cl_pocl_content_size 0x400000 (1.0.0)
Platform Numeric Version 0xc00000 (3.0.0)
Platform Extensions function suffix POCL
Platform Host timer resolution 0ns
Platform Name Portable Computing Language
Number of devices 1
Device Name NVIDIA GeForce RTX 3060 Ti
Device Vendor NVIDIA Corporation
Device Vendor ID 0x10de
Device Version OpenCL 3.0 PoCL HSTR: CUDA-sm_86
Device Numeric Version 0xc00000 (3.0.0)
Driver Version 5.1-pre main-0-g8053faf0
Device OpenCL C Version OpenCL C 1.2 PoCL
Device OpenCL C all versions OpenCL C 0x400000 (1.0.0)
OpenCL C 0x401000 (1.1.0)
OpenCL C 0x402000 (1.2.0)
OpenCL C 0xc00000 (3.0.0)
Device OpenCL C features __opencl_c_images 0xc00000 (3.0.0)
__opencl_c_atomic_order_acq_rel 0xc00000 (3.0.0)
__opencl_c_atomic_order_seq_cst 0xc00000 (3.0.0)
__opencl_c_atomic_scope_device 0xc00000 (3.0.0)
__opencl_c_program_scope_global_variables 0xc00000 (3.0.0)
__opencl_c_generic_address_space 0xc00000 (3.0.0)
__opencl_c_fp16 0xc00000 (3.0.0)
__opencl_c_fp64 0xc00000 (3.0.0)
Latest conformance test passed (n/a)
Device Type GPU
Device Topology (NV) PCI-E, 0000:01:00.0
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 38
Max clock frequency 1695MHz
Compute Capability (NV) 8.6
Device Partition (core)
Max number of sub-devices 1
Supported partition types None
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 1024x1024x64
Max work group size 1024
Preferred work group size multiple (device) 32
Preferred work group size multiple (kernel) 32
Warp size (NV) 32
Max sub-groups per work group 32
Preferred / native vector sizes
char 1 / 1
short 1 / 1
int 1 / 1
long 1 / 1
half 0 / 0 (cl_khr_fp16)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals No
Infinity and NANs No
Round to nearest No
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 8589279232 (7.999GiB)
Error Correction support No
Max memory allocation 2147319808 (2GiB)
Unified memory for Host and Device No
Integrated memory (NV) No
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing Yes
Fine-grained system sharing No
Atomics No
Minimum alignment for any data type 128 bytes
Alignment of base address 4096 bits (512 bytes)
Preferred alignment for atomics
SVM 64 bytes
Global 64 bytes
Local 64 bytes
Atomic memory capabilities relaxed, work-group scope
Atomic fence capabilities relaxed, acquire/release, work-group scope
Max size for global variable 0
Preferred total size of global vars 0
Global Memory cache type None
Image support No
Pipe support No
Max number of pipe args 0
Max active pipe reservations 0
Max pipe packet size 0
Local memory type Local
Local memory size 49152 (48KiB)
Registers per block (NV) 65536
Max number of constant args 8
Max constant buffer size 65536 (64KiB)
Generic address space support Yes
Max size of kernel argument 4352 (4.25KiB)
Queue properties (on host)
Out-of-order execution No
Profiling Yes
Device enqueue capabilities (n/a)
Queue properties (on device)
Out-of-order execution No
Profiling No
Preferred size 0
Max size 0
Max queues on device 0
Max events on device 0
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Non-uniform work-groups No
Work-group collective functions No
Sub-group independent forward progress Yes
Kernel execution timeout (NV) Yes
Concurrent copy and kernel execution (NV) Yes
Number of async copy engines 5
IL version (n/a)
ILs with version (n/a)
SPIR versions (n/a)
printf() buffer size 16777216 (16MiB)
Built-in kernels pocl.mul.i32;pocl.add.i32;pocl.dnn.conv2d_int8_relu;pocl.sgemm.local.f32;pocl.sgemm.tensor.f16f16f32;pocl.sgemm_ab.tensor.f16f16f32;pocl.abs.f32;pocl.add.i8;org.khronos.openvx.scale_image.nn.u8;org.khronos.openvx.scale_image.bl.u8;org.khronos.openvx.tensor_convert_depth.wrap.u8.f32
Built-in kernels with version pocl.mul.i32 0x402000 (1.2.0)
pocl.add.i32 0x402000 (1.2.0)
pocl.dnn.conv2d_int8_relu 0x402000 (1.2.0)
pocl.sgemm.local.f32 0x402000 (1.2.0)
pocl.sgemm.tensor.f16f16f32 0x402000 (1.2.0)
pocl.sgemm_ab.tensor.f16f16f32 0x402000 (1.2.0)
pocl.abs.f32 0x402000 (1.2.0)
pocl.add.i8 0x402000 (1.2.0)
org.khronos.openvx.scale_image.nn.u8 0x402000 (1.2.0)
org.khronos.openvx.scale_image.bl.u8 0x402000 (1.2.0)
org.khronos.openvx.tensor_convert_depth.wrap.u8.f32 0x402000 (1.2.0)
Device Extensions cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_nv_device_attribute_query cl_khr_spir cl_khr_fp16 cl_khr_fp64
Device Extensions with Version cl_khr_byte_addressable_store 0x400000 (1.0.0)
cl_khr_global_int32_base_atomics 0x400000 (1.0.0)
cl_khr_global_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_local_int32_base_atomics 0x400000 (1.0.0)
cl_khr_local_int32_extended_atomics 0x400000 (1.0.0)
cl_khr_int64_base_atomics 0x400000 (1.0.0)
cl_khr_int64_extended_atomics 0x400000 (1.0.0)
cl_nv_device_attribute_query 0x400000 (1.0.0)
cl_khr_spir 0x801000 (2.1.0)
cl_khr_fp16 0x400000 (1.0.0)
cl_khr_fp64 0x400000 (1.0.0)
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] Success [POCL]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name Portable Computing Language
Device Name NVIDIA GeForce RTX 3060 Ti
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name Portable Computing Language
Device Name NVIDIA GeForce RTX 3060 Ti
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name Portable Computing Language
Device Name NVIDIA GeForce RTX 3060 Ti
And a benchmark (I don't know what these numbers mean, or whether they are good or bad):
.-----------------------------------------------------------------------------.
| ______________ ______________ |
| \ ________ | | ________ / |
| \ \ | | | | / / |
| \ \ | | | | / / |
| \ \ | | | | / / |
| \ \_.-" | | "-._/ / |
| \ _.-" _ "-._ / |
| \.-" _.-" "-._ "-./ |
| .-" .-"-. "-. |
| \ v" "v / |
| \ \ / / |
| \ \ / / |
| \ \ / / |
| \ ' / |
| \ / |
| \ / FluidX3D Version 2.12 |
| ' Copyright (c) Dr. Moritz Lehmann |
|-----------------------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID 0 | NVIDIA GeForce RTX 3060 Ti |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID | 0 |
| Device Name | NVIDIA GeForce RTX 3060 Ti |
| Device Vendor | NVIDIA Corporation |
| Device Driver | 5.1-pre main-0-g8053faf0 (Linux) |
| OpenCL Version | OpenCL C 1.2 PoCL |
| Compute Units | 38 at 1695 MHz (4864 cores, 16.489 TFLOPs/s) |
| Memory, Cache | 8191 MB, 0 KB global / 48 KB local |
| Buffer Limits | 2047 MB global, 64 KB constant |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled. |
| Info: Allocating memory. This may take a few seconds. |
|-----------------.-----------------------------------------------------------|
| Grid Resolution | 256 x 256 x 256 = 16777216 |
| Grid Domains | 1 x 1 x 1 = 1 |
| LBM Type | D3Q19 SRT (FP32/FP32) |
| Memory Usage | CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size | 1216 MB |
| Time Steps | 10 |
| Kin. Viscosity | 1.00000000 |
| Relaxation Time | 3.50000000 |
| Reynolds Number | Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs | Bandwidth | Steps/s | Current Step | Time Remaining |
| 3307 | 506 GB/s | 197 | 9990 0% | 0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 3466 |
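On reading these numbers: MLUPs counts millions of lattice cell updates per second, and the Bandwidth column appears to be MLUPs multiplied by the bytes moved per cell update. The sketch below assumes 153 bytes per update for D3Q19 with FP32/FP32, a figure taken from the FluidX3D documentation rather than from this thread; `mlups_to_gbps` is a hypothetical helper name.

```shell
# Sketch: convert MLUPs to memory bandwidth in GB/s.
# $1 = MLUPs, $2 = bytes moved per cell update (153 is an assumption for D3Q19 FP32/FP32).
mlups_to_gbps() {
    awk -v m="$1" -v b="$2" 'BEGIN { printf "%.0f\n", m * b / 1000 }'
}

# Usage with the row from the run above:
#   mlups_to_gbps 3307 153
```

With the values from the table, 3307 × 153 / 1000 rounds to 506, which matches the reported 506 GB/s.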
@Bossach Now this looks promising! PoCL too seems to have come a long way since I last checked. I might give it a try at some point.
(Quoting @Bossach's steps above: install the latest Windows Nvidia drivers, which expose the GPU inside WSL, then build PoCL with CUDA enabled.)
This worked for me. I installed it, but the settings aren't picked up by default. clinfo works once I run
$ export POCL_BUILDING=1
$ export OCL_ICD_VENDORS={full-path-to-your-build-dir}/ocl-vendors/
but it doesn't "stick". Should I put this into my .bashrc or rc.local or something like that, or is there a cleaner way?
@joaomamede
The cleaner way is
$ sudo cmake --install {your-build-dir}
It should install pocl and the ICD file system-wide, and it should just work.
If not, I would first check that
$ ls /etc/OpenCL/vendors
lists pocl.icd, that
$ cat /etc/OpenCL/vendors/pocl.icd
shows a valid path to /.../libpocl.so..., and that libpocl indeed exists there. If not, then something is wrong with the installation.
Alternatively, you can put the exports in your .bashrc, and it should work for all apps you launch from bash under your user (until you accidentally remove the pocl build directory, because it works from there).
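The manual checks above can be scripted. The helper below is a sketch under assumptions: `check_icd_dir` is a hypothetical name, and it assumes each .icd file holds a library path or name on its first line. Bare library names are left to the dynamic loader and are only reported, not verified.

```shell
# Sketch: list each ICD file in a vendors directory and check the library it names.
# Defaults to /etc/OpenCL/vendors, the directory from the reply above.
check_icd_dir() {
    icd_dir="${1:-/etc/OpenCL/vendors}"
    icd_status=0
    for icd_file in "$icd_dir"/*.icd; do
        [ -e "$icd_file" ] || { echo "no .icd files in $icd_dir"; return 1; }
        # First line of the file; read may return non-zero when the newline is missing.
        IFS= read -r icd_lib < "$icd_file" || true
        case "$icd_lib" in
            /*) # Absolute path: the library file must exist.
                if [ -e "$icd_lib" ]; then
                    echo "ok: $icd_file -> $icd_lib"
                else
                    echo "missing: $icd_file -> $icd_lib"
                    icd_status=1
                fi ;;
            *)  # Bare library name: resolved by the dynamic loader, not checked here.
                echo "unchecked: $icd_file -> $icd_lib" ;;
        esac
    done
    return "$icd_status"
}
```

Run without arguments it inspects the system-wide directory; passing {full-path-to-your-build-dir}/ocl-vendors checks the build tree instead.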
Windows Build Number
21382.1
WSL Version
Kernel Version
5.10.16.3
Distro Version
Ubuntu 20.04
Other Software
Inside WSL:
clinfo (for checking OpenCL platforms)
CUDA 11.3 (docker container runs with NVIDIA_DISABLE_REQUIRE=1, as it otherwise thinks it's running 11.0)
Docker 20.10.6, build 370c289 (with custom container)
nvidia-docker2 2.5.0-1
On Windows: NVIDIA Graphics Driver for CUDA on WSL 470.14
Repro Steps
I installed the Nvidia drivers and Docker according to Nvidia's user guide. I am, however, running an older version of nvidia-docker2 (and dependencies), as described in a forum post here.
Additionally, I have also installed the CUDA on WSL driver here
Steps: Run clinfo (both in and outside of the Docker container)
Expected Behavior
clinfo should return the graphics card (in my case, GTX 970) as an OpenCL platform
Actual Behavior
clinfo reports 0 platforms available, both inside the container and just on WSL
Diagnostic Logs
cuda nvidia-container-cli glxinfo (from inside of container) glxinfo (from WSL, outside of container) wsl.etl