fanoush opened this issue 2 months ago
Normal Ubuntu 22.04 and 24.04 installations also have different kernel versions, but I do not know whether the same is true for WSL Ubuntu installations. If the kernel versions do differ, it would be interesting to know whether the kernel or user space has more impact on performance, so that this performance regression can be filed against the correct project.
Could you try the 22.04 kernel on 24.04, or vice versa?
If not, what about testing the 22.04 compute driver version on 24.04, or vice versa?
There is only one kernel, provided by Microsoft:
$ cat /proc/version
Linux version 5.15.146.1-microsoft-standard-WSL2 (root@65c757a075e2) (gcc (GCC) 11.2.0, GNU ld (GNU Binutils) 2.37) #1 SMP Thu Jan 11 04:09:03 UTC 2024
All three installations are running on the same computer, so they all use this same Microsoft kernel, the same WSL and WSLg versions, and the same Windows Intel driver. I also tried starting only one of them for the test run, just to be sure.
BTW, the kernel source is at https://github.com/microsoft/WSL2-Linux-Kernel and one can build a custom kernel from source, but this is the default one provided by Microsoft as part of WSL. I think you cannot run two kernels at once for different WSL instances. Also, you cannot run a real Ubuntu kernel, since this one has special WSL drivers.
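A quick way to confirm that every WSL instance really runs the same kernel is to compare the kernel release string in each distro. A minimal Python sketch, assuming only that Microsoft's WSL2 kernels contain "microsoft" in their release name (true for the 5.15.146.1-microsoft-standard-WSL2 kernel above, but a heuristic, not an official check):

```python
import platform


def kernel_release(proc_version: str) -> str:
    """Extract the release field from a /proc/version line.

    /proc/version starts with "Linux version <release> ...".
    """
    parts = proc_version.split()
    if len(parts) < 3 or parts[:2] != ["Linux", "version"]:
        raise ValueError("unexpected /proc/version format")
    return parts[2]


def is_wsl_kernel(release: str) -> bool:
    """Heuristic (assumption): WSL2 kernels from Microsoft carry "microsoft" in the release name."""
    return "microsoft" in release.lower()


if __name__ == "__main__":
    # platform.release() reports the same string as `uname -r`.
    release = platform.release()
    print(release, "(WSL2 kernel)" if is_wsl_kernel(release) else "(not a WSL2 kernel)")
```

Running it in each of the three instances should print the same release if they indeed share the single WSL kernel.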
If not, what about testing the 22.04 compute driver version on 24.04, or vice versa?
How would I do that?
$ dpkg -L intel-opencl-icd
/.
/etc
/etc/OpenCL
/etc/OpenCL/vendors
/etc/OpenCL/vendors/intel.icd
/usr
/usr/bin
/usr/bin/ocloc
/usr/include
/usr/include/ocloc_api.h
/usr/lib
/usr/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu/intel-opencl
/usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
/usr/lib/x86_64-linux-gnu/libocloc.so
/usr/share
/usr/share/doc
/usr/share/doc/intel-opencl-icd
/usr/share/doc/intel-opencl-icd/changelog.Debian.gz
/usr/share/doc/intel-opencl-icd/copyright
You mean copying the /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so and /usr/lib/x86_64-linux-gnu/libocloc.so files from the older 22.14.22890-1 Ubuntu 22.04 package into 24.04 and/or Debian? I can try that.
Anybody else seeing this?
|----------------.------------------------------------------------------------|
| Device ID | 1 |
| Device Name | Intel(R) Arc(TM) A750 Graphics |
| Device Vendor | Intel(R) Corporation |
| Device Driver | 24.09.28717.17 (Linux) |
| OpenCL Version | OpenCL C 1.2 |
| Compute Units | 448 at 2400 MHz (3584 cores, 17.203 TFLOPs/s) |
| Memory, Cache | 8127 MB, 16384 KB global / 64 KB local |
| Buffer Limits | 3860 MB global, 3953458 KB constant |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled. |
| FP64 compute not supported |
<--- hangs here
$ uname --kernel-release
6.8.8-200.fc39.x86_64
Different issue? Looks like I have a different driver.
If not, what about testing the 22.04 compute driver version on 24.04, or vice versa?
OK, the result is interesting. The older version copied to the newer distro becomes slower too, in exactly the same way:
bookworm:~$ ./OpenCL-Benchmark-Linux
.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID 0 | Intel(R) Graphics [0x46a6] |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID | 0 |
| Device Name | Intel(R) Graphics [0x46a6] |
| Device Vendor | Intel(R) Corporation |
| Device Driver | 1.0.0 (Linux) |
| OpenCL Version | OpenCL C 1.2 |
| Compute Units | 96 at 1450 MHz (768 cores, 2.227 TFLOPs/s) |
| Memory, Cache | 26082 MB, 1024 KB global / 64 KB local |
| Buffer Limits | 1024 MB global, 1048576 KB constant |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled. |
| FP64 compute not supported |
| FP32 compute 1.881 TFLOPs/s ( 1x ) |
| FP16 compute 3.478 TFLOPs/s ( 2x ) |
| INT64 compute 0.148 TIOPs/s (1/16) |
| INT32 compute 0.681 TIOPs/s (1/3 ) |
| INT16 compute 7.286 TIOPs/s ( 4x ) |
| INT8 compute 1.375 TIOPs/s (2/3 ) |
| Memory Bandwidth ( coalesced read ) 66.43 GB/s |
| Memory Bandwidth ( coalesced write) 61.57 GB/s |
| Memory Bandwidth (misaligned read ) 66.10 GB/s |
| Memory Bandwidth (misaligned write) 32.45 GB/s |
| PCIe Bandwidth (send ) 21.57 GB/s |
| PCIe Bandwidth ( receive ) 21.65 GB/s |
| PCIe Bandwidth ( bidirectional) (Gen4 x16) 11.93 GB/s |
|-----------------------------------------------------------------------------|
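For reference, the peak figures the benchmark prints appear consistent with cores × clock × 2 FLOPs (one FMA per core per clock): 768 × 1450 MHz × 2 ≈ 2.227 TFLOPs/s for this iGPU, and 3584 × 2400 MHz × 2 ≈ 17.203 TFLOPs/s for the A750 posted above. This is my reconstruction from the printed numbers, not code taken from OpenCL-Benchmark. A small sketch that also expresses a measured result as a fraction of that peak:

```python
def peak_tflops(cores: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak in TFLOPs/s.

    Assumption: one FMA (= 2 FLOPs) per core per clock, which reproduces
    the peak values printed in this thread.
    """
    return cores * clock_mhz * 1e6 * flops_per_clock / 1e12


def efficiency(measured_tflops: float, peak: float) -> float:
    """Measured throughput as a fraction of the theoretical peak."""
    return measured_tflops / peak


if __name__ == "__main__":
    igpu = peak_tflops(768, 1450)   # Intel(R) Graphics [0x46a6]: 96 CUs, 768 cores
    a750 = peak_tflops(3584, 2400)  # Intel(R) Arc(TM) A750: 448 CUs, 3584 cores
    print(f"iGPU peak {igpu:.3f} TFLOPs/s, A750 peak {a750:.3f} TFLOPs/s")
    # FP32 results from this thread, same 22.14.22890-1 libraries in both cases
    for label, fp32 in [("22.04", 2.020), ("bookworm (copied libs)", 1.881)]:
        print(f"{label}: {efficiency(fp32, igpu):.1%} of peak")
```

Under these assumptions the old libraries reach about 90.7% of peak in 22.04 but only about 84.5% when copied into bookworm.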
The newer driver does not run in the older Ubuntu, so I cannot test the other way around:
22.04:~$ ./OpenCL-Benchmark-Linux
.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID 0 | Intel(R) Graphics [0x46a6] |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID | 0 |
| Device Name | Intel(R) Graphics [0x46a6] |
| Device Vendor | Intel(R) Corporation |
| Device Driver | 22.43.24595 (Linux) |
| OpenCL Version | OpenCL C 1.2 |
| Compute Units | 96 at 1450 MHz (768 cores, 2.227 TFLOPs/s) |
| Memory, Cache | 26082 MB, 3840 KB global / 64 KB local |
| Buffer Limits | 1024 MB global, 1048576 KB constant |
|----------------'------------------------------------------------------------|
| Warning: |
| Error: OpenCL C code compilation failed with error code -6. Make sure there |
| are no errors in kernel.cpp. |
'-----------------------------------------------------------------------------'
So could it be related to system libraries like libc, or even the C compiler? How is the OpenCL C code compiled for the GPU?
All three installations are running on the same computer, so they all use this same Microsoft kernel, the same WSL and WSLg versions, and the same Windows Intel driver. I also tried starting only one of them for the test run, just to be sure.
24.04 should be using the 6.8 kernel, not the 5.15 one in 22.04... Are you sure the 22.04 and 24.04 Ubuntu WSL versions are really using the same kernel version?
Note: in normal Ubuntu LTS installs, one can install newer, so-called "Hardware Enablement" (HWE) kernels a few months after they have first been tested in Ubuntu development releases. The latest HWE kernel available for 22.04 is 6.5: https://packages.ubuntu.com/jammy/linux-generic-hwe-22.04
The older version copied to the newer distro becomes slower too, in exactly the same way.
Did you copy just the older version of the compute runtime (intel-opencl-icd), or also the rest of the compute stack [1]?
(IGC, LLVM and the kernel are the components in the stack most likely to affect these numbers.)
How is the OpenCL C code compiled for the GPU?
Using IGC: https://github.com/intel/intel-graphics-compiler/
IGC uses LLVM, opencl-clang and SPIRV-LLVM-Translator.
AFAIK, IGC packages in distros (like Ubuntu) use distro-specific versions of those dependencies, linked dynamically, whereas IGC packages from the Intel package repos, and the releases here, include a statically linked LLVM version.
[1] I assume you're using distro versions of everything. The Ubuntu 22.04 => 24.04 upgrade implies version changes in the following packages:
intel-opencl-icd
libigc / libigdfcl1
libigdgmm12
See: https://packages.ubuntu.com/noble/intel-opencl-icd (and what it links).
PS. You could add to title something like "6-7% perf regression in OpenCL-Benchmark".
24.04 should be using the 6.8 kernel, not the 5.15 one in 22.04... Are you sure the 22.04 and 24.04 Ubuntu WSL versions are really using the same kernel version?
I'm not sure we are talking about the same thing; WSL = Windows Subsystem for Linux. As mentioned previously, there is only one kernel:
Microsoft Windows [Version 10.0.22631.3447]
(c) Microsoft Corporation. All rights reserved.
C:\>wsl -v
WSL version: 2.1.5.0
Kernel version: 5.15.146.1-2
WSLg version: 1.0.60
MSRDC version: 1.2.5105
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22631.3447
C:\>wsl --update
Checking for updates.
The most recent version of Windows Subsystem for Linux is already installed.
and the Windows driver is
Did you copy just the older version of the compute runtime (intel-opencl-icd),
Yes, just the libraries listed by dpkg -L intel-opencl-icd; I did not know about the rest.
Also, is that benchmark a good reference, or is there a better way to check whether this regression is real?
And BTW, I just followed the readme here https://github.com/intel/compute-runtime?tab=readme-ov-file#via-system-package-manager so I installed the packages from the Ubuntu/Debian repos. I did not follow https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2023-0/configure-wsl-2-for-gpu-workflows.html (which is maybe outdated?) to add the Intel repos.
Also, is that benchmark a good reference, or is there a better way to check whether this regression is real?
Unfortunately I have no idea.
(While I work for Intel and know a bit about the Linux drivers, I'm not a driver developer, or otherwise related to this project. I'm just another user of this driver, mainly for its Level-Zero Sysman API, not its OpenCL API.)
And BTW, I just followed the readme here https://github.com/intel/compute-runtime?tab=readme-ov-file#via-system-package-manager so I installed the packages from the Ubuntu/Debian repos. I did not follow https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2023-0/configure-wsl-2-for-gpu-workflows.html
I think the WSL docs recommend using the driver versions from the Intel repos because distro driver versions are compiled only with support for the upstream Linux kernel driver, which is missing some things that are in the Intel out-of-tree kernel driver (and, I assume, in the WSL / Windows kernel drivers): https://dgpu-docs.intel.com/driver/kernel-driver-types.html#differences-between-the-out-of-tree-driver-and-the-upstream-kernel
(which is maybe outdated?) to add the Intel repos.
While those repo names should AFAIK still work, the recommended repo names have changed a bit since then. Latest Intel driver repo info is here: https://dgpu-docs.intel.com/driver/client/overview.html
(That page is for client GPUs, like iGPUs.)
Hi @fanoush
I see you have different drivers on the systems:
22.04 | Device Driver | 1.0.0 (Linux) |
24.04 | Device Driver | 23.43.027642 (Linux) |
Could you please retry using packages from our latest github release?
Could you please retry using packages from our latest github release?
Both are what comes from the Ubuntu repos for those releases; the one reporting 1.0.0 is actually the Ubuntu package version 22.14.22890-1.
I did this in bookworm first: uninstalled the packages from the repo with sudo apt-get purge intel-opencl-icd ; sudo apt-get autoremove
Removing intel-opencl-icd (22.43.24595.41-1) ...
Removing libigdfcl1:amd64 (1.0.12504.6-1+deb12u1) ...
Removing libopencl-clang14:amd64 (14.0.0-4) ...
Removing libclang-cpp14 (1:14.0.6-12) ...
Removing libigc1:amd64 (1.0.12504.6-1+deb12u1) ...
Removing libigdgmm12:amd64 (22.3.3+ds1-1) ...
Removing libllvmspirvlib14:amd64 (14.0.0-5) ...
Removing libllvm14:amd64 (1:14.0.6-12) ...
Removing libz3-4:amd64 (4.8.12-3.1) ...
and installed https://github.com/intel/compute-runtime/releases/tag/24.13.29138.7 via wget/dpkg. The results are very similar, i.e. slower:
bookworm:~$ ./OpenCL-Benchmark-Linux
.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID 0 | Intel(R) Graphics [0x46a6] |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID | 0 |
| Device Name | Intel(R) Graphics [0x46a6] |
| Device Vendor | Intel(R) Corporation |
| Device Driver | 24.13.29138.7 (Linux) |
| OpenCL Version | OpenCL C 1.2 |
| Compute Units | 96 at 1450 MHz (768 cores, 2.227 TFLOPs/s) |
| Memory, Cache | 30197 MB, 3840 KB global / 64 KB local |
| Buffer Limits | 1024 MB global, 1048576 KB constant |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled. |
| FP64 compute not supported |
| FP32 compute 1.887 TFLOPs/s ( 1x ) |
| FP16 compute 3.480 TFLOPs/s ( 2x ) |
| INT64 compute 0.158 TIOPs/s (1/16) |
| INT32 compute 0.682 TIOPs/s (1/3 ) |
| INT16 compute 7.353 TIOPs/s ( 4x ) |
| INT8 compute 1.381 TIOPs/s (2/3 ) |
| Memory Bandwidth ( coalesced read ) 64.83 GB/s |
| Memory Bandwidth ( coalesced write) 57.96 GB/s |
| Memory Bandwidth (misaligned read ) 64.61 GB/s |
| Memory Bandwidth (misaligned write) 31.98 GB/s |
| PCIe Bandwidth (send ) 20.88 GB/s |
| PCIe Bandwidth ( receive ) 21.15 GB/s |
| PCIe Bandwidth ( bidirectional) (Gen4 x16) 10.58 GB/s |
|-----------------------------------------------------------------------------|
Did the same in Ubuntu 22.04:
Removing intel-opencl-icd (22.14.22890-1) ...
Removing libigdfcl1:amd64 (1.0.10840-1) ...
Removing libopencl-clang12:amd64 (12.0.0-3) ...
Removing libclang-cpp12 (1:12.0.1-19ubuntu3) ...
Removing libigc1:amd64 (1.0.10840-1) ...
Removing libigdgmm12:amd64 (22.1.2+ds1-1) ...
Removing libllvmspirvlib12:amd64 (12.0.0-3) ...
Removing libllvm12:amd64 (1:12.0.1-19ubuntu3) ...
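To see which parts of the compute stack actually differ between the two distros, the two removal logs above can be compared package by package. A small Python sketch; the parsing assumes exactly the "Removing name[:arch] (version) ..." lines apt prints, as shown in this thread:

```python
import re
from typing import Optional

# Matches apt output lines such as
#   "Removing libigc1:amd64 (1.0.12504.6-1+deb12u1) ..."
#   "Removing libclang-cpp14 (1:14.0.6-12) ..."
REMOVING_RE = re.compile(r"^Removing\s+([^\s:]+)(?::\S+)?\s+\(([^)]+)\)")


def parse_removed(log: str) -> dict[str, str]:
    """Map package name -> version for every 'Removing ...' line in an apt log."""
    packages: dict[str, str] = {}
    for line in log.splitlines():
        match = REMOVING_RE.match(line.strip())
        if match:
            packages[match.group(1)] = match.group(2)
    return packages


def compare_stacks(a: dict[str, str], b: dict[str, str]) -> list[tuple[str, Optional[str], Optional[str]]]:
    """Packages whose versions differ, or that exist on only one side (None)."""
    return [
        (name, a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    ]


if __name__ == "__main__":
    jammy = parse_removed("""\
Removing intel-opencl-icd (22.14.22890-1) ...
Removing libigc1:amd64 (1.0.10840-1) ...
Removing libigdgmm12:amd64 (22.1.2+ds1-1) ...
Removing libllvm12:amd64 (1:12.0.1-19ubuntu3) ...""")
    bookworm = parse_removed("""\
Removing intel-opencl-icd (22.43.24595.41-1) ...
Removing libigc1:amd64 (1.0.12504.6-1+deb12u1) ...
Removing libigdgmm12:amd64 (22.3.3+ds1-1) ...
Removing libllvm14:amd64 (1:14.0.6-12) ...""")
    for name, old, new in compare_stacks(jammy, bookworm):
        print(f"{name:18} {old or '-':22} {new or '-'}")
```

LLVM-based packages carry the LLVM major version in their names (libllvm12 vs. libllvm14), so they show up as present on only one side rather than as a version change.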
and the results in 22.04 are now the same, i.e. slower:
22.04:~$ ./OpenCL-Benchmark-Linux
.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID 0 | Intel(R) Graphics [0x46a6] |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID | 0 |
| Device Name | Intel(R) Graphics [0x46a6] |
| Device Vendor | Intel(R) Corporation |
| Device Driver | 24.13.29138.7 (Linux) |
| OpenCL Version | OpenCL C 1.2 |
| Compute Units | 96 at 1450 MHz (768 cores, 2.227 TFLOPs/s) |
| Memory, Cache | 30197 MB, 3840 KB global / 64 KB local |
| Buffer Limits | 1024 MB global, 1048576 KB constant |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled. |
| FP64 compute not supported |
| FP32 compute 1.884 TFLOPs/s ( 1x ) |
| FP16 compute 3.480 TFLOPs/s ( 2x ) |
| INT64 compute 0.160 TIOPs/s (1/16) |
| INT32 compute 0.682 TIOPs/s (1/3 ) |
| INT16 compute 7.205 TIOPs/s ( 4x ) |
| INT8 compute 1.379 TIOPs/s (2/3 ) |
| Memory Bandwidth ( coalesced read ) 66.30 GB/s |
| Memory Bandwidth ( coalesced write) 61.70 GB/s |
| Memory Bandwidth (misaligned read ) 65.16 GB/s |
| Memory Bandwidth (misaligned write) 32.56 GB/s |
| PCIe Bandwidth (send ) 21.44 GB/s |
| PCIe Bandwidth ( receive ) 21.61 GB/s |
| PCIe Bandwidth ( bidirectional) (Gen4 x16) 11.83 GB/s |
|-----------------------------------------------------------------------------|
So only the old 22.14.22890-1 package from Ubuntu 22.04 gives better numbers. Going back to it in 22.04:
sudo dpkg --purge intel-igc-core intel-igc-opencl intel-opencl-icd libigdgmm12 intel-level-zero-gpu
....
sudo apt-get install intel-opencl-icd
...
Unpacking intel-opencl-icd (22.14.22890-1) ...
...
Setting up libigdgmm12:amd64 (22.1.2+ds1-1) ...
Setting up libllvm12:amd64 (1:12.0.1-19ubuntu3) ...
Setting up libllvmspirvlib12:amd64 (12.0.0-3) ...
Setting up libclang-cpp12 (1:12.0.1-19ubuntu3) ...
Setting up libopencl-clang12:amd64 (12.0.0-3) ...
Setting up libigc1:amd64 (1.0.10840-1) ...
Setting up libigdfcl1:amd64 (1.0.10840-1) ...
Setting up intel-opencl-icd (22.14.22890-1) ...
...
22.04:~$ ./OpenCL-Benchmark-Linux
.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID 0 | Intel(R) Graphics [0x46a6] |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID | 0 |
| Device Name | Intel(R) Graphics [0x46a6] |
| Device Vendor | Intel(R) Corporation |
| Device Driver | 1.0.0 (Linux) |
| OpenCL Version | OpenCL C 1.2 |
| Compute Units | 96 at 1450 MHz (768 cores, 2.227 TFLOPs/s) |
| Memory, Cache | 26082 MB, 1024 KB global / 64 KB local |
| Buffer Limits | 1024 MB global, 1048576 KB constant |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled. |
| FP64 compute not supported |
| FP32 compute 2.020 TFLOPs/s ( 1x ) |
| FP16 compute 3.699 TFLOPs/s ( 2x ) |
| INT64 compute 0.147 TIOPs/s (1/16) |
| INT32 compute 0.693 TIOPs/s (1/3 ) |
| INT16 compute 7.245 TIOPs/s ( 4x ) |
| INT8 compute 1.415 TIOPs/s (2/3 ) |
| Memory Bandwidth ( coalesced read ) 66.18 GB/s |
| Memory Bandwidth ( coalesced write) 60.98 GB/s |
| Memory Bandwidth (misaligned read ) 65.39 GB/s |
| Memory Bandwidth (misaligned write) 32.60 GB/s |
| PCIe Bandwidth (send ) 21.86 GB/s |
| PCIe Bandwidth ( receive ) 21.93 GB/s |
| PCIe Bandwidth ( bidirectional) (Gen4 x16) 12.00 GB/s |
|-----------------------------------------------------------------------------|
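For the suggested issue title, the relative drop between this 22.14.22890-1 run and the 24.13.29138.7 run in the same 22.04 instance can be computed directly; it comes out at about 6.7% for FP32 and 5.9% for FP16. A minimal sketch using only the numbers posted in this thread:

```python
def regression_pct(baseline: float, new: float) -> float:
    """Relative drop of `new` versus `baseline`, in percent (positive = slower)."""
    return (baseline - new) / baseline * 100.0


# Ubuntu 22.04 results from this thread (TFLOPs/s):
# 22.14.22890-1 (distro package) vs. 24.13.29138.7 (GitHub release)
RESULTS = {
    "FP32": (2.020, 1.884),
    "FP16": (3.699, 3.480),
}

if __name__ == "__main__":
    for name, (old, new) in RESULTS.items():
        print(f"{name}: {regression_pct(old, new):.1f}% slower")
```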
And BTW, the benchmark code running the FP32 and FP16 kernels is here https://github.com/ProjectPhysX/OpenCL-Benchmark/blob/master/src/main.cpp#L53 and the source of the OpenCL kernels starts here https://github.com/ProjectPhysX/OpenCL-Benchmark/blob/master/src/kernel.cpp#L18
I use OpenCL-Benchmark-Linux to verify that OpenCL is working in WSL. I just installed the new Ubuntu 24.04 and it looks like it is slower than 22.04. Then I also tried Debian 12 bookworm, and it is slower too, so only Ubuntu 22.04 is faster. These are three WSL instances on the same Windows 11 computer, running the same version of the benchmark binary from https://github.com/ProjectPhysX/OpenCL-Benchmark
Ubuntu 22.04
Ubuntu 24.04 (and Debian bookworm)
FP32 and FP16 are faster in the 22.04 release. There the version is printed as
Device Driver | 1.0.0 (Linux)
while the installed 22.04 package is in fact version 22.14.22890-1. The Ubuntu 24.04 intel-opencl-icd package is
Version: 23.43.27642.40-1ubuntu3
and the bookworm one is
Version: 22.43.24595.41-1
If you need any other info (like the output of clinfo), let me know. These numbers are pretty consistent across multiple runs. Is this expected, or is the benchmark meaningless? Should I run some other test to verify or get more details?