microsoft / OpenCLOn12

The OpenCL-on-D3D12 mapping layer
MIT License

Geekbench v6.2.2 command fails when new laptop OS image is installed and Intel NPU Driver is enabled #61

Open leonardo-intel opened 2 months ago

leonardo-intel commented 2 months ago

I am an engineer for the Intel NPU Software driver team on Windows Hosts. I am attempting to root-cause a Geekbench v6.2.2 app issue, where the Geekbench app does not start. It loads the splash screen, then crashes, if the NPU device is enabled via Device Manager.

On a fresh OS install, when the following Geekbench v6.2.2 command is executed in a command prompt,

geekbench6.exe --gpu-list

it throws the following exception:

Internal error message: D:\a_work\1\s\OpenCLOn12\src\device.cpp(545)OpenCLOn12.dll!00007FFDCCC20C3F: (caller: 00007FFDCCC2148F) Exception(1) tid(21cc) 887A0004 The specified device interface or feature level is not supported on this system.

Note the file (device.cpp) and line number (545).

In the public OpenCLOn12 git repo, the exception log above does not match the latest code, but if you check out the past commit tag v1.2112.2.0 and browse to device.cpp, line 545, you can see that the exception happens during D3D12CreateDevice:

THROW_IF_FAILED(D3D12CreateDevice(m_spAdapter.Get(), D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&m_spDevice)));

Note the D3D_FEATURE_LEVEL_11_0 minimum feature level required.

In summary, OpenCLOn12 appears to throw an exception because the NPU driver does not meet the D3D_FEATURE_LEVEL_11_0 minimum feature level. This is the expected behavior/config for the NPU driver, because it currently supports D3D_FEATURE_LEVEL_1_0_CORE.

This is not an NPU driver issue, and OpenCLOn12 should not throw an exception if a particular device is discovered and not supported. I am attempting to get Geekbench source code access to confirm which OpenCLOn12 API gets called.

One interesting thing I see is that, after reproducing the above on this laptop, if two days pass, even with Windows Updates paused, the issue is no longer reproducible, presumably because OpenCLOn12 (location unknown) is somehow being updated to the latest version, where this issue appears to be fixed.

OpenCLOn12 team: Can you please help identify who or what component is responsible for updating OpenCLOn12 after a new OS install, or where OpenCLOn12 resides in the OS? I am trying to understand how the issue appears to fix itself without manual intervention. Also, note that this will reappear if/when a driver reports "GENERIC" instead of CORE.

jenatali commented 2 months ago

OpenCLOn12 is delivered via the Store: https://apps.microsoft.com/detail/9nqpsl29bfff?hl=en-US&gl=US. OEMs are able to include an inbox version of this package if they deem it appropriate. The intent has been for ARM devices where no native drivers are available, though we've learned that some x64 OEMs bundled it anyway. We've removed the x64 package going forward as we're not aware of any plans for x64 IHVs to provide devices without native drivers.

Going forward, we've also fixed the creation path to just not enumerate devices which don't support at least FEATURE_LEVEL_1_0_CORE. So new devices showing up with GENERIC will internally throw a caught exception and simply not be enumerated out of the CL API.
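The "throw internally, catch, and skip the device" behavior described above can be sketched as follows. This is a minimal, portable illustration and not the actual OpenCLOn12 code: `FeatureLevel`, `FakeAdapter`, `FakeDevice`, `CreateDeviceOrThrow`, and `EnumerateUsableDevices` are all hypothetical names, and the ordering of the levels (GENERIC below CORE below 11_0) is an assumption taken from this thread.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not D3D12 or OpenCLOn12 types.
// Assumed ordering: Generic < Core_1_0 < Level_11_0.
enum class FeatureLevel : int { Generic = 0, Core_1_0 = 1, Level_11_0 = 2 };

struct FakeAdapter {
    std::string name;
    FeatureLevel maxLevel;  // highest level this adapter reports
};

struct FakeDevice {
    std::string name;
};

// Models a creation call that throws on failure, in the spirit of
// THROW_IF_FAILED(D3D12CreateDevice(...)).
FakeDevice CreateDeviceOrThrow(const FakeAdapter& adapter, FeatureLevel minLevel) {
    if (adapter.maxLevel < minLevel) {
        throw std::runtime_error("feature level not supported: " + adapter.name);
    }
    return FakeDevice{adapter.name};
}

// Enumerates only the adapters that can be created at minLevel.
// A failure for one adapter is caught and that adapter is skipped,
// so it never aborts enumeration of the others.
std::vector<FakeDevice> EnumerateUsableDevices(const std::vector<FakeAdapter>& adapters,
                                               FeatureLevel minLevel) {
    std::vector<FakeDevice> devices;
    for (const auto& adapter : adapters) {
        try {
            devices.push_back(CreateDeviceOrThrow(adapter, minLevel));
        } catch (const std::runtime_error&) {
            // Unsupported adapter: leave it out of the enumerated list.
        }
    }
    return devices;
}
```

With a minimum of `FeatureLevel::Core_1_0`, an adapter that only reports `Generic` is silently dropped from the returned list, while the exception never reaches the caller of the enumeration function.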

leonardo-intel commented 2 months ago

@jenatali The "OpenCL™, OpenGL®, and Vulkan® Compatibility Pack" that you linked is not currently installed on the machine, and it doesn't look like an OpenCLOn12.dll exists on the machine at all. Perhaps Geekbench compiles OpenCLOn12 along with their source code? I see that the Geekbench binaries include a file called pl_opencl_x86_64.dll.

But the question remains: if OpenCLOn12 does not exist in a new OS install, and Geekbench is crashing while executing OpenCLOn12, how is this issue resolving itself after 2 days of zero activity? Where does OpenCLOn12 exist in the OS if an actual OpenCLOn12.dll does not exist? Any recommended next steps?

jenatali commented 2 months ago

It does not exist on the machine without that package being installed (unless you built it from source). If you're seeing that exception thrown in a debugger, use the debugger to see where the DLL is being loaded from. ~2 days is around the right timeframe for the Store to auto-update, which is a different mechanism from Windows Update.