intel / compute-runtime

Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
MIT License

clinfo on WSL2 using A770 causes BSOD #663

Closed maleadt closed 4 months ago

maleadt commented 11 months ago

I'm using the following set-up:

Running clinfo in a WSL2 terminal starts printing some output, but quickly triggers a BSOD that mentions dxgmms2.sys and SYSTEM_THREAD_EXCEPTION_NOT_HANDLED. The generated dump file is corrupt, so I couldn't inspect it.

I'm also encountering this BSOD when loading oneAPI.jl, presumably when the first call to Level Zero happens (i.e. zeInit).

JablonskiMateusz commented 11 months ago

Hi @maleadt, could you please check that there is no BSOD if you remove intel.icd from /etc/OpenCL/vendors in WSL and then run clinfo? I would like to confirm that the BSOD is related to our package.
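One way to run this check without deleting anything is to move the ICD file aside temporarily and restore it afterwards. The following is only a minimal sketch: the helper name `icd_disabled` and the backup location (the parent of the vendors directory, which the ICD loader is assumed not to scan) are illustrative choices, not part of compute-runtime, and touching the real /etc/OpenCL/vendors requires root.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def icd_disabled(icd_name="intel.icd", vendors_dir="/etc/OpenCL/vendors"):
    """Temporarily move one ICD file out of the vendors directory.

    Hypothetical helper: yields True if the file existed and was moved,
    False if there was nothing to move. The file is restored on exit.
    """
    icd_path = Path(vendors_dir) / icd_name
    # Assumption: the parent directory is not scanned by the ICD loader.
    backup_path = Path(vendors_dir).parent / (icd_name + ".disabled")
    moved = icd_path.exists()
    if moved:
        shutil.move(str(icd_path), str(backup_path))
    try:
        yield moved
    finally:
        # Put the file back even if the body raised an exception.
        if moved:
            shutil.move(str(backup_path), str(icd_path))
```

Running clinfo inside `with icd_disabled():` (for example through `subprocess.run(["clinfo"])`) should then report no Intel platform. If the BSOD still occurs with the ICD moved aside, the Intel OpenCL package is unlikely to be the trigger.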

maleadt commented 11 months ago

Yes, when I hadn't installed an Intel ICD (i.e. before installing any of the compute-runtime packages) clinfo just returned 0 platforms.

JablonskiMateusz commented 11 months ago

Thanks for the confirmation.

freshly set-up WSL2 Ubuntu

Which Ubuntu version?

maleadt commented 11 months ago

which Ubuntu version?

The one Microsoft defaults to, which seems to be 22.04 (all packages fully updated).

eero-t commented 4 months ago

Do you mean that there's Windows kernel BSOD when trivial operations are done with Linux user-space compute stack under WSL? (That does not really sound like a compute-runtime problem.)

maleadt commented 4 months ago

Do you mean that there's Windows kernel BSOD when trivial operations are done with Linux user-space compute stack under WSL?

Correct.

eero-t commented 4 months ago

Unless you're using PCI passthrough for the device, and the mention of "dxg" (DirectX) is just a coincidence, I do not see how this could be a Linux-side compute-runtime problem rather than a Windows driver issue. Have you reported the issue on the Windows side?

A virtual machine using virtualized host drivers should not be able to BSOD the host...

maleadt commented 4 months ago

After recently updating both NEO and IGC, this issue doesn't occur anymore. Sadly, the native Intel GPU driver was updated too, so it's impossible to tell which update fixed the issue... Anyway, I'm glad it got fixed, so this can be closed.

eero-t commented 4 months ago

Thanks for testing and reporting the results here!