Closed (blackliner closed 8 months ago)
I can also reproduce it, more or less: it crashes about every 10 minutes when running the test suite.
A significant recent change in WSL2 is that WSLg is now enabled by default in Insider builds. You can try disabling it by adding this to your .wslconfig:
[wsl2]
guiapplications=false
and then running wsl.exe --shutdown for the changes to take effect.
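For anyone who prefers to script the change, here is a minimal sketch in Python. It assumes .wslconfig can be parsed as a plain INI file, and it uses the key spelling given above; `disable_wslg` is a hypothetical helper name, not part of any WSL tooling.

```python
import configparser
import io


def disable_wslg(config_text: str) -> str:
    """Return config_text with guiapplications=false set in the [wsl2] section."""
    # interpolation=None so that '%' characters in existing values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    # Keep the case of existing keys instead of lowercasing them on read.
    parser.optionxform = str
    parser.read_string(config_text)

    if not parser.has_section("wsl2"):
        parser.add_section("wsl2")

    # Drop any existing spelling of the key so it is not defined twice.
    for key in list(parser["wsl2"]):
        if key.lower() == "guiapplications":
            parser.remove_option("wsl2", key)

    # Key spelling follows the comment above (an assumption of this sketch).
    parser.set("wsl2", "guiapplications", "false")

    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()
```

To use it, read the existing .wslconfig from your Windows user profile folder, pass its text to `disable_wslg`, write the result back, and then run wsl.exe --shutdown. Note that configparser does not preserve comments, so back up the file first if it contains any.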
@blackliner could you also share the dmesg logs from during the tests? If possible, open a new tab just for dmesg and run dmesg -w there, so that it keeps logging kernel messages as they arrive.
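Since the terminal tab may disappear when the VM crashes, it can help to also write the followed log to a file. The sketch below is one possible way to do that in Python, assuming `dmesg -w` is available inside the distro; `copy_lines_durably` and `follow_dmesg` are hypothetical helper names, and whether the last lines survive a VM crash is not guaranteed.

```python
import os
import subprocess
from typing import IO, Iterable


def copy_lines_durably(lines: Iterable[str], sink: IO[str]) -> int:
    """Write each line to sink and force it to disk before reading the next one."""
    count = 0
    for line in lines:
        sink.write(line)
        sink.flush()              # push Python's buffer to the OS
        os.fsync(sink.fileno())   # ask the OS to push it to the disk
        count += 1
    return count


def follow_dmesg(log_path: str) -> int:
    """Run `dmesg -w` and append every kernel message to log_path until it exits."""
    with open(log_path, "a", encoding="utf-8") as sink:
        proc = subprocess.Popen(["dmesg", "-w"], stdout=subprocess.PIPE, text=True)
        try:
            return copy_lines_durably(proc.stdout, sink)
        finally:
            proc.terminate()
```

Calling `follow_dmesg("dmesg-follow.log")` in a separate tab before starting the tests would keep appending to that file (the file name is only an example). Syncing after every line is slow, but kernel messages arrive rarely enough that this should not matter here.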
These are interesting, but I'm curious to know whether something started to crash before them:
[ 1728.344430] dxgk:err: did not find packet to complete
[ 1728.349924] dxgk:err: did not find packet to complete
[ 1728.352292] dxgk:err: did not find packet to complete
Of course it's not happening anymore, but I'll keep my eyes open!
These last ones all appear to be normal initialization messages; I see the same behavior here:
....
[ 7.616341] dxgk:err: is_handle_valid Entry is freed 40000080 2
[ 7.616342] dxgk:err: hmgrtable_get_object_by_type invalid handle 40000080
[ 7.616343] dxgk:err: adapter_by_handle failed 40000080
[ 7.617547] dxgk:err: is_handle_valid Entry is freed 40000080 2
[ 7.617548] dxgk:err: hmgrtable_get_object_by_type invalid handle 40000080
[ 7.617548] dxgk:err: adapter_by_handle failed 40000080
[ 7.623782] dxgk:err: is_handle_valid Entry is freed 40000080 2
[ 7.623783] dxgk:err: hmgrtable_get_object_by_type invalid handle 40000080
[ 7.623784] dxgk:err: adapter_by_handle failed 40000080
...
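To compare such logs between machines, a rough way to summarize the dxgk errors is sketched below in Python. The regular expression assumes the `[ timestamp] dxgk:err: message` layout seen in the lines above, and `summarize_dxgk_errors` is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

# Matches lines such as "[ 7.616341] dxgk:err: adapter_by_handle failed 40000080".
DXGK_ERR = re.compile(r"^\[\s*(\d+\.\d+)\]\s+dxgk:err:\s+(.*\S)\s*$")


def summarize_dxgk_errors(lines: Iterable[str]) -> Tuple[Counter, Dict[str, float]]:
    """Count each distinct dxgk error message and record when it first appeared."""
    counts: Counter = Counter()
    first_seen: Dict[str, float] = {}
    for line in lines:
        match = DXGK_ERR.match(line.strip())
        if match is None:
            continue  # not a dxgk error line
        timestamp, message = float(match.group(1)), match.group(2)
        counts[message] += 1
        first_seen.setdefault(message, timestamp)
    return counts, first_seen
```

Feeding it the output of dmesg, split into lines, gives a per-message count and the first timestamp for each message, which makes it easier to see whether the same errors show up right after boot or only shortly before a crash.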
No problem. If it happens to reproduce, let me know.
Maybe something interesting: it locked up again, but I was not running dmesg at the time :-/
But this is the dmesg output from after that, which may be relevant:
This issue has been automatically closed since it has not had any activity for the past year. If you're still experiencing this issue please re-file this as a new issue or feature request.
Thank you!
Windows Build Number
21390
WSL Version
Kernel Version
5.10.16.3-microsoft-standard-WSL2
Distro Version
Ubuntu 18.04
Other Software
Nvidia driver 470.76
Repro Steps
Run unit tests that use CUDA hardware acceleration. I ran ctest -j 200 (yes, 200, to stress it a bit, and since some tests sleep, it is actually faster ;-) ) on about 390 unit tests (proprietary code, cannot share), some of which are CUDA-based executables. CPU utilization (3900X) and GPU utilization (2080 Ti) spiked to 100%, then it crashed. I had just installed the new Nvidia driver and updated to the newest Insider build, and never had this issue before. It could be unrelated to CUDA, but I suspect the new driver + Windows build combination, which seems to have fixed some past CUDA issues but introduced this one.
Expected Behavior
not to crash WSL2
Actual Behavior
Getting a [process exited with code 1] while running unit tests in Windows Terminal.
Diagnostic Logs
From the event viewer:
Virtual Machine has encountered a fatal error. The guest operating system reported that it failed with the following error codes: ErrorCode0: 0x0, ErrorCode1: 0x0, ErrorCode2: 0x0, ErrorCode3: 0x0, ErrorCode4: 0x0. PreOSId: 0. If the problem persists, contact Product Support for the guest operating system. (Virtual machine ID B8266772-F64F-44A8-B874-03468D05FC40) Guest message:
```
[ 1728.340387] ---[ end trace a3125c1dcde97ed4 ]---
[ 1728.344430] dxgk:err: did not find packet to complete
[ 1728.349924] dxgk:err: did not find packet to complete
[ 1728.352292] dxgk:err: did not find packet to complete
[ 1728.363664] general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] SMP NOPTI
[ 1728.363676] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.10.16.3-microsoft-standard-WSL2 #1
[ 1728.363683] RIP: 0010:process_completion_packet+0x5a/0x110
[ 1728.363686] Code: 14 34 00 48 8b 5d 20 48 8d 7d 20 49 89 c7 48 39 fb 0f 84 ae 00 00 00 49 8b 44 24 08 eb 0c 48 8b 1b 48 39 fb 0f 84 9b 00 00 00 <48> 3b 43 10 75 ee 48 89 df e8 f8 4c d7 ff 84 c0 74 0e 48 8b 13 48
[ 1728.363690] RSP: 0018:ffffc90000003ef0 EFLAGS: 00010083
[ 1728.363693] RAX: 0000000000013bc7 RBX: dead000000000100 RCX: ffffc90000c1e000
[ 1728.363696] RDX: 0000000000000001 RSI: ffffc90000c5c488 RDI: ffffc90000105020
[ 1728.363698] RBP: ffffc90000105000 R08: 0000000000000040 R09: 0000000000000000
[ 1728.363700] R10: 000000000000001f R11: 0000000000000000 R12: ffffc90000c5c488
[ 1728.363702] R13: ffffc90000105018 R14: 0000000000000002 R15: 0000000000000286
[ 1728.363707] FS: 0000000000000000(0000) GS:ffff8891d7800000(0000) knlGS:0000000000000000
[ 1728.363711] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1728.363713] CR2: 00005612dc0137c8 CR3: 000000012cda0000 CR4: 0000000000350eb0
[ 1728.363715] Call Trace:
[ 1728.363719]