Open paul-haskell opened 2 weeks ago
If this is a feature request, please reply with '/feature'. If this is a question, reply with '/question'. Otherwise please attach logs by following the instructions below; your issue will not be reviewed unless they are added. These logs will help us understand what is going on in your machine.
Please view the issues below to see if they solve your problem, and if the issue describes your problem please consider closing this one and thumbs upping the other issue to help us prioritize it!
I already tried the fixes in #1754.
@paul-haskell, is systemd-coredump installed? Your coredumps might be in the journal. Is there any output when you run coredumpctl list?
Also, if gdb is attached to a process, running generate-core-file does create a core dump, i.e. process 316 in this case:
zcobol@toto:~$ file core.316
core.316: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from '-bash', real uid: 1002, effective uid: 1002, real gid: 1002, effective gid: 1002, execfn: '/bin/bash', platform: 'x86_64'
The kernel.core_pattern was not modified. This is the default:
zcobol@toto:~$ sysctl kernel.core_pattern
kernel.core_pattern = |/usr/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %h
Hi there,
systemd-coredump is not installed. When I try to run 'coredumpctl' I get the message:
Command 'coredumpctl' not found, but can be installed with:
sudo apt install systemd-coredump
When I run "sysctl -a | grep core_pattern" on my WSL instance, I get: /mnt/wslg/dumps/core.%e
/mnt/wslg/dumps is empty, even after I run my core-making program. The directory's file permissions are drwxrwxrwx
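The two core_pattern values reported so far behave differently. Per core(5), a pattern that starts with | is piped to a program, while anything else is a file name template (relative templates resolve against the crashing process's working directory). A minimal sketch that classifies a pattern string; the function name describe_core_pattern is made up for illustration:

```shell
# Classify a kernel.core_pattern value (hypothetical helper, not part of WSL).
describe_core_pattern() {
    case $1 in
        '|'*) printf 'pipe to program: %s\n' "${1#?}" ;;   # drop the leading |
        /*)   printf 'file, absolute path template: %s\n' "$1" ;;
        *)    printf 'file, relative to the crashing process cwd: %s\n' "$1" ;;
    esac
}

describe_core_pattern '|/usr/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %h'
describe_core_pattern '/mnt/wslg/dumps/core.%e'
```

On a live system the current value can be passed in with describe_core_pattern "$(sysctl -n kernel.core_pattern)".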
In wsl-2.3.17 the value of core_pattern is different:
elsaco@eleven:~/test$ sysctl kernel.core_pattern
kernel.core_pattern = |/wsl-capture-crash %t %E %p %s
Running strings on /init shows:
elsaco@eleven:~/test$ strings /init | grep crash
<3>WSL (%d) ERROR: %s:%u: Received error while trying to capture crash dump: %u
<6>WSL (%d): Capturing crash for pid: %s, executable: %s, signal: %s, port: %u
<3>WSL (%d) ERROR: %s:%u: Error while trying read crash dump from stdin, %u
/wsl-capture-crash
wsl-capture-crash
|/wsl-capture-crash %t %E %p %s
crash-dump
So it looks hardcoded into WSL's own init.
A simple divide-by-zero test does produce a crash dump:
elsaco@eleven:~/test$ ./zero
Floating point exception (core dumped)
and the trace shows in the dmesg output:
[20108.624055] traps: zero[12868] trap divide error ip:563cda3a4184 sp:7ffe113bb0a0 error:0 in zero[563cda3a4000+1000]
[20108.624065] potentially unexpected fatal signal 8.
[20108.624066] CPU: 0 PID: 12868 Comm: zero Not tainted 5.15.153.1-microsoft-standard-WSL2 #1
[20108.624068] RIP: 0033:0x563cda3a4184
[20108.624071] Code: 00 75 07 b8 ff ff ff ff eb 07 8b 45 fc 99 f7 7d f8 5d c3 f3 0f 1e fa 55 48 89 e5 48 83 ec 10 b8 0a 00 00 00 b9 00 00 00 00 99 <f7> f9 89 45 f4 b8 00 00 00 00 b9 00 00 00 00 99 f7 f9 89 45 f8 8b
[20108.624072] RSP: 002b:00007ffe113bb0a0 EFLAGS: 00010206
[20108.624073] RAX: 000000000000000a RBX: 00007ffe113bb1d8 RCX: 0000000000000000
[20108.624074] RDX: 0000000000000000 RSI: 00007ffe113bb1d8 RDI: 0000000000000001
[20108.624075] RBP: 00007ffe113bb0b0 R08: 0000000000000000 R09: 00007fa6ffd20380
[20108.624076] R10: 00007ffe113badd0 R11: 0000000000000203 R12: 0000000000000001
[20108.624076] R13: 0000000000000000 R14: 0000563cda3a6dc0 R15: 00007fa6ffd53000
[20108.624077] FS: 00007fa6ffafe740 GS: 0000000000000000
[20108.624514] WSL (12869): Capturing crash for pid: 10759, executable: !home!elsaco!test!zero
[20108.624516] , signal: 8, port: 50005
and in journalctl:
Sep 08 21:00:46 eleven kernel: traps: zero[12868] trap divide error ip:563cda3a4184 sp:7ffe113bb0a0 error:0 in zero[563>
Sep 08 21:00:46 eleven kernel: potentially unexpected fatal signal 8.
Sep 08 21:00:46 eleven kernel: CPU: 0 PID: 12868 Comm: zero Not tainted 5.15.153.1-microsoft-standard-WSL2 #1
Sep 08 21:00:46 eleven kernel: RIP: 0033:0x563cda3a4184
Sep 08 21:00:46 eleven kernel: Code: 00 75 07 b8 ff ff ff ff eb 07 8b 45 fc 99 f7 7d f8 5d c3 f3 0f 1e fa 55 48 89 e5 4>
Sep 08 21:00:46 eleven kernel: RSP: 002b:00007ffe113bb0a0 EFLAGS: 00010206
Sep 08 21:00:46 eleven kernel: RAX: 000000000000000a RBX: 00007ffe113bb1d8 RCX: 0000000000000000
Sep 08 21:00:46 eleven kernel: RDX: 0000000000000000 RSI: 00007ffe113bb1d8 RDI: 0000000000000001
Sep 08 21:00:46 eleven kernel: RBP: 00007ffe113bb0b0 R08: 0000000000000000 R09: 00007fa6ffd20380
Sep 08 21:00:46 eleven kernel: R10: 00007ffe113badd0 R11: 0000000000000203 R12: 0000000000000001
Sep 08 21:00:46 eleven kernel: R13: 0000000000000000 R14: 0000563cda3a6dc0 R15: 00007fa6ffd53000
Sep 08 21:00:46 eleven kernel: FS: 00007fa6ffafe740 GS: 0000000000000000
Sep 08 21:00:46 eleven unknown: WSL (12869): Capturing crash for pid: 10759, executable: !home!elsaco!test!zero
Sep 08 21:00:46 eleven unknown: , signal: 8, port: 50005
However, I can't figure out this entry: "WSL: Capturing crash for pid:". Where does wsl-capture-crash store the actual core file?
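The divide-by-zero test above needs a compiled binary. As a compiler-free alternative, a throwaway shell can send itself a signal whose default action is to dump core, which should exercise the same core_pattern path. This is only a sketch: crash_with_signal is a made-up helper name, and whether a dump is actually written still depends on core_pattern and the core size limit.

```shell
# Start a throwaway shell that kills itself with the given signal name
# (hypothetical helper; ABRT, FPE and SEGV all default to "dump core").
crash_with_signal() {
    sh -c 'kill -s "$1" "$$"' sh "$1"
}

status=0
crash_with_signal ABRT || status=$?
echo "exit status: $status"   # 128 + signal number; SIGABRT is 6 on Linux
```

After a run, dmesg can be checked for a "Capturing crash" line like the one shown above.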
In wsl-2.3.17 core dumps are stored in the \AppData\Local\Temp\wsl-crashes folder under your Windows home directory. You'll notice entries like this when running dmesg:
WSL (573): Capturing crash for pid: 366, executable: !home!zcobol!test!zero, signal:8, port: 50005
WSL captures the crash and writes the dump to the wsl-crashes folder.
Sample file:
PS C:\Users\valli>\AppData\Local\Temp\wsl-crashes\wsl-crash-1726372480-366-_home_zcobol_test_zero-8.dmp
Run sysctl kernel.core_pattern; if you didn't change the settings, it should look like:
zcobol@texas:~$ sysctl kernel.core_pattern
kernel.core_pattern = |/wsl-capture-crash %t %E %p %s
Using systemd-coredump didn't work because it would kill init:
systemd-coredump[544]: Due to PID 1 having crashed coredump collection will now be turned off
I checked my system: I do not have a \AppData\Local\Temp\wsl-crashes directory. (I do have \AppData\Local\Temp) My dmesg output does not show any "Capturing crash" messages. My "sysctl kernel.core_pattern" shows "/mnt/wslg/dumps/core.%e". And I do not have any files in /mnt/wslg/dumps, though I do have that directory.
What @zcobol and @elsaco said is right. We indeed added logic to capture coredumps in 2.3.17. The default path is %tmp%\wsl-crashes.
You can override the crash dump folder via:
[wsl2]
crashDumpFolder=C:\\path\\to\\folder
And you can completely disable this behavior via:
[wsl2]
maxCrashDumpCount=-1
This will completely prevent WSL from touching core_pattern, which should allow you to set your own custom path.
Let me know if this helps with collecting coredumps!
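The [wsl2] section belongs in the .wslconfig file in the Windows user profile (%UserProfile%\.wslconfig). To confirm from inside the distro which value is actually set, a small reader along these lines may help. It is a sketch only: wslconfig_get is a made-up helper, it matches key names exactly, and the location of the file under /mnt/c is an assumption about the default drive mount.

```shell
# Print the value of KEY inside [SECTION] of an INI-style FILE.
# Usage: wslconfig_get FILE SECTION KEY   (hypothetical helper)
wslconfig_get() {
    awk -v section="$2" -v key="$3" '
        { sub(/\r$/, "") }                      # tolerate CRLF line endings
        /^[ \t]*\[.*\][ \t]*$/ {                # section header such as [wsl2]
            name = $0
            sub(/^[ \t]*\[/, "", name)
            sub(/\][ \t]*$/, "", name)
            in_section = (name == section)
            next
        }
        in_section && index($0, "=") > 0 {
            eq = index($0, "=")
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)      # trim blanks around the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)      # trim blanks around the value
            if (k == key) { print v; exit }
        }
    ' "$1"
}

# Demo on a temporary file so nothing on the real system is touched.
sample=$(mktemp)
printf '%s\n' '[wsl2]' 'crashDumpFolder=C:\\path\\to\\folder' > "$sample"
wslconfig_get "$sample" wsl2 crashDumpFolder    # prints C:\\path\\to\\folder
rm -f "$sample"
```

Against a real setup the first argument would be something like /mnt/c/Users/<name>/.wslconfig, with <name> replaced by the Windows user name.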
@paul-haskell: You most likely have an older build installed. Try running: wsl --update --pre-release to get the latest.
@OneBlue, thanks for your message -- I am a lot closer after upgrading to WSL 2.3.17.
First, I ran with the default kernel.core_pattern of "|/wsl-capture-crash %t %E %p %s". When I ran my program that calls abort(), I did not have a .../AppData/Local/Temp/wsl-crashes directory created.
Next, I tried:
Any ideas why my corefiles are empty?
@paul-haskell: Can you collect /logs of this happening (for both scenarios)?
Here are the requested logs for the second scenario, i.e. with kernel.core_pattern=core.%e. Thanks for looking. (I will upload the other logs shortly.) WslLogs-2024-09-20_14-33-52.zip
Here are the logs for the first scenario (kernel.core_pattern=|/wsl-capture-crash %t %E %p %s): WslLogs-2024-09-20_14-40-21.zip
Thank you @paul-haskell. Looking at the logs, I see that a crash dump is generated:
Microsoft.Windows.Lxss.Manager LinuxCrash 09-20-2024 14:40:57.101 " " "FullPath: C:\Users\phaskell\AppData\Local\temp\wsl-crashes\wsl-crash-1726868457-485-_mnt_c_phaskell_CS221_Private_ClassDays_Day17_makeCore-6.dmp
Pid: 485
Signal: 6
process: !mnt!c!phaskell!CS221!Private!ClassDays!Day17!makeCore
wslVersion: 2.3.17.0" 4996 14140 5 00000000-0000-0000-0000-000000000000
Can you check the contents of C:\Users\phaskell\AppData\Local\temp\wsl-crashes\?
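Judging only from the two dump names in this thread and the Pid and Signal fields in the log above, the file name appears to be laid out as wsl-crash-<time>-<pid>-<executable path with separators replaced by _>-<signal>.dmp, which lines up with the %t %E %p %s arguments in core_pattern. That layout is an inference, not documented behavior. Under that assumption, a sketch that splits a name into its fields (parse_wsl_crash_name is a made-up helper):

```shell
# Split a wsl-crash-*.dmp file name into fields. The layout is inferred
# from the samples in this thread, so treat it as an assumption.
parse_wsl_crash_name() (
    name=${1##*/}                 # drop any leading directory (POSIX-style path)
    name=${name#wsl-crash-}
    name=${name%.dmp}
    timestamp=${name%%-*}         # first field: seconds since the epoch (%t)
    rest=${name#*-}
    pid=${rest%%-*}               # second field: pid (%p)
    rest=${rest#*-}
    signal=${rest##*-}            # last field: signal number (%s)
    exe=${rest%-*}                # everything in between: mangled path (%E)
    printf 'time=%s pid=%s signal=%s exe=%s\n' "$timestamp" "$pid" "$signal" "$exe"
)

parse_wsl_crash_name 'wsl-crash-1726372480-366-_home_zcobol_test_zero-8.dmp'
# prints: time=1726372480 pid=366 signal=8 exe=_home_zcobol_test_zero
```

Taking the pid from the front and the signal from the back keeps the split stable even if the executable path itself contains a hyphen.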
I do see a core in .../Local/Temp/wsl-crashes and it is nonempty. So "case 1" works! Thank you. Any idea why "case 2" i.e. overridden kernel.core_pattern only creates empty corefiles? (The reason I care is because I am teaching a class on system programming, and I want to make it easy for students on Windows and Mac platforms to be able to debug with corefiles. If I can get the corefiles in the current directory via some configuration script, it will make the students' lives easy.)
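Since case 1 works, one possible workaround for the classroom setup is a small helper that finds the newest dump in the wsl-crashes folder so it can be copied into the current directory. This is only a sketch: latest_crash_dump is a made-up name, the folder path under /mnt/c is an assumption about the default drive mount, and this thread does not confirm whether gdb opens the .dmp files directly.

```shell
# Print the most recently modified wsl-crash-*.dmp in the given directory.
# Returns non-zero if there is none. Uses the -nt test, which bash, dash
# and busybox sh support but POSIX does not require.
latest_crash_dump() (
    newest=
    for f in "$1"/wsl-crash-*.dmp; do
        [ -e "$f" ] || continue               # glob matched nothing
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
)

# Example (path is a placeholder, adjust <name> to the Windows user):
#   cp "$(latest_crash_dump /mnt/c/Users/<name>/AppData/Local/Temp/wsl-crashes)" ./core.dmp
```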
@paul-haskell: Does disabling systemd and restarting the distro help with case 2?
I did a quick check, and I have 159 services managed by systemd. systemd manages all the startup services with Ubuntu, right? Can I really stop all of them?
(I tried stopping apport.service and setting kernel.core_pattern=core.%e but I still get empty corefiles.)
Can you try setting
[boot]
systemd=false
in /etc/wsl.conf?
Ok, I did that test: In /etc/wsl.conf I set systemd=false, and I restarted my Ubuntu.
The system boots really quickly now. Unfortunately my corefiles are still empty. I'll attach another log to the case.
WslLogs-2024-09-20_16-56-24.zip
Here are the logs with systemd=false in wsl.conf and with kernel.core_pattern=core.%e (and with empty corefiles)
Discussed in https://github.com/microsoft/WSL/discussions/11992