Open Bulmer opened 7 months ago
Same here:
bastian@ryzen:~$ sudo dmesg -c
[ 420.599084] w[3941]: segfault at 4c ip 0000562d27e7409e sp 00007ffd7dccfd80 error 4 in w[562d27e73000+3000] likely on CPU 3 (core 3, socket 0)
[ 420.599106] Code: 0f be c3 48 8b 12 f6 44 42 01 40 74 7f 80 fb 20 74 7a 0f be fb 48 83 c5 01 e8 6e f1 ff ff 49 39 ee 0f 84 7d 00 00 00 45 89 e7 <0f> b6 5d 00 84 db 75 ba 45 85 ff 75 65 48 8b 35 76 3f 00 00 bf 2d
[ 420.623234] w[3946]: segfault at 4c ip 0000557627ab209e sp 00007ffd420bf770 error 4 in w[557627ab1000+3000] likely on CPU 6 (core 2, socket 0)
[ 420.623255] Code: 0f be c3 48 8b 12 f6 44 42 01 40 74 7f 80 fb 20 74 7a 0f be fb 48 83 c5 01 e8 6e f1 ff ff 49 39 ee 0f 84 7d 00 00 00 45 89 e7 <0f> b6 5d 00 84 db 75 ba 45 85 ff 75 65 48 8b 35 76 3f 00 00 bf 2d
[ 420.662887] w[3952]: segfault at 4c ip 000056436296509e sp 00007ffc5e4b9880 error 4 in w[564362964000+3000] likely on CPU 3 (core 3, socket 0)
[ 420.662897] Code: 0f be c3 48 8b 12 f6 44 42 01 40 74 7f 80 fb 20 74 7a 0f be fb 48 83 c5 01 e8 6e f1 ff ff 49 39 ee 0f 84 7d 00 00 00 45 89 e7 <0f> b6 5d 00 84 db 75 ba 45 85 ff 75 65 48 8b 35 76 3f 00 00 bf 2d
bastian@ryzen:~$ sudo circadian -t
Reading config from file: /etc/circadian.conf
Circadian launching.
CircadianConfig { verbosity: 0, idle_time: 600, auto_wake: None, on_idle: Some("systemctl hybrid-sleep"), on_wake: Some("date >>wake_up_from_circadian_sleep.log.txt"), tty_input: true, x11_input: true, ssh_block: false, smb_block: false, nfs_block: false, audio_block: true, max_cpu_load: Some(0.9), process_block: ["^dd$", "^rsync$", "^apt-get$", "^dpkg$", "^cp$", "^mv$", "^restic$", "^mpv$"] }
Got --test: running idle test and exiting.
Idle Detection Summary:
w* : cannot parse integer from empty string
xssstate* : 0
xprintidle* : 0
Wake block : 600
TTY (combined)* : 4294967295
X11 (combined)* : 0
Idle (min) : 0
Idle target : 600
Until idle : 600
IDLE? : false
CPU load* : false
SSH : false
SMB : false
NFS : false
Audio* : true
Processes* : false
BLOCKED? : true
bastian@ryzen:~$ circadian --version
circadian 0.8.4
bastian@ryzen:~$ uname -a
Linux ryzen 6.9.10-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.9.10-1 (2024-07-19) x86_64 GNU/Linux
bastian@ryzen:~$ cat /etc/*release*
PRETTY_NAME="Debian GNU/Linux trixie/sid"
NAME="Debian GNU/Linux"
VERSION_CODENAME=trixie
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
So both of you are getting segfaults in the w
application that circadian calls. You can set tty_input = no
in the circadian config to tell it not to use w
, but that's just a workaround for the system being broken... is Debian really shipping a broken w
? What happens if you just run w
in a terminal as root?
w
on Ubuntu 24.04 returns info just fine. Something else may be failing in the parsing. My outputs
Reading config from file: /etc/circadian.conf
Circadian launching.
CircadianConfig { verbosity: 0, idle_time: 900, auto_wake: None, on_idle: Some("systemctl suspend"), on_wake: Some("/home/sss/.scripts/cuda_reset.bash"), tty_input: true, x11_input: true, ssh_block: false, smb_block: false, nfs_block: false, audio_block: true, max_cpu_load: Some(0.5), process_block: ["^dd$", "^rsync$", "^apt-get$", "^dpkg$", "^cp$", "^mv$", "^poddl$", "^borg$"] }
Got --test: running idle test and exiting.
WARNING: xssstate failed for socket /tmp/.X11-unix/X0 with error: No such file or directory (os error 2)
WARNING: xssstate failed for socket /tmp/.X11-unix/X0 with error: No such file or directory (os error 2)
WARNING: xprintidle failed for socket /tmp/.X11-unix/X0 with error: No such file or directory (os error 2)
WARNING: xprintidle failed for socket /tmp/.X11-unix/X0 with error: No such file or directory (os error 2)
Idle Detection Summary:
w* : cannot parse integer from empty string
xssstate* : No displays found.
xprintidle* : No displays found.
Wake block : 900
TTY (combined)* : 4294967295
X11 (combined)* : 4294967295
Idle (min) : 4294967295
Idle target : 900
Until idle : 0
IDLE? : false
CPU load* : false
SSH : No such file or directory (os error 2)
SMB : No such file or directory (os error 2)
NFS : No such file or directory (os error 2)
Audio* : false
Processes* : false
BLOCKED? : false
❯ sudo w
16:49:48 up 1 day, 5:39, 3 users, load average: 0.00, 0.02, 0.00
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
gdm tty1 - Wed11 24:43m 44.19s 0.02s /usr/libexec/gsd-printer
sss 10.10.10.80 16:39 29:38m 0.00s 0.01s sshd: sss [priv]
sss 10.10.10.80 16:48 29:38m 0.00s 0.01s sshd: sss [priv]````
@sghael Are you also getting segfaults in your dmesg log? Does coredumpctl list
show w
for any of the entries?
In the two dmesg logs above, it clearly shows that w
is crashing ("w[55548]: segfault..."), which is what leads Circadian to fail to parse the output... there is no output.
If w
is crashing when Circadian launches it and not when launched from the shell, it could be because of a difference in arguments or a difference in execution environment. Circadian launches it with a variety of different arguments: w -us
, w -hus
, w -us CIRCADIAN_FAKEUSER
, w -usf CIRCADIAN_FAKEUSER
, w -hus
, or w -husf
. All of these should run without crashing.
Can everyone who experiences this please post your OS version (cat /etc/issue
), w
version (w --version
), and ideally dmesg log? If w
is listed in your coredumpctl entries, it could also be helpful to post the result of coredumpctl info w
.
───────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ File: /etc/issue
───────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ Ubuntu 24.04.1 LTS \n \l
2 │
───────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────
❯ w --version
w from procps-ng 4.0.4
❯ coredumpctl info w
PID: 8362 (w)
UID: 0 (root)
GID: 0 (root)
Signal: 11 (SEGV)
Timestamp: Fri 2024-08-30 11:22:21 EDT (54s ago)
Command Line: w -us
Executable: /usr/bin/w
Control Group: /user.slice/user-1000.slice/session-18.scope
Unit: session-18.scope
Slice: user-1000.slice
Session: 18
Owner UID: 1000 (sghael)
Boot ID: b7fa120af98d47ffbe2d4cc-----
Machine ID: 6b40ab502b904fe2aa17aa-----
Hostname: neural
Storage: /var/lib/systemd/coredump/core.w.0.b7fa120af98d47ffbe2d4cc26055ce5f.8362.1725031341000000.zst (inacc>
Message: Process 8362 (w) of user 0 dumped core.
Module libzstd.so.1 from deb libzstd-1.5.5+dfsg2-2build1.1.amd64
Module libsystemd.so.0 from deb systemd-255.4-1ubuntu8.4.amd64
Stack trace of thread 8362:
#0 0x000062c249d9ee73 n/a (w + 0x2e73)
#1 0x000062c249d9f161 n/a (w + 0x3161)
#2 0x000062c249da00e0 n/a (w + 0x40e0)
#3 0x000062c249d9ea8f n/a (w + 0x2a8f)
#4 0x000073e2ea22a1ca __libc_start_call_main (libc.so.6 + 0x2a1ca)
#5 0x000073e2ea22a28b __libc_start_main_impl (libc.so.6 + 0x2a28b)
#6 0x000062c249d9ed25 n/a (w + 0x2d25)
ELF object binary architecture: AMD x86-64
PID: 8380 (w)
UID: 0 (root)
GID: 0 (root)
Signal: 11 (SEGV)
Timestamp: Fri 2024-08-30 11:22:21 EDT (54s ago)
Command Line: w -hus
Executable: /usr/bin/w
Control Group: /user.slice/user-1000.slice/session-18.scope
Unit: session-18.scope
Slice: user-1000.slice
Session: 18
Owner UID: 1000 (sghael)
Boot ID: b7fa120af98d47ffbe2d4cc260-----
Machine ID: 6b40ab502b904fe2aa17aa5f1ac-----
Hostname: neural
Storage: /var/lib/systemd/coredump/core.w.0.b7fa120af98d47ffbe2d4cc26055ce5f.8380.1725031341000000.zst (inacc>
Message: Process 8380 (w) of user 0 dumped core.
Module libzstd.so.1 from deb libzstd-1.5.5+dfsg2-2build1.1.amd64
Module libsystemd.so.0 from deb systemd-255.4-1ubuntu8.4.amd64
Stack trace of thread 8380:
#0 0x00005deb31925e73 n/a (w + 0x2e73)
#1 0x00005deb31926161 n/a (w + 0x3161)
#2 0x00005deb319270e0 n/a (w + 0x40e0)
#3 0x00005deb31925a8f n/a (w + 0x2a8f)
#4 0x000077132962a1ca __libc_start_call_main (libc.so.6 + 0x2a1ca)
#5 0x000077132962a28b __libc_start_main_impl (libc.so.6 + 0x2a28b)
#6 0x00005deb31925d25 n/a (w + 0x2d25)
ELF object binary architecture: AMD x86-64
PID: 8403 (w)
UID: 0 (root)
GID: 0 (root)
Signal: 11 (SEGV)
Timestamp: Fri 2024-08-30 11:22:21 EDT (54s ago)
Command Line: w -hus
Executable: /usr/bin/w
Control Group: /user.slice/user-1000.slice/session-18.scope
Unit: session-18.scope
Slice: user-1000.slice
Session: 18
Owner UID: 1000 (sghael)
Boot ID: b7fa120af98d47ffbe2d4cc260-----
Machine ID: 6b40ab502b904fe2aa17aa5f1ac-----
Hostname: neural
Storage: /var/lib/systemd/coredump/core.w.0.b7fa120af98d47ffbe2d4cc26055ce5f.8403.1725031341000000.zst (inacc>
Message: Process 8403 (w) of user 0 dumped core.
Module libzstd.so.1 from deb libzstd-1.5.5+dfsg2-2build1.1.amd64
Module libsystemd.so.0 from deb systemd-255.4-1ubuntu8.4.amd64
Stack trace of thread 8403:
#0 0x0000598636812e73 n/a (w + 0x2e73)
#1 0x0000598636813161 n/a (w + 0x3161)
#2 0x00005986368140e0 n/a (w + 0x40e0)
#3 0x0000598636812a8f n/a (w + 0x2a8f)
#4 0x00007d2cda82a1ca __libc_start_call_main (libc.so.6 + 0x2a1ca)
#5 0x00007d2cda82a28b __libc_start_main_impl (libc.so.6 + 0x2a28b)
#6 0x0000598636812d25 n/a (w + 0x2d25)
ELF object binary architecture: AMD x86-64```
this seems like a potential problem https://github.com/mrmekon/circadian/blob/5dd06665d38c70f3f100d253e98be3c82e453efc/src/main.rs#L518
on my Ubuntu 24.04:
───────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ File: /etc/issue
───────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ Ubuntu 24.04.1 LTS \n \l
2 │
───────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────
❯ w --version
w from procps-ng 4.0.4
❯ w -hus
zsh: segmentation fault (core dumped) w -hus
interesting:
❯ w -h
sghael 10.10.10.199 09:49 1:37m 0.00s 0.02s sshd: sghael [priv]
gdm tty1 - 10:01 7:23m 4.93s 0.01s /usr/libexec/gsd-printer
❯ w -u
11:38:44 up 3:23, 2 users, load average: 0.05, 0.05, 0.01
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
sghael 10.10.10.199 09:49 1:37m 0.00s 0.02s sshd: sghael [priv]
gdm tty1 - 10:01 7:23m 4.93s 0.01s /usr/libexec/gsd-printer
❯ w -s
11:38:47 up 3:23, 2 users, load average: 0.04, 0.05, 0.01
USER TTY FROM IDLE WHAT
zsh: segmentation fault (core dumped) w -s
Oh wow, that's very helpful, thanks!
It looks like Ubuntu and Debian are shipping with a w
that segfaults on the --short
option?? That's extremely unexpected... that should just be hiding fields.
Also unusual because mine (Arch) reports the same version, but shows different default fields and doesn't segfault:
$ w --version
w from procps-ng 4.0.4
$ w -u
17:44:25 up 68 days, 2:51, 2 users, load average: 1.02, 1.07, 0.92
USER TTY LOGIN@ IDLE JCPU PCPU WHAT
trevor tty1 23Jun24 4days 16:53m ? xinit /home/trevor/.xinitrc -- /etc/X11/xinit/xserverrc :0 vt1 -keeptty -auth /tmp/serverauth.Bpopi5xqRs
trevor 23Jun24 24:52m 0.00s 1:06 /usr/lib/systemd/systemd --user --deserialize=9
$ w -s
17:43:15 up 68 days, 2:49, 2 users, load average: 1.64, 1.17, 0.94
USER TTY IDLE WHAT
trevor tty1 4days xinit /home/trevor/.xinitrc -- /etc/X11/xinit/xserverrc :0 vt1 -keeptty -auth /tmp/serverauth.Bpopi5xqRs
trevor 24:51m /usr/lib/systemd/systemd --user --deserialize=9
The root problem is either with procps-ng or Debian or Ubuntu, depending on whether it is patched by the distros or not, but it would be nice to find a workaround on the Circadian side in the meantime.
If I get time (big if), I'll do some testing in an Ubuntu VM and see if I can reproduce and work around it.
Ok, it's upstream and it was already patched 11 months ago, but not released yet. It has something to do with the systemd environment, which is probably why only some of us hit it.
Patch in procps-ng: https://gitlab.com/procps-ng/procps/-/commit/79042e07fab9956135a21b1df7a69d1fbde7ef79 Issue where it was reported: https://gitlab.com/procps-ng/procps/-/issues/301
yep, that's it. Thanks for finding the upstream bug.
I'm looking at the circadian code and wonder if it solves the problem if
1) w_from_args
removed all references to the -s
flag
2) idle_w
also didn't use the -s
flag and then explicitly calculates the idle field offset, instead of assuming it is always the second to last.
Yes, dropping -s
and counting fields is an option.
But now that I see what it's doing, I'm going to look into bypassing w
entirely. It's a lot of sketchy string parsing for what appears to be just checking the atime of /dev/tty*, which can just be done directly. Most of its logic is about choosing which tty's to check... I wonder if there's anything wrong with just checking all of them.
@mrmekon I took a crack at removing w. I tested a bit on my Ubuntu box and the tty settings work now. I am less sure if the X11 checks are still working properly. It would be great to add some unit test here based on expected signals, but I've left that for another day.
I am running Debian sid (Linux 6.7.9-amd64) and getting these errors every 5 seconds.