Closed badbat75 closed 1 year ago
communication with dbus seems to time-out. Please contact your distro for help, such integration issues must be handled by your distro.
I am trying to automate a Linux from Scratch build for RaspberryPi, I successfully made it using systemd-251 and D-Bus-1.15.4 (even this is a development version) but works flawlessly with D-Bus-1.14.4. Now I recompiled all the packages but using systemd-252 (either with D-Bus-1.14.4 and 1.15.4) and here I got the problems:
If you think this is a regression in systemd, you might try to run git bisect
to find the faulty commit. Given that you are running LFS, you should be familiar with compiling systemd.
I am trying to automate a Linux from Scratch build for RaspberryPi, I successfully made it using systemd-251 and D-Bus-1.15.4 (even this is a development version) but works flawlessly with D-Bus-1.14.4. Now I recompiled all the packages but using systemd-252 (either with D-Bus-1.14.4 and 1.15.4) and here I got the problems:
If you think this is a regression in systemd, you might try to run
git bisect
to find the faulty commit. Given that you are running LFS, you should be familiar with compiling systemd.
Wow, I was not aware of this git function... awesome!!! Thank you. I hope to not go in this direction, there are more than 2500 commits between 251 and 252 even with bisect will be a long journey, my build process is semi-automated, I need to rebuild the O.S. image and flash the sdcard and test on the RPi for each test.
Ok, I tried with hostnamectl that is failing as well. So I traced hostnamectl, dbus-daemon and systemd with strace I attach the final messages before the timeout:
DBus received message from hostnamectl:
17:28:07.658728 epoll_wait(4, [{events=EPOLLIN, data={u32=3, u64=3}}], 64, -1) = 1
17:28:08.748109 accept4(3, {sa_family=AF_UNIX}, [16 => 2], SOCK_CLOEXEC) = 14
17:28:08.748689 fcntl64(14, F_GETFL) = 0x2 (flags O_RDWR)
17:28:08.748981 fcntl64(14, F_SETFL, O_RDWR|O_NONBLOCK) = 0
17:28:08.749281 getsockname(14, {sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, [128 => 30]) = 0
17:28:08.749792 epoll_ctl(4, EPOLL_CTL_ADD, 14, {events=EPOLLET, data={u32=14, u64=14}}) = 0
17:28:08.750107 epoll_ctl(4, EPOLL_CTL_MOD, 14, {events=EPOLLIN, data={u32=14, u64=14}}) = 0
17:28:08.750436 epoll_wait(4, [{events=EPOLLIN, data={u32=14, u64=14}}], 64, 29999) = 1
17:28:08.751879 recvmsg(14, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\0", iov_len=1}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1
17:28:08.752276 getsockopt(14, SOL_SOCKET, SO_PEERCRED, {pid=990, uid=0, gid=0}, [12]) = 0
17:28:08.752704 getsockopt(14, SOL_SOCKET, SO_PEERSEC, 0x2014050, [1024]) = -1 ENOPROTOOPT (Protocol not available)
17:28:08.753013 getsockopt(14, SOL_SOCKET, SO_PEERGROUPS, [0], [1024 => 4]) = 0
17:28:08.753377 read(14, "AUTH EXTERNAL\r\nDATA\r\nNEGOTIATE_UNIX_FD\r\nBEGIN\r\n", 2048) = 47
17:28:08.753834 epoll_ctl(4, EPOLL_CTL_MOD, 14, {events=EPOLLET, data={u32=14, u64=14}}) = 0
17:28:08.754215 epoll_ctl(4, EPOLL_CTL_MOD, 14, {events=EPOLLOUT, data={u32=14, u64=14}}) = 0
17:28:08.754552 epoll_wait(4, [{events=EPOLLOUT, data={u32=14, u64=14}}], 64, 29995) = 1
17:28:08.754866 send(14, "DATA\r\nOK f880d267098091b49af1ad2e63ef9e0b\r\nAGREE_UNIX_FD\r\n", 58, MSG_NOSIGNAL) = 58
17:28:08.755312 geteuid32() = 18
17:28:08.755640 epoll_ctl(4, EPOLL_CTL_MOD, 14, {events=EPOLLIN|EPOLLOUT, data={u32=14, u64=14}}) = 0
17:28:08.755911 epoll_ctl(4, EPOLL_CTL_MOD, 14, {events=EPOLLIN, data={u32=14, u64=14}}) = 0
17:28:08.756184 epoll_wait(4, [{events=EPOLLIN, data={u32=14, u64=14}}], 64, 29994) = 1
17:28:08.757195 recvmsg(14, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\1\0\1\0\0\0\0\1\0\0\0m\0\0\0\1\1o\0\25\0\0\0/org/freedesktop/DBus\0\0\0\3\1s\0\5\0\0\0Hello\0\0\0\2\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\6\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0", iov_len=2048}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 128
17:28:08.757650 recvmsg(14, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
17:28:08.758341 access("/run/systemd/seats/", F_OK) = 0
17:28:08.758704 openat(AT_FDCWD, "/run/systemd/users/0", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
17:28:08.759282 openat(AT_FDCWD, "/proc/990/cmdline", O_RDONLY|O_LARGEFILE) = 15
17:28:08.759617 read(15, "hostnamectl\0", 50) = 12
17:28:08.759903 close(15) = 0
17:28:08.760838 sendmsg(11, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1\36\0\0\0X\0\0\0\211\0\0\0\1\1o\0\25\0\0\0/org/freedesktop/DBus\0\0\0\2\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\3\1s\0\20\0\0\0NameOwnerChanged\0\0\0\0\0\0\0\0\7\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\10\1g\0\3sss\0\0\0\0\0\0\0\0", iov_len=160}, {iov_base="\5\0\0\0:1.46\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0:1.46\0", iov_len=30}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 190
17:28:08.761324 sendmsg(14, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\2\1\1\n\0\0\0\1\0\0\0=\0\0\0\6\1s\0\5\0\0\0:1.46\0\0\0\5\1u\0\1\0\0\0\10\1g\0\1s\0\0\7\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0", iov_len=80}, {iov_base="\5\0\0\0:1.46\0", iov_len=10}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 90
17:28:08.761741 sendmsg(14, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1\n\0\0\0\2\0\0\0\215\0\0\0\1\1o\0\25\0\0\0/org/freedesktop/DBus\0\0\0\2\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\3\1s\0\f\0\0\0NameAcquired\0\0\0\0\6\1s\0\5\0\0\0:1.46\0\0\0\10\1g\0\1s\0\0\7\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0", iov_len=160}, {iov_base="\5\0\0\0:1.46\0", iov_len=10}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 170
17:28:08.762152 epoll_wait(4, [{events=EPOLLIN, data={u32=14, u64=14}}], 64, -1) = 1
17:28:08.763568 recvmsg(14, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\1\0\1\5\0\0\0\2\0\0\0\217\0\0\0\1\1o\0\32\0\0\0/org/freedesktop/hostname1\0\0\0\0\0\0\3\1s\0\6\0\0\0GetAll\0\0\2\1s\0\37\0\0\0org.freedesktop.DBus.Properties\0\6\1s\0\31\0\0\0org.freedesktop.hostname1\0\0\0\0\0\0\0\10\1g\0\1s\0\0\0\0\0\0\0", iov_len=2048}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 165
17:28:08.764118 recvmsg(14, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
Dbus again check if the service is available on the filesystem the send the start message to systemd
17:28:08.764596 statx(AT_FDCWD, "/usr/share/dbus-1/system-services/org.freedesktop.hostname1.service", AT_STATX_SYNC_AS_STAT|AT_NO_AUTOMOUNT, STATX_BASIC_STATS, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=432, ...}) = 0
17:28:08.765224 getpid() = 283
17:28:08.765510 send(16, "<30>Feb 17 17:28:08 dbus-daemon[283]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service' requested by ':1.46' (uid=0 pid=990 comm=\"hostnamectl\")", 213, MSG_NOSIGNAL) = 213
17:28:08.765950 sendmsg(12, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1+\0\0\0L\0\0\0\277\0\0\0\1\1o\0\25\0\0\0/org/freedesktop/DBus\0\0\0\2\1s\0\"\0\0\0org.freedesktop.systemd1.Activator\0\0\0\0\0\0\3\1s\0\21\0\0\0ActivationRequest\0\0\0\0\0\0\0\7\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\6\1s\0\30\0\0\0org.freedesktop.systemd1\0\0\0\0\0\0\0\0\10\1g\0\1s\0\0", iov_len=208}, {iov_base="&\0\0\0dbus-org.freedesktop.hostname1.service\0", iov_len=43}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 251
17:28:08.766469 epoll_wait(4, [], 64, 24999) = 0
systemd receive the message but do nothing:
17:28:08.767184 recvmsg(75, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1+\0\0\0L\0\0\0\277\0\0\0\1\1o\0\25\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 24
17:28:08.767800 recvmsg(75, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="/org/freedesktop/DBus\0\0\0\2\1s\0\"\0\0\0org.freedesktop.systemd1.Activator\0\0\0\0\0\0\3\1s\0\21\0\0\0ActivationRequest\0\0\0\0\0\0\0\7\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\6\1s\0\30\0\0\0org.freedesktop.systemd1\0\0\0\0\0\0\0\0\10\1g\0\1s\0\0&\0\0\0dbus-org.freedesktop.hostname1.service\0", iov_len=227}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 227
17:28:08.768309 gettid() = 1
17:28:08.768590 epoll_wait(4, <detached ...>
so dbus send timeout messages after 25 secs:
17:28:33.790997 getpid() = 283
17:28:33.792065 send(16, "<37>Feb 17 17:28:33 dbus-daemon[283]: [system] Failed to activate service 'org.freedesktop.hostname1': timed out (service_start_timeout=25000ms)", 144, MSG_NOSIGNAL) = 144
17:28:33.794293 sendmsg(14, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\3\1\1f\0\0\0\3\0\0\0m\0\0\0\6\1s\0\5\0\0\0:1.46\0\0\0\4\1s\0#\0\0\0org.freedesktop.DBus.Error.TimedOut\0\0\0\0\0\5\1u\0\2\0\0\0\10\1g\0\1s\0\0\7\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0", iov_len=128}, {iov_base="a\0\0\0Failed to activate service 'org.freedesktop.hostname1': timed out (service_start_timeout=25000ms)\0", iov_len=102}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = -1 EPIPE (Broken pipe)
17:28:33.794813 poll([{fd=14, events=POLLOUT}], 1, 0) = 1 ([{fd=14, revents=POLLOUT|POLLHUP}])
17:28:33.795224 sendmsg(14, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\3\1\1f\0\0\0\3\0\0\0m\0\0\0\6\1s\0\5\0\0\0:1.46\0\0\0\4\1s\0#\0\0\0org.freedesktop.DBus.Error.TimedOut\0\0\0\0\0\5\1u\0\2\0\0\0\10\1g\0\1s\0\0\7\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0", iov_len=128}, {iov_base="a\0\0\0Failed to activate service 'org.freedesktop.hostname1': timed out (service_start_timeout=25000ms)\0", iov_len=102}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = -1 EPIPE (Broken pipe)
17:28:33.795615 epoll_ctl(4, EPOLL_CTL_MOD, 14, {events=EPOLLIN|EPOLLOUT, data={u32=14, u64=14}}) = 0
17:28:33.795971 epoll_wait(4, [{events=EPOLLIN|EPOLLOUT|EPOLLHUP, data={u32=14, u64=14}}], 64, -1) = 1
17:28:33.796280 sendmsg(14, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\3\1\1f\0\0\0\3\0\0\0m\0\0\0\6\1s\0\5\0\0\0:1.46\0\0\0\4\1s\0#\0\0\0org.freedesktop.DBus.Error.TimedOut\0\0\0\0\0\5\1u\0\2\0\0\0\10\1g\0\1s\0\0\7\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0", iov_len=128}, {iov_base="a\0\0\0Failed to activate service 'org.freedesktop.hostname1': timed out (service_start_timeout=25000ms)\0", iov_len=102}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = -1 EPIPE (Broken pipe)
17:28:33.796647 recvmsg(14, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="", iov_len=2048}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 0
17:28:33.796967 epoll_ctl(4, EPOLL_CTL_DEL, 14, 0xbea59860) = 0
17:28:33.797210 close(14) = 0
communication with dbus seems to time-out. Please contact your distro for help, such integration issues must be handled by your distro.
I am the distro owner, but the last evidence seems that systemd is not able to manage requests from D-Bus
seems dbus is hanging or so for some reason and thus our requests time out.
If you are a distro maintainer, then try to figure out why dbus hangs and doesn't respond. It's the kind of thing a distro maintainer should be prepared to do.
Hi, unfortunately it seemed to me that D-Bus was the last to reply to the activation request and systemd receive it but do nothing, but if you are saying is not so, it's ok to me, I'm going to analyse D-Bus behaviour. I don't want to stole you more time than needed. Thank you again for your support, I'm going to close this ticket.
Ok, completed the git bisect process, the commit that brings my issue is this one: 1a4e392760cb582c6d18f1cf58c7ca234e5b815c So I reverted to v253 and removed -ftrivial-auto-var-init=zero from the meson.build and it worked flawlessly. I have to go in deep with this compiler parameter to get what's causing the issue.
With GCC 13.1 this starts to show up on x86_64 too. Removing -ftrivial-auto-var-init=zero
can also work around. Not sure if it's a GCC issue or systemd issue.
With GCC 13.1 this starts to show up on x86_64 too. Removing
-ftrivial-auto-var-init=zero
can also work around. Not sure if it's a GCC issue or systemd issue.
Finally, I'm not feeling alone :-) . I'm using a patch for -ftrivial-auto-var-init=zero removal in my automation and now I can build it on every platform I tried flawlessy.
BTW this happens even with clang 15 and 16
On x86_64 with GCC-13, even systemctl restart systemd-networkd
hangs. I can interrupt it with Ctrl + C
and the systemd-networkd is indeed restarted successfully, so it seems systemctl
fails to receive the bus message telling the restart has succeeded.
systemd version the issue has been seen with
252
Used distribution
Custom built O.S.
Linux kernel version used
6.1.10-v7l+
CPU architectures issue was seen on
arm/aarch64
Component
pam_systemd, systemctl, systemd-hostnamed, systemd-logind
Expected behaviour you didn't see
I am trying to automate a Linux from Scratch build for RaspberryPi, I successfully made it using systemd-251 and D-Bus-1.15.4 (even this is a development version) but works flawlessly with D-Bus-1.14.4. Now I recompiled all the packages but using systemd-252 (either with D-Bus-1.14.4 and 1.15.4) and here I got the problems: 1 - pam_systemd timeout to create sessions: pam_systemd(sshd:session): Failed to create session: Connection timed out 2 - every systemctl start/stop commands stay hung without exiting to prompt 3 - systemd-hostnamed does start but then killed after 90s Feb 10 17:12:20 lfs systemd[1]: Starting Hostname Service... Feb 10 17:13:50 lfs systemd[1]: systemd-hostnamed.service: start operation timed out. Terminating. Feb 10 17:13:50 lfs systemd[1]: systemd-hostnamed.service: Failed with result 'timeout'. Feb 10 17:13:50 lfs systemd[1]: Failed to start Hostname Service. 4 - same destiny to bluetooth and bluetooth-mesh services (but this is out scope here).
Following you can find the meson configuration command (used 251 as well as for 252): --buildtype=release -Db_ndebug=true -Db_pie=true -Db_lto=true --cross-file cross-rpi3 --prefix=/usr --libdir=/usr/lib/arm-linux-gnueabihf --includedir=/usr/include --sysconfdir=/etc --localstatedir=/var --datadir=/usr/share -Drootlibdir=/lib/arm-linux-gnueabihf -Dfirstboot=false -Dtests=false -Dslow-tests=false -Dfuzz-tests=false -Dinstall-tests=false -Dldconfig=false -Dsysusers=false -Drpmmacrosdir=no -Dhomed=false -Dsysupdate=false -Dnscd=false -Dmode=release -Dsplit-usr=true -Duserdb=false -Dservice-watchdog= -Dstandalone-binaries=true -Ddefault-kill-user-processes=false -Dfirst-boot-full-preset=true -Ddns-servers= -Ddefault-dnssec=no -Ddefault-dns-over-tls=no -Ddefault-mdns=no -Ddefault-llmnr=resolve -Dnss-myhostname=true -Dnss-mymachines=true -Dnss-resolve=true -Dnss-systemd=true -Doomd=true -Dblkid=true -Dman=auto -Ddbus=true -Dpam=true -Dpamlibdir=/usr/lib/arm-linux-gnueabihf/security -Dpamconfdir=/etc/pam.d -Ddbussystemservicedir=/usr/share/dbus-1/system-services -Ddbussessionservicedir=/usr/share/dbus-1/services -Ddbuspolicydir=/usr/share/dbus-1/system.d -Ddbus-interfaces-dir=/usr/share/dbus-1/interfaces -Dbashcompletiondir=/usr/share/bash-completion/completions/ -Defi-cflags=-lgcc_s -Defi-includedir=/home/desimonie/.bbxb/lfs/rpi3/binaries/usr/include/efi -Defi-libdir=/home/desimonie/.bbxb/lfs/rpi3/binaries/usr/lib/arm-linux-gnueabihf -Dquotaon-path=/usr/sbin/quotaon -Dquotacheck-path=/usr/sbin/quotacheck -Dkexec-path=/usr/sbin/kexec --default-library=shared
Built with gcc-12.2.
No relevant error founds in systems log, I tried to enable debug on those particular services and I don't see any relevant issues that helps to address me somewhere where to search. I'm quite sure that the issue is in the middle between systemd and D-Bus but I don't know how to debug the timeouts.
I gave a try with the 253-rc2 but the behaviour is the same as 252
Unexpected behaviour you saw
Steps to reproduce the problem
No response
Additional program output to the terminal or log subsystem illustrating the issue