graysky2 / profile-sync-daemon

Symlinks and syncs browser profile dirs to RAM thus reducing HDD/SDD calls and speeding-up browsers.
https://wiki.archlinux.org/index.php/Profile-sync-daemon
Other
897 stars 84 forks source link

Stopping psd on shutdown could get timeout and killed #352

Open felixonmars opened 1 year ago

felixonmars commented 1 year ago
May 03 11:24:54 flygon.felixc.at systemd[1820]: Stopping Profile-sync-daemon...
May 03 11:26:24 flygon.felixc.at systemd[1820]: psd.service: Stopping timed out. Terminating.
May 03 11:26:24 flygon.felixc.at systemd[1820]: psd.service: Control process exited, code=killed, status=15/TERM
May 03 11:26:24 flygon.felixc.at profile-sync-daemon[973086]: rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(713)>
May 03 11:26:25 flygon.felixc.at profile-sync-daemon[973087]: rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(713)>
May 03 11:26:25 flygon.felixc.at systemd[1820]: psd.service: Failed with result 'timeout'.
May 03 11:26:25 flygon.felixc.at systemd[1820]: Stopped Profile-sync-daemon.

This leaves PSD in ungrateful state. Probably TimeoutStopSec should be set for psd.service with a higher value.

graysky2 commented 1 year ago

Have you tried experimenting with a TimeoutStopSec=60 or something similar? Does it solve the issue?

felixonmars commented 1 year ago

Not yet because I have more issues with PSD (firefox localstorage completely broken with PSD on, etc), unfortunately :(

gma commented 7 months ago

I've got a related problem, but different log output. I've got psd managing two Firefox profiles, Chromium, and Brave. I only really use Brave, so sync'ing its data takes significantly longer.

I usually experience data loss (i.e. Brave forgets open tabs and windows, etc) if I reboot the system without quitting Brave manually first, waiting a few seconds, then logging out of my desktop session.

I just had to login remotely and reboot the machine after my desktop environment went into an infinite loop.

Here are a few pertinent lines from my journalctl output:

Dec 05 12:32:02 dilaton sudo[2886391]:   graham : TTY=pts/1 ; PWD=/home/graham ; USER=root ; COMMAND=/usr/sbin/reboot

Dec 05 12:32:04 dilaton profile-sync-daemon[2886783]: firefox resync successful
Dec 05 12:32:05 dilaton profile-sync-daemon[2886783]: chromium resync successful
Dec 05 12:32:06 dilaton profile-sync-daemon[2886783]: brave resync successful
Dec 05 12:32:07 dilaton sudo[2887065]:   graham : PWD=/home/graham ; USER=root ; COMMAND=/usr/bin/psd-overlay-helper -d /run/u>
Dec 05 12:32:07 dilaton sudo[2887065]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
Dec 05 12:32:07 dilaton systemd[1]: run-user-1000-graham\x2dfirefox\x2d45e08wmw.default.mount: Deactivated successfully.
Dec 05 12:32:07 dilaton systemd[1]: Unmounted /run/user/1000/graham-firefox-45e08wmw.default.
Dec 05 12:32:07 dilaton sudo[2887065]: pam_unix(sudo:session): session closed for user root
Dec 05 12:32:07 dilaton profile-sync-daemon[2886783]: firefox unsync successful
Dec 05 12:32:07 dilaton sudo[2887084]:   graham : PWD=/home/graham ; USER=root ; COMMAND=/usr/bin/psd-overlay-helper -d /run/u>
Dec 05 12:32:07 dilaton sudo[2887084]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
Dec 05 12:32:07 dilaton systemd[1]: run-user-1000-graham\x2dfirefox\x2dbtxrhgoe.default\x2drelease.mount: Deactivated successf>
Dec 05 12:32:07 dilaton systemd[1]: Unmounted /run/user/1000/graham-firefox-btxrhgoe.default-release.
Dec 05 12:32:07 dilaton sudo[2887084]: pam_unix(sudo:session): session closed for user root
Dec 05 12:32:08 dilaton profile-sync-daemon[2887087]: rm: cannot remove '/run/user/1000/.graham-firefox-btxrhgoe.default-relea>
Dec 05 12:32:08 dilaton profile-sync-daemon[2886783]: firefox unsync successful
Dec 05 12:32:08 dilaton sudo[2887102]:   graham : PWD=/home/graham ; USER=root ; COMMAND=/usr/bin/psd-overlay-helper -d /run/u>
Dec 05 12:32:08 dilaton sudo[2887102]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
Dec 05 12:32:08 dilaton systemd[1]: run-user-1000-graham\x2dchromium.mount: Deactivated successfully.
Dec 05 12:32:08 dilaton systemd[1]: Unmounted /run/user/1000/graham-chromium.
Dec 05 12:32:08 dilaton sudo[2887102]: pam_unix(sudo:session): session closed for user root
Dec 05 12:32:08 dilaton profile-sync-daemon[2886783]: chromium unsync successful

Dec 05 12:32:09 dilaton systemd[1]: user@1000.service: State 'stop-sigterm' timed out. Killing.
Dec 05 12:32:09 dilaton systemd[1]: user@1000.service: Killing process 2886783 (profile-sync-da) with signal SIGKILL.
Dec 05 12:32:09 dilaton systemd[1]: user@1000.service: Main process exited, code=killed, status=9/KILL
Dec 05 12:32:09 dilaton systemd[1]: user@1000.service: Failed with result 'timeout'.
Dec 05 12:32:09 dilaton systemd[1]: Stopped User Manager for UID 1000.

Dec 05 12:32:09 dilaton systemd[1]: Stopped User Manager for UID 1000.
Dec 05 12:32:09 dilaton systemd[1]: user@1000.service: Consumed 1w 1d 2h 35min 10.135s CPU time.
Dec 05 12:32:09 dilaton systemd[1]: Stopping User Runtime Directory /run/user/1000...
Dec 05 12:32:09 dilaton systemd[1]: run-user-1000-graham\x2dbrave.mount: Deactivated successfully.
Dec 05 12:32:09 dilaton systemd[1]: Unmounted /run/user/1000/graham-brave.
Dec 05 12:32:09 dilaton systemd[1]: run-user-1000.mount: Deactivated successfully.
Dec 05 12:32:09 dilaton systemd[1]: Unmounted /run/user/1000.
Dec 05 12:32:09 dilaton systemd[1]: user-runtime-dir@1000.service: Deactivated successfully.
Dec 05 12:32:09 dilaton systemd[1]: Stopped User Runtime Directory /run/user/1000.

I trimmed out a bunch of unrelated stuff (e.g. a few other services that hung around a bit too long got SIGKILL'd too), and added some blank lines around snippets of logs that belong together.

Notice that:

  1. All browsers except Brave have a "unsync successful" log message, and
  2. "profile-sync-da" (process 2886783) got killed after the attempt to SIGTERM all my user space stuff timed out (the penultimate block of log messages).

What's the best way to fix this? I looked into TimeoutStopSec that was mentioned above, and see that it's not configured at all my in psd.service file.

The value for DefaultTimeoutStopSec in my /etc/systemd/system.conf and user.conf files is 90s.

My reading of the systemd.service(5) man page suggests that my system should be waiting up to 90 seconds during shutdown for things to shutdown cleanly. So TimeoutStopSec for the psd service doesn't seem related.

Perhaps this is the culprit?

cat /usr/lib/systemd/system/user@.service.d/timeout.conf 
# Avoid long hangs during shutdown if user services fail/hang due to X.org
# going away too early
[Service]
TimeoutStopSec=5

I strongly suspect that the solution (for me at least) will be to override TimeoutStopSec=5 for the user@.service service. For those unfamiliar, 1000 is my user id, and systemd is starting the user@.service as my user. That's why it's showing up in the log output as user@1000, but the timeout.conf file I just cat'd is in a directory that doesn't contain "1000" in the name.

Once I've worked out a clean way to "fix" this I might pop back with the relevant command...

For anyone wondering, I'm on Pop!_OS 22.04. To all extents and purposes, I think that's equivalent to me being on Ubuntu in terms of how we can expect this area of the system to behave.

gma commented 7 months ago

I've worked around this on my Pop!_OS (i.e. Ubuntu-like) machines like this:

sudo mkdir /etc/systemd/system/user@.service.d
sudo tee -a /etc/systemd/system/user@.service.d/timeout.conf <<EOF
[Service]
TimeoutStopSec=30s
EOF

To get the new setting picked up run sudo systemctl daemon-reload.

To check that it's been detected correctly, use systemctl show:

systemctl show user@$(id -u) | grep TimeoutStop
TimeoutStopUSec=30s
TimeoutStopFailureMode=terminate

You should see the new value from your config file is now assigned to TimeoutStopUSec. Yes, it says USec not Sec on the end; that's just how systemd stores these values internally.

From what I've learned this evening, I think you could argue that this is a bug in the operating system. I don't think the OS has any right to set the timeout before it sends SIGKILL to user-level services to just 5 seconds, when the upstream (systemd) default is 2 minutes.

I assume it's present in Ubuntu as well as Pop!_OS, but I don't know. But as such, I wonder if it's worth documenting this problem fairly prominently in the README?

I've been putting up with intermittently (and unpredictably) losing all my browser state (with my open tabs reverting to whatever I was reading a couple of months ago) for at least two years now, before deciding to dig in and find out what was going wrong. I can imagine most people might just blame psd and stop using it (I was very close to doing that myself this evening).