linuxmint / timeshift

System restore tool for Linux. Creates filesystem snapshots using rsync+hardlinks, or BTRFS snapshots. Supports scheduled snapshots, multiple backup levels, and exclude filters. Snapshots can be restored while system is running or from Live CD/USB.

TS very slow or non-functional on Linux Terminal Server #144

Open aknisly opened 1 year ago

aknisly commented 1 year ago

We are running a Linux Mint (20.2) terminal server with up to 30 concurrent users via Cendio ThinLinc. I've noticed before that TS is sometimes very sluggish to start, but when I needed to do a restore today (thanks to running LibreOffice as root -- don't know why, but that hosed the app for all regular users), it took more than 10 minutes for the GUI to appear. Running it from the command line took just as long. I was initially able to create a snapshot from the GUI, but on a restore it always hung on the first dialog screen. Running it from the CLI was similar, except that it failed outright with this:

E: Failed to mount device '/dev/dm-1' at mount point '/run/timeshift/1790042/restore/backup/NFS'
[15:55:21] E: mount: /run/timeshift/1790042/restore/backup/NFS: /dev/mapper/vg_backups-lv_backups already mounted or mount point busy.

ThinLinc provides a mechanism for accessing local drives/directories on the thin clients, and I suspected the problem lay with TS attempting to mount phantom removable devices that were mapped somewhere but no longer present. I finally noticed this in the log:

[15:55:21] mount_target_device()
[15:55:21] unmount_target_device()
[15:55:21] Device: get_mounted_filesystems_using_mtab(): 3
[15:55:21] Device: get_mounted_filesystems_using_mtab(): 3
[15:55:21] ------------------
[15:55:21] arg=3a92297f-807f-43b8-9585-e4fdefd5315d, device=/dev/sda1
[15:55:21] /
[15:55:21] ------------------
[15:55:21] Device: get_mounted_filesystems_using_mtab(): 3
[15:55:21] Mounted '/dev/sda1' at '/run/timeshift/1790042/restore/'
[15:55:21] Device: get_mounted_filesystems_using_mtab(): 3
[15:55:21] ------------------
[15:55:21] arg=42132aed-c0ee-4382-bb5d-2fb35069b5f9, device=/dev/dm-1
[15:55:21] ------------------
[15:55:21] Device: get_mounted_filesystems_using_mtab(): 3

I checked /etc/mtab; there's plenty there I don't understand. But I compared it with an earlier version and found no significant difference, so I doubt I'm barking up the right tree. I didn't bother trying to unmount anything; I just rebooted. After the reboot, the TS GUI came up in seconds and everything worked as it should. So was it a phantom mount issue?
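
If I hit this again, something like the following should show whether a stale ThinLinc NFS mount is hanging around (just standard findmnt/coreutils commands, nothing timeshift-specific; the 5-second timeout is an arbitrary guess):

findmnt -t nfs,nfs4 -o TARGET,SOURCE              # list all NFS mounts (ThinLinc local drives appear here)
for mp in $(findmnt -rn -t nfs,nfs4 -o TARGET); do
    timeout 5 stat "$mp" > /dev/null 2>&1 || echo "possibly stale: $mp"   # a stale mount hits the timeout
done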

Lesson learned, but if anybody has any insight, I'd be grateful. I'm sure this is an edge case, but I'm recording it here in case anyone else finds it useful. TimeShift version: 20.11.1, then upgraded to 22.06.5, with the same results.

uname -a
Linux pc 5.4.0-80-generic #90-Ubuntu SMP Fri Jul 9 22:49:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

2023-02-09_15-46-19_restore.log

aknisly commented 1 year ago

Here is a comment from Martin at Cendio:

Most probably, timeshift is poking at the mounted filesystems on your server, and stalls when it tries to probe a disconnected NFS mount.

The ThinLinc local drive is an NFS mount, where your ThinLinc server is the NFS client and the ThinLinc client is the NFS server.

It looks like timeshift hangs for about 5 minutes at:
[15:46:19] update_partitions()
before it proceeds.

The delay he refers to is actually close to 9 minutes. I had overlooked that. I checked more logs, and each one was hanging at this point for the same amount of time.

I also recall that I was getting an atypical delay when running df on this system. That has cleared up since the reboot as well.
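
To pin down which mount is responsible next time, I plan to time a probe of each mount point individually (plain shell; the 60-second cap is arbitrary, and mount points containing spaces would need extra quoting):

for mp in $(findmnt -rn -o TARGET); do
    start=$(date +%s)
    timeout 60 df "$mp" > /dev/null 2>&1           # df on a stale NFS mount blocks until the timeout
    echo "$mp: $(( $(date +%s) - start ))s"
done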

aknisly commented 1 year ago

OK, it's apparent that nobody sees this as a TimeShift issue per se, and I'm not disagreeing -- although I would note that df is working as expected, so TS is the only utility affected by this, as far as I can tell. Whatever the case, can someone point me in a plausible direction for troubleshooting? I've been tracking this a bit, and the hang time at update_partitions() grows slowly but fairly consistently over time. In less than 5 days, it's back up to over 7 minutes.
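
In the meantime, the obvious workaround (untested here) would be to drop the stale NFS mount rather than reboot; the path below is a placeholder for whatever mount point the probing above identifies:

sudo umount -l /path/to/stale/thinlinc/mount       # lazy unmount: detach now, clean up when no longer busy
sudo umount -f -l /path/to/stale/thinlinc/mount    # add -f if the NFS server is unreachable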