The HOWTO has steps describing how to use a Live CD to mount the pool for recovery. This works, as does recovery mode, to deal with issues in general, including this sort of thing specifically. I'm not particularly interested in trying to write recovery instructions for specific problems. In any event, once in a recovery environment, the steps for this are obvious: move the data out of the way, mount the dataset, move the data into it.
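Roughly, a sketch of those steps (assuming the HOWTO's layout with the root dataset at rpool/ROOT/ubuntu and rpool/var/log as the affected dataset; adjust names to your pool):

zpool import -N -R /mnt rpool
zfs mount rpool/ROOT/ubuntu                  # mount the root filesystem first
mv /mnt/var/log /mnt/var/log.old             # move the stray data out of the way
zfs mount rpool/var/log                      # mount the dataset
# for a legacy mountpoint: mkdir /mnt/var/log && mount -t zfs rpool/var/log /mnt/var/log
rsync -a /mnt/var/log.old/ /mnt/var/log/     # move the data into it
rm -rf /mnt/var/log.old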
Trying to prevent problems as much as possible is a better goal. Using legacy mountpoints, as the HOWTO does now, for /var/{log,spool,tmp} helps a lot, and this will get better with the systemd mount generator in 0.8.0.
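Concretely, the legacy-mountpoint approach looks something like this (a sketch; the exact fstab options are a matter of preference):

zfs set mountpoint=legacy rpool/var/log
zfs set mountpoint=legacy rpool/var/spool
zfs set mountpoint=legacy rpool/var/tmp

and then in /etc/fstab:

rpool/var/log    /var/log    zfs  defaults  0  0
rpool/var/spool  /var/spool  zfs  defaults  0  0
rpool/var/tmp    /var/tmp    zfs  defaults  0  0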
Can you be more specific about what is failing and why? Is there a particular daemon that writes to one of these locations that is missing a dependency on local-fs.target?
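If so, the usual fix is a systemd drop-in that adds an explicit mount dependency, something like the following (the unit name here is hypothetical):

# /etc/systemd/system/example-daemon.service.d/mount-deps.conf (hypothetical unit)
[Unit]
RequiresMountsFor=/var/log /var/spool

followed by systemctl daemon-reload.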
Absolutely. If you send me a list of commands whose output you want, I will post it here after sanitizing any personal information.
For now, this is what I observed:
-- Unit var-cache.mount has begun shutting down.
sep 18 06:27:47 X systemd[1]: Unmounting /var/spool...
-- Subject: Unit var-spool.mount has begun shutting down
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit var-spool.mount has begun shutting down.
sep 18 06:27:47 X systemd[1]: Unmounting /opt...
-- Subject: Unit opt.mount has begun shutting down
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit opt.mount has begun shutting down.
sep 18 06:27:47 X systemd[1]: Unmounting /var/lib/nfs...
-- Subject: Unit var-lib-nfs.mount has begun shutting down
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit var-lib-nfs.mount has begun shutting down.
sep 18 06:27:47 X systemd[1]: Stopped Flush Journal to Persistent Storage.
-- Subject: Unit systemd-journal-flush.service has finished shutting down
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit systemd-journal-flush.service has finished shutting down.
sep 18 06:27:47 X systemd[1]: Unmounting /var/log...
-- Subject: Unit var-log.mount has begun shutting down
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit var-log.mount has begun shutting down.
sep 18 06:27:47 X systemd[1]: Unmounting /var/tmp...
-- Subject: Unit var-tmp.mount has begun shutting down
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit var-tmp.mount has begun shutting down.
sep 18 06:27:47 X umount[8250]: umount: /var/log: target is busy.
sep 18 06:27:47 X systemd[1]: var-log.mount: Mount process exited, code=exited status=32
sep 18 06:27:47 X systemd[1]: Failed unmounting /var/log.
-- Subject: Unit var-log.mount has finished shutting down
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
None of these datasets are successfully mounted at boot. These are the relevant properties:
NAME             PROPERTY    VALUE         SOURCE
rpool/var        mounted     no            -
rpool/var        mountpoint  /var          inherited from rpool
rpool/var        canmount    off           local
rpool/var/cache  mounted     no            -
rpool/var/cache  mountpoint  /var/cache    inherited from rpool
rpool/var/cache  canmount    on            default
rpool/var/log    mounted     yes           -
rpool/var/log    mountpoint  legacy        local
rpool/var/log    canmount    on            default
rpool/var/mail   mounted     no            -
rpool/var/mail   mountpoint  /var/mail     inherited from rpool
rpool/var/mail   canmount    on            default
rpool/var/nfs    mounted     no            -
rpool/var/nfs    mountpoint  /var/lib/nfs  local
rpool/var/nfs    canmount    on            default
rpool/var/spool  mounted     no            -
rpool/var/spool  mountpoint  /var/spool    inherited from rpool
rpool/var/spool  canmount    on            default
rpool/var/tmp    mounted     yes           -
rpool/var/tmp    mountpoint  legacy        local
rpool/var/tmp    canmount    on            default
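(A listing like the one above can be reproduced with:

zfs get -r mounted,mountpoint,canmount rpool/var
)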
Of course NFS and other services won't be happy. I also noticed that the pools are no longer imported at boot (except for rpool).
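In case it is useful, my guess is the import side could be checked with something like this (the pool name below is a placeholder):

zpool get cachefile                # pools absent from /etc/zfs/zpool.cache are skipped by zfs-import-cache.service
systemctl status zfs-import-cache.service zfs-import-scan.service
zpool set cachefile=/etc/zfs/zpool.cache tank   # 'tank' stands in for the missing pool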
I can understand your point about not providing specific fixes in the HOWTO, but this is something that happened quite often with earlier revisions, and some of us still run systems installed from them. You owe nobody anything, though. It is just nice to have (and I might provide a simple script to run on the recovery CD if time permits; I just think you might be more inclined to do so than I am right now).
How would you prevent this from happening again? What settings should be applied to the datasets during recovery, and what modifications to /etc/fstab?
Thank you!
From the recovery environment (at this point I think the ideal would be building an ISO containing 0.8-rc3 and the necessary tools, or any system ready to roll with ZFS and the latest revisions; otherwise you need a bunch of tricks to get the live server ISO to play nice with the modules and libraries):
root@ubuntu-server:~/zfs-0.8.0# zpool import -N -R /mnt/rpool/ rpool
root@ubuntu-server:~/zfs-0.8.0# zfs mount -a
cannot mount '/mnt/rpool//root':
cannot mount '/mnt/rpool//root': mount failed
cannot mount '/mnt/rpool//opt': directory is not empty
cannot mount '/mnt/rpool//var/cache':
cannot mount '/mnt/rpool//var/cache': mount failed
cannot mount '/mnt/rpool//var/lib/nfs': directory is not empty
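From here, I assume the "move the data out of the way, mount the dataset, move the data into it" procedure applies per dataset, e.g. for rpool/opt from the errors above (untested, and assuming the filesystem holding the stray /opt directory is mounted under /mnt/rpool):

mv /mnt/rpool/opt /mnt/rpool/opt.old
zfs mount rpool/opt
rsync -a /mnt/rpool/opt.old/ /mnt/rpool/opt/
rm -rf /mnt/rpool/opt.old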
I have set up three systems with different versions of the HOWTO, mostly following the same recipe, one of them with LUKS and RAID1. Unfortunately, there seems to be a recurring problem with some of the datasets in /var that creeps into these systems over time.
The symptoms are rather easy to spot:
- These datasets are never mounted during boot, which, depending on what is configured, can be a PITA, for instance with NFS servers but also Samba.
- This would be a trivial fix if it were not for the fact that the target mountpoints invariably end up containing data, so ZFS eventually refuses to mount them again. Since the files are often in use by the OS at runtime, they cannot be 'hot relocated' to the ZFS datasets.
It would be great if @rlaager (who has put great effort into detailing the process for different configurations) or someone else added a short section on how to recover from and fix this situation (regardless of what is needed, be it LiveCD-based recovery or something else). The ideal scenario would be to do it at runtime on an affected system, or through an early boot script that takes care of synchronizing the directories and fixing whatever is broken with the legacy mounts for such datasets; a sketch of what I have in mind follows below. I don't know if the current steps fix this, but I followed the HOWTO throughout 2018. I also know it is not an uncommon problem, as it pops up every now and then in other people's installations.
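Something along these lines is what I have in mind for the script (an untested sketch; the dataset names are examples, and legacy-mountpoint datasets would need separate handling via mount(8)):

#!/bin/sh
# Untested sketch: for each dataset that failed to mount because its
# mountpoint directory already contains files, stash the stray files,
# mount the dataset, and merge the files back in.
set -e
for ds in rpool/var/cache rpool/var/spool rpool/var/mail; do  # example names
    mp=$(zfs get -H -o value mountpoint "$ds")
    case "$mp" in /*) ;; *) continue ;; esac   # skip legacy/none mountpoints
    if [ "$(zfs get -H -o value mounted "$ds")" = "no" ] \
       && [ -n "$(ls -A "$mp" 2>/dev/null)" ]; then
        mv "$mp" "$mp.stray"
        zfs mount "$ds"                        # recreates the mountpoint
        rsync -a "$mp.stray/" "$mp/"
        rm -rf "$mp.stray"
    fi
done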