openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Improvements for the Ubuntu HOWTO (re: legacy mounts) @rlaager #8620

Closed by ghost 5 years ago

ghost commented 5 years ago

I have set up three systems with different versions of the HOWTO, mostly following the same recipe, one with LUKS and RAID1. Unfortunately, there seems to be a recurring problem with some of the datasets in /var that creeps in on these systems over time.

The symptoms are rather easy to spot:

rpool/home                                        mounted   no       -
rpool/home/deluged                                mounted   no       -
rpool/home/root                                   mounted   no       -
rpool/home/dprk                                   mounted   no       -
rpool/opt                                         mounted   no       -
rpool/srv                                         mounted   no       -
rpool/var                                         mounted   no       -
rpool/var/cache                                   mounted   no       -
rpool/var/mail                                    mounted   no       -
rpool/var/nfs                                     mounted   no       -
rpool/var/spool                                   mounted   no       -
(mounted=no)

These datasets are never mounted during boot, which, depending on what is configured, can be a PITA, for instance with NFS servers but also with Samba.

This would be a trivial fix if it were not for the fact that the target mountpoints invariably end up containing data, so ZFS refuses to mount the datasets there again. Since the files are often in use at runtime by the OS, they cannot be 'hot relocated' to the ZFS datasets.

It would be great if @rlaager (who has put great effort into detailing the process for different configurations) or someone else added a short section on how to recover from and fix this situation (regardless of what is needed, be it LiveCD-based recovery or something else). The ideal scenario would be to do it at runtime on an affected system, or through an early boot script that takes care of synchronizing the directories and fixing whatever is broken with the legacy mounts for such datasets. I don't know if the current steps fix this; I followed the HOWTO throughout 2018. I also know it is not an uncommon problem, as it pops up every now and then in other people's installations.

rlaager commented 5 years ago

The HOWTO has steps describing how to use a Live CD to mount the pool for recovery. This works, as does recovery mode, to deal with issues in general, including this sort of thing specifically. I'm not particularly interested in trying to write recovery instructions for specific problems. In any event, once in a recovery environment, the steps for this are obvious: move the data out of the way, mount the dataset, move the data into it.
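As a rough illustration of those three steps (move the data out of the way, mount the dataset, move the data into it), here is a minimal sketch for use in a recovery environment. The helper name `relocate_into_dataset` and the `.stray` suffix are made up for this example and are not part of ZFS or the HOWTO. It copies the stray files over whatever the dataset already holds, so same-named files in the dataset are overwritten, and it leaves the `.stray` copy in place for manual review.

```shell
#!/bin/sh
# Sketch only: relocate stray files from a mountpoint directory into the
# dataset that should be mounted there. Intended for a recovery environment.
# relocate_into_dataset is a hypothetical helper, not a ZFS command.
relocate_into_dataset() {
    dataset=$1       # e.g. rpool/var/spool
    mountpoint=$2    # directory the dataset mounts on (no trailing slash)
    aside="${mountpoint}.stray"

    # 1. Move the data out of the way and recreate an empty mountpoint.
    mv "$mountpoint" "$aside" || return 1
    mkdir -p "$mountpoint" || return 1

    # 2. Mount the dataset on the now-empty directory.
    zfs mount "$dataset" || return 1

    # 3. Copy the stray data into the mounted dataset, preserving
    #    ownership, modes and timestamps. Existing files with the same
    #    name are overwritten. The .stray directory is kept for review.
    cp -a "$aside"/. "$mountpoint"/ || return 1
}
```

With the pool imported under an altroot, a call might look like `relocate_into_dataset rpool/var/spool /mnt/var/spool`; both arguments here are assumptions about the layout, not values taken from the HOWTO.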

Trying to prevent problems as much as possible is a better goal. Using legacy mountpoints, as the HOWTO does now, for /var/{log,spool,tmp} helps a lot, and this will get better with the systemd mount generator in 0.8.0.
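To make the legacy-mountpoint approach concrete, here is a minimal sketch of switching one dataset to a legacy mountpoint and adding the matching /etc/fstab entry. The helper names `fstab_line_for` and `make_legacy` are hypothetical, and the `defaults` mount options are an assumption; the options the HOWTO actually recommends may differ between revisions.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical names for illustration.

# Print an fstab entry for a legacy-mounted ZFS dataset.
# The "defaults" options field is an assumption.
fstab_line_for() {
    dataset=$1
    mountpoint=$2
    printf '%s %s zfs defaults 0 0\n' "$dataset" "$mountpoint"
}

# Switch a dataset to mountpoint=legacy and append its fstab entry.
# The third argument lets you point at a copy of fstab while testing.
make_legacy() {
    dataset=$1
    mountpoint=$2
    fstab=${3:-/etc/fstab}
    zfs set mountpoint=legacy "$dataset" || return 1
    fstab_line_for "$dataset" "$mountpoint" >> "$fstab"
}
```

For example, `make_legacy rpool/var/spool /var/spool` would hand mounting of that dataset over to the regular fstab machinery, which is ordered by systemd like any other local filesystem.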

Can you be more specific about what is failing and why? Is there a particular daemon that writes to one of these locations that is missing a dependency on local-fs.target?

ghost commented 5 years ago

Absolutely, if you pass a list of commands you want the output for, I will put it here after sanitizing any personal information.

For now what I observed is:


-- Unit var-cache.mount has begun shutting down.
sep 18 06:27:47 X systemd[1]: Unmounting /var/spool...
-- Subject: Unit var-spool.mount has begun shutting down
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- Unit var-spool.mount has begun shutting down.
sep 18 06:27:47 X systemd[1]: Unmounting /opt...
-- Subject: Unit opt.mount has begun shutting down
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- Unit opt.mount has begun shutting down.
sep 18 06:27:47 X systemd[1]: Unmounting /var/lib/nfs...
-- Subject: Unit var-lib-nfs.mount has begun shutting down
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- Unit var-lib-nfs.mount has begun shutting down.
sep 18 06:27:47 X systemd[1]: Stopped Flush Journal to Persistent Storage.
-- Subject: Unit systemd-journal-flush.service has finished shutting down
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- Unit systemd-journal-flush.service has finished shutting down.
sep 18 06:27:47 X systemd[1]: Unmounting /var/log...
-- Subject: Unit var-log.mount has begun shutting down
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- Unit var-log.mount has begun shutting down.
sep 18 06:27:47 X systemd[1]: Unmounting /var/tmp...
-- Subject: Unit var-tmp.mount has begun shutting down
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
-- 
-- Unit var-tmp.mount has begun shutting down.
sep 18 06:27:47 X umount[8250]: umount: /var/log: target is busy.
sep 18 06:27:47 X systemd[1]: var-log.mount: Mount process exited, code=exited status=32
sep 18 06:27:47 X systemd[1]: Failed unmounting /var/log.
-- Subject: Unit var-log.mount has finished shutting down
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support

None of the non-legacy datasets are successfully mounted at boot. These are the properties:

rpool/var        mounted                no                     -
rpool/var        mountpoint             /var                   inherited from rpool
rpool/var        canmount               off                    local
rpool/var/cache  mounted                no                     -
rpool/var/cache  mountpoint             /var/cache             inherited from rpool
rpool/var/cache  canmount               on                     default
rpool/var/log    mounted                yes                    -
rpool/var/log    mountpoint             legacy                 local
rpool/var/log    canmount               on                     default
rpool/var/mail   mounted                no                     -
rpool/var/mail   mountpoint             /var/mail              inherited from rpool
rpool/var/mail   canmount               on                     default
rpool/var/nfs    mounted                no                     -
rpool/var/nfs    mountpoint             /var/lib/nfs           local
rpool/var/nfs    canmount               on                     default
rpool/var/spool  mounted                no                     -
rpool/var/spool  mountpoint             /var/spool             inherited from rpool
rpool/var/spool  canmount               on                     default
rpool/var/tmp    mounted                yes                    -
rpool/var/tmp    mountpoint             legacy                 local
rpool/var/tmp    canmount               on                     default
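A quick way to pull the problem datasets out of a listing like the one above is to filter on the three properties together. The sketch below assumes the hypothetical helper name `list_unmounted`; it prints every filesystem that has `canmount=on`, a real (non-legacy, non-none) mountpoint, and yet is not mounted.

```shell
#!/bin/sh
# Sketch only: list_unmounted is a hypothetical helper name.
# Prints "dataset<TAB>mountpoint" for filesystems that ZFS itself is
# expected to mount (canmount=on, mountpoint not legacy/none) but that
# are currently unmounted.
list_unmounted() {
    pool=$1
    # -H: no header, tab-separated fields, suitable for parsing.
    zfs list -H -r -t filesystem -o name,canmount,mounted,mountpoint "$pool" |
        awk -F '\t' '$2 == "on" && $3 == "no" && $4 != "legacy" && $4 != "none" {
            print $1 "\t" $4
        }'
}
```

Running `list_unmounted rpool` on the system described here would be expected to list rpool/var/cache, rpool/var/mail, rpool/var/nfs and rpool/var/spool, but not rpool/var (canmount=off) or the legacy-mounted rpool/var/log and rpool/var/tmp.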

Of course NFS and other services won't be happy. I also noticed that the other pools are no longer imported at boot (only rpool is).

I can understand your point about not providing specific fixes in the HOWTO, but this is something that happened quite often with earlier revisions, and some of us still use systems built from them. You owe nobody anything, though. It would just be nice to have (and I might provide a simple script to run on the recovery CD if time permits; I just think you might be more inclined to do so than me right now).

How would you prevent future problems from this? What settings should be applied to the dataset during recovery? And mods to /etc/fstab?

Thank you!

ghost commented 5 years ago

From the recovery environment (at this point I think building an ISO containing 0.8-rc3 and the necessary tools would be ideal, or any system ready to roll with ZFS and the latest revisions, otherwise you need to do a bunch of tricks to get the live server ISO to play nice with the modules and libraries):

root@ubuntu-server:~/zfs-0.8.0# zpool import -N -R /mnt/rpool/ rpool
root@ubuntu-server:~/zfs-0.8.0# zfs mount -a
cannot mount '/mnt/rpool//root': 
cannot mount '/mnt/rpool//root': mount failed
cannot mount '/mnt/rpool//opt': directory is not empty
cannot mount '/mnt/rpool//var/cache': 
cannot mount '/mnt/rpool//var/cache': mount failed
cannot mount '/mnt/rpool//var/lib/nfs': directory is not empty
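
The "directory is not empty" errors point at mountpoint directories that picked up files while the dataset was unmounted. As a minimal sketch, the hypothetical helpers below report which of a given set of directories contain anything at all (hidden files included), so you know where data has to be moved aside before the mount can succeed.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Succeeds if the argument is a directory with at least one entry,
# counting dotfiles but not "." and "..".
dir_has_entries() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Print each argument that is a non-empty directory.
report_nonempty() {
    for dir in "$@"; do
        if dir_has_entries "$dir"; then
            echo "$dir"
        fi
    done
}
```

For the import above, something like `report_nonempty /mnt/rpool/opt /mnt/rpool/var/lib/nfs` would be a reasonable first check; run it while the datasets in question are still unmounted, otherwise it inspects the dataset contents instead of the stray files underneath.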