openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

A start job is running for Mount ZFS filesystems (time / no limit) and the server hangs forever #14562

Open ezplanet opened 1 year ago

ezplanet commented 1 year ago

System information

Ubuntu Server 22.04.2 LTS with zfs 2.1.5
Distribution Name:    Ubuntu
Distribution Version: 22.04.2 LTS
Kernel Version:       5.15.0-60-generic #66-Ubuntu SMP
Architecture:
OpenZFS Version:      2.1.5

zfs version
zfs-2.1.5-1ubuntu6~22.04.1
zfs-kmod-2.1.5-1ubuntu6~22.04.1

Describe the problem you're observing

When a disk or array with a zfs pool is connected, the server will NOT boot. It hangs indefinitely at: "A start job is running for Mount ZFS filesystems (time / no limit)".

Describe how to reproduce the problem

Install Ubuntu Server 22.04 LTS and import any existing zpool. Reboot. The system hangs as above (tested on 2 different installations: one a fresh install, the other an upgrade from Ubuntu 20.04 LTS).

Include any warning/errors/backtraces from the system logs

No relevant logs are available. The servers must be hard reset, and the hardware, including the zfs pool, must be disconnected before rebooting to regain control (testing with headless servers).

After further investigation I found that this service is hanging indefinitely on boot whenever a zfs pool is present on any connected drive:

/lib/systemd/system/zfs-mount.service

[Unit]
Description=Mount ZFS filesystems
Documentation=man:zfs(8)
DefaultDependencies=no
After=systemd-udev-settle.service
After=zfs-import.target
After=systemd-remount-fs.service
After=zfs-load-module.service
Before=local-fs.target
ConditionPathIsDirectory=/sys/module/zfs

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zfs mount -a

[Install]
WantedBy=zfs.target

I tried to disable this service, but it runs on boot regardless, even though systemd marks it as "disabled".
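
A quick way to check how the unit keeps getting pulled in despite being disabled (a diagnostic sketch, not from the original report):

systemctl is-enabled zfs-mount.service
systemctl list-dependencies --reverse zfs-mount.service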

Adding "TimeoutSec=60" to the [Service] section works around the issue and the servers boot; however, this change is overwritten whenever the zfs packages are updated. When the server boots after the timeout, the previously imported zpools are found imported and all filesystems are present and mounted (tested only with regular zfs filesystems). I wonder what this zfs-mount.service actually does.
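
For what it's worth, the ExecStart in the unit above is just zfs mount -a, which mounts every imported dataset whose canmount/mountpoint properties allow it. A hedged sketch for inspecting what it would act on:

# list each dataset and whether zfs mount -a would (or did) mount it
zfs list -o name,canmount,mountpoint,mounted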

NOTE: if the zpool drives are physically connected AFTER the server is up and running, then zpool import accesses the pool without issues (until a reboot with the physical drives still connected).

sxc731 commented 1 year ago

FWIW, I've not experienced this issue on either my laptop or server, both now running Ubuntu 22.04.2 with zfs-2.1.5-1ubuntu6~22.04.1. Both have been upgraded numerous times (not fresh installs).

Admittedly, both these machines are booted in a somewhat unusual way (by Ubuntu standards); they're both running ZFS-on-root and are booted through ZFS Boot Menu and dracut (rather than GRUB and initramfs-tools).

arudnev commented 1 year ago

We are experiencing a similar issue. We noticed it a few months ago, on an older kernel, but could not reproduce it consistently. We currently see it on 5.19.0-1029-aws #30~22.04.1-Ubuntu SMP with the following zfs-related packages:

libzfs4linux 2.1.5-1ubuntu6~22.04.1
zfs-initramfs 2.1.5-1ubuntu6~22.04.1
zfs-zed 2.1.5-1ubuntu6~22.04.1
zfsutils-linux 2.1.5-1ubuntu6~22.04.1

What we discovered today is that the probability of running into this issue depends heavily on the instance type. We are running on AWS, and this never happens on instances like r5.large / r5.xlarge, very rarely happens on t3.large (to test, just repeat in a loop: ssh in, sudo reboot, wait 90 seconds, as sketched below), but on t3.medium and smaller instances it usually does not survive one or two reboots of such a loop. After an instance stop + force-stop it sometimes comes back on those smaller instances, sometimes not. Switching to a larger instance type after a force-stop addresses the issue without rebuilding the box.

The boxes where we observed this have a somewhat fancy setup, with btrfs as the root file system and docker running some containers on top of zfs, plus various additional services (mdatp, the Amazon SSM agent) that create extra load, so one might need to go to smaller instances, like t3.small and below, to reproduce it consistently with the most basic Ubuntu configuration. We also tried switching between zfs-import-cache and zfs-import-scan, with similar results: it hangs forever while trying to import zpools on smaller instances (not always, just sometimes) and works fine on larger ones.
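
A minimal version of that reboot loop (a sketch; the hostname and passwordless sudo are assumptions):

while true; do
    ssh ubuntu@ubuntu-zfs-test 'sudo reboot' || true   # connection drops as the box goes down
    sleep 90
    # if the box is hanging on the ZFS start job, this ssh never succeeds
    ssh -o ConnectTimeout=10 ubuntu@ubuntu-zfs-test uptime || break
done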

The end of the serial console output when it gets stuck looks like this (with zfs-import-scan active and zfs-import-cache disabled):

[  OK  ] Finished Wait for udev To Complete Device Initialization.
[  OK  ] Finished File System Check on /dev/nvme0n1p2.
[  OK  ] Mounted /var/cache.
[  OK  ] Mounted /var/lib/docker.
[  OK  ] Mounted /var/log.
[  OK  ] Mounted /var/snap.
[  OK  ] Mounted /var/spool.
[  OK  ] Mounted /var/tmp.
[  OK  ] Finished Load/Save Random Seed.
         Mounting /boot...
         Mounting /var/log/audit...
[  OK  ] Started File System Check Daemon to report status.
         Starting Flush Journal to Persistent Storage...
         Starting Import ZFS pools by device scanning...
         Starting Install ZFS kernel module...
[  OK  ] Mounted /boot.
[  OK  ] Mounted /var/log/audit.
[  OK  ] Finished Flush Journal to Persistent Storage.
[  OK  ] Finished Install ZFS kernel module.
[   15.667677] cloud-init[486]: Cloud-init v. 23.2.1-0ubuntu0~22.04.1 running 'init-local' at Tue, 25 Jul 2023 22:53:27 +0000. Up 15.63 seconds.
[  OK  ] Finished Initial cloud-init job (pre-networking).
[**    ] A start job is running for Import Z…ice scanning (3min 52s / no limit)
[  248.582260] INFO: task zpool:451 blocked for more than 120 seconds.
[  248.586324]       Tainted: P           O      5.19.0-1029-aws #30~22.04.1-Ubuntu
[  248.612408] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  248.621722] INFO: task z_null_int:462 blocked for more than 120 seconds.
[  248.626207]       Tainted: P           O      5.19.0-1029-aws #30~22.04.1-Ubuntu
[***   ] A start job is running for Import Z…ice scanning (5min 53s / no limit)

and when it works, it usually takes 1-2 seconds to finish the import targets:

[  OK  ] Finished Wait for udev To Complete Device Initialization.
[  OK  ] Mounted /var/cache.
[  OK  ] Mounted /var/lib/docker.
[  OK  ] Finished File System Check on /dev/nvme0n1p2.
[  OK  ] Mounted /var/log.
[  OK  ] Mounted /var/snap.
[  OK  ] Mounted /var/spool.
[  OK  ] Mounted /var/tmp.
[  OK  ] Finished Load/Save Random Seed.
         Mounting /boot...
         Mounting /var/log/audit...
[  OK  ] Started File System Check Daemon to report status.
         Starting Flush Journal to Persistent Storage...
         Starting Import ZFS pools by device scanning...
         Starting Install ZFS kernel module...
[  OK  ] Mounted /boot.
[  OK  ] Mounted /var/log/audit.
[  OK  ] Finished Install ZFS kernel module.
[  OK  ] Finished Flush Journal to Persistent Storage.
[   14.931178] cloud-init[528]: Cloud-init v. 23.2.1-0ubuntu0~22.04.1 running 'init-local' at Tue, 25 Jul 2023 22:51:32 +0000. Up 14.90 seconds.
[  OK  ] Finished Initial cloud-init job (pre-networking).
[  OK  ] Finished Import ZFS pools by device scanning.
[  OK  ] Reached target ZFS pool import target.
         Starting Mount ZFS filesystems...
         Starting Wait for ZFS Volume (zvol) links in /dev...
[  OK  ] Finished Wait for ZFS Volume (zvol) links in /dev.
[  OK  ] Reached target ZFS volumes are ready.
[  OK  ] Finished Mount ZFS filesystems.
[  OK  ] Reached target Mounting snaps.

If time permits we'll try to reproduce this on a basic Ubuntu LTS install, but if any of the maintainers want to try, it might be worth looking at instances like t3.small or smaller.

arudnev commented 1 year ago

We just launched a t3.small instance on AWS using the latest Ubuntu 22.04 available from the AMI launcher (amazon/ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230516). It has an EBS-backed root volume (/dev/sda1), and a second EBS volume (/dev/xvdh) is used for the zpool. After the initial update and package install the instance was rebooted, then the zpool was created. On the first sudo reboot after that it hung on the zfs import; after a force-stop + start it booted OK, booted fine again after another sudo reboot, then hung on the zfs import again on the next sudo reboot.

Here are the commands that were executed to set it up:

ssh ubuntu@ubuntu-zfs-test

sudo apt update -y && sudo apt upgrade -y
sudo reboot

ssh ubuntu@ubuntu-zfs-test

sudo apt install -y zfsutils-linux

lsblk

NAME          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0           7:0    0  24.4M  1 loop /snap/amazon-ssm-agent/6312
loop1           7:1    0  55.6M  1 loop /snap/core18/2745
loop2           7:2    0  53.2M  1 loop /snap/snapd/19122
loop3           7:3    0 111.9M  1 loop /snap/lxd/24322
loop4           7:4    0  63.3M  1 loop /snap/core20/1879
nvme1n1       259:0    0     8G  0 disk
├─nvme1n1p1   259:1    0     8G  0 part
└─nvme1n1p9   259:2    0     8M  0 part
nvme0n1       259:3    0     8G  0 disk
├─nvme0n1p1   259:4    0   7.9G  0 part /
├─nvme0n1p14  259:5    0     4M  0 part
└─nvme0n1p15  259:6    0   106M  0 part /boot/efi

sudo zpool create \
-o autoexpand=on \
-O canmount=off \
-O compression=lz4 \
-O atime=off \
-O normalization=formD \
-O relatime=on \
-O xattr=sa \
-m none zpool-docker \
$(ls -l /dev/disk/by-id | grep nvme1n1 | awk '{print $9}' | grep -E 'nvme-Amazon_Elastic_Block_Store_vol[[:alnum:]]+$')
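
The command substitution at the end selects the stable /dev/disk/by-id name of the data disk from ls -l output. An equivalent lookup (a sketch, not from the thread; it prints the full by-id path rather than the bare name) resolves the symlinks directly:

for link in /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol*; do
    # print the by-id alias that points at the pool's backing disk
    [ "$(readlink -f "$link")" = /dev/nvme1n1 ] && echo "$link"
done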

sudo zpool list

NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zpool-docker  7.50G   159K  7.50G        -         -     0%     0%  1.00x    ONLINE  -

We then get the following sequence when we issue sudo reboot: on some of those boots the box hangs, just reporting the "A start job is running for Mount ZFS filesystems (time / no limit)" messages on the serial console:

reboot #1 -> hangs
stop + start -> ok
reboot #2 -> ok
reboot #3 -> hangs

mxdblcf commented 11 months ago

I have also encountered this situation. I guess the NVMe device does dedup and cache, but I don't know why zfs can't mount this NVMe device after a restart.

ezplanet commented 10 months ago

This is still happening; is there any plan for a resolution? My workaround is to delete /lib/systemd/system/zfs-mount.service, because disabling it is not enough: it gets executed regardless.

AllKind commented 10 months ago

@ezplanet Does systemctl mask zfs-mount.service do the trick? It is essentially the same thing, except it creates a symlink to /dev/null, with the advantage that it stays that way across re-installs and updates.
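
For reference, a minimal sketch of what masking does:

sudo systemctl mask zfs-mount.service
# systemd now links /etc/systemd/system/zfs-mount.service -> /dev/null,
# so nothing can start the unit until it is unmasked again:
sudo systemctl unmask zfs-mount.service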

mxdblcf commented 10 months ago

Importing the pool in read-only mode lets you read the data; you can then copy it off and export the pool.
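
A sketch of that recovery path, using the pool name that appears later in this thread:

sudo zpool import -o readonly=on terra
# copy the data off (e.g. rsync, or zfs send to another pool), then:
sudo zpool export terra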

foxpalace commented 5 months ago

If zfs is defective you can never boot your system, that's right. But is it OK that I can never repair the system or swap out the disk to fix it? Everything is inaccessible even though only /var/vmail is affected; it cannot be a problem with /, because we never even get that far in the boot process.

spicysomtam commented 3 months ago

Still an issue in 2024. I am using Mint 21.3. I was playing around with Charmed Kubernetes (it uses lxc containers), which was wiped after I had finished trying it out. However, the next boot hit this issue even though I have no zfs filesystems (just ext4). I masked the zfs-mount.service; afterwards I ran zfs mount -a with no issue. It's pretty bad form that it never times out; does it assume you have zfs filesystems that might take forever to fsck/check?

fpga-guy commented 4 weeks ago

I was facing the same issue after having to reconstruct the boot pool. The solution in my case was to edit the /lib/systemd/system/zfs-import-cache.service file and add the TimeoutSec=60 attribute in the [Service] section; doing that in zfs-mount.service had no effect. The boot still did not complete, though, and I was dumped into a maintenance shell. Issuing a reboot there did the trick and the system came back whole again.
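
A sketch of that edit (only the added line is new; the rest of the shipped unit is left unchanged):

# /lib/systemd/system/zfs-import-cache.service
[Service]
TimeoutSec=60

As noted earlier in the thread, direct edits under /lib/systemd/system are overwritten by package updates; a drop-in override (see the last comment below) persists.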

ezplanet commented 12 hours ago

> @ezplanet Does systemctl mask zfs-mount.service do the trick? It is essentially the same thing, except it creates a symlink to /dev/null, with the advantage that it stays that way across re-installs and updates.

No, it does not. I worked around it by including TimeoutSec as follows:

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zfs mount -a
TimeoutSec=20

However, this is still quite a big issue, because when zfs-mount receives an automated Ubuntu package update the change is wiped out and the server hangs at the next reboot. Since I use headless servers, this means getting them out of the rack and connecting a keyboard and monitor (which I do not normally have plugged in) to regain control. I also tried a re-install from scratch on a new server and ended up with exactly the same issue. I do not use root zfs; I use external USB 3.1 attached JBODs, with separate zil and cache on internal SSD partitions:

 pool: terra
 state: ONLINE
  scan: scrub repaired 0B in 13:08:18 with 0 errors on Sun Sep 15 13:32:30 2024
config:

    NAME             STATE     READ WRITE CKSUM
    terra            ONLINE       0     0     0
      raidz1-0       ONLINE       0     0     0
        terra-slot1  ONLINE       0     0     0
        terra-slot2  ONLINE       0     0     0
        terra-slot3  ONLINE       0     0     0
        terra-slot4  ONLINE       0     0     0
        terra-slot5  ONLINE       0     0     0
    logs
      terraZil       ONLINE       0     0     0
    cache
      terraCache     ONLINE       0     0     0

errors: No known data errors

Is anyone looking into the root cause? Do you need any help with debugging?

AllKind commented 8 hours ago

Use systemctl edit UNIT to create an override config, which will survive system updates.
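
For example, a minimal override sketch for the unit discussed in this thread (the timeout value mirrors the workaround above):

sudo systemctl edit zfs-mount.service

Then, in the editor that opens, add:

[Service]
TimeoutSec=20

systemd saves this as /etc/systemd/system/zfs-mount.service.d/override.conf, which takes precedence over the packaged unit and survives package updates.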