openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

zpool import failure by systemd at boot #11587

Open stuartthebruce opened 3 years ago

stuartthebruce commented 3 years ago

System information

Type Version/Name
Distribution Name CentOS
Distribution Version 8.3
Linux Kernel 4.18.0-240.10.1.el8_3
Architecture x86_64
ZFS Version 2.0.2
SPL Version 2.0.2

Describe the problem you're observing

zfs-import-cache.service fails to import a pool of 60 HDDs from an external SAS JBOD at boot time.

Describe how to reproduce the problem

/sbin/shutdown -r now

Include any warning/errors/backtraces from the system logs

[root@gwosc-zfs1 ~]# zpool list
NAME          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
gwosc-fast   63.5G   652M  62.9G        -         -     0%     1%  1.00x    ONLINE  -
gwosc-jbod5   658T  78.1T   580T        -         -     0%    11%  1.00x    ONLINE  -

[root@gwosc-zfs1 ~]# /sbin/shutdown -r now

...

[root@gwosc-zfs1 ~]# systemctl status zfs-import-cache.service
● zfs-import-cache.service - Import ZFS pools by cache file
   Loaded: loaded (/usr/lib/systemd/system/zfs-import-cache.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2021-02-09 13:27:12 PST; 4min 14s ago
     Docs: man:zpool(8)
  Process: 1857 ExecStart=/sbin/zpool import -c /etc/zfs/zpool.cache -aN (code=exited, status=1/FAILURE)
 Main PID: 1857 (code=exited, status=1/FAILURE)

Feb 09 13:27:09 gwosc-zfs1 systemd[1]: Starting Import ZFS pools by cache file...
Feb 09 13:27:12 gwosc-zfs1 zpool[1857]: cannot import 'gwosc-jbod5': no such pool or dataset
Feb 09 13:27:12 gwosc-zfs1 zpool[1857]:         Destroy and re-create the pool from
Feb 09 13:27:12 gwosc-zfs1 zpool[1857]:         a backup source.
Feb 09 13:27:12 gwosc-zfs1 systemd[1]: zfs-import-cache.service: Main process exited, code=exited, status=1/FAILURE
Feb 09 13:27:12 gwosc-zfs1 systemd[1]: zfs-import-cache.service: Failed with result 'exit-code'.
Feb 09 13:27:12 gwosc-zfs1 systemd[1]: Failed to start Import ZFS pools by cache file.

[root@gwosc-zfs1 ~]# zpool list
NAME         SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
gwosc-fast  63.5G   652M  62.9G        -         -     0%     1%  1.00x    ONLINE  -

[root@gwosc-zfs1 ~]# zpool import gwosc-jbod5

[root@gwosc-zfs1 ~]# zpool list
NAME          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
gwosc-fast   63.5G   652M  62.9G        -         -     0%     1%  1.00x    ONLINE  -
gwosc-jbod5   658T  78.1T   580T        -         -     0%    11%  1.00x    ONLINE  -

The zfs-import-cache failure appears to be a race condition with the kernel discovering/registering all of the HDDs, e.g., from syslog:

Feb  9 13:27:12 gwosc-zfs1 kernel: end_device-4:2:7: add: handle(0x0022), sas_addr(0x5000cca291992946)
Feb  9 13:27:12 gwosc-zfs1 kernel: scsi 4:0:10:0: Direct-Access     HGST     HUH721212AL5200  A925 PQ: 0 ANSI: 6
Feb  9 13:27:12 gwosc-zfs1 kernel: scsi 4:0:10:0: SSP: handle(0x0023), sas_addr(0x5000cca29199fe8e), phy(8), device_name(0x5000cca29199fe8f)
Feb  9 13:27:12 gwosc-zfs1 kernel: scsi 4:0:10:0: enclosure logical id (0x5000ccab0405b000), slot(17)
Feb  9 13:27:12 gwosc-zfs1 kernel: scsi 4:0:10:0: enclosure level(0x0000), connector name(     )
Feb  9 13:27:12 gwosc-zfs1 kernel: scsi 4:0:10:0: qdepth(254), tagged(1), scsi_level(7), cmd_que(1)
Feb  9 13:27:12 gwosc-zfs1 zpool[1857]: cannot import 'gwosc-jbod5': no such pool or dataset
Feb  9 13:27:12 gwosc-zfs1 zpool[1857]: #011Destroy and re-create the pool from
Feb  9 13:27:12 gwosc-zfs1 zpool[1857]: #011a backup source.
Feb  9 13:27:12 gwosc-zfs1 systemd[1]: zfs-import-cache.service: Main process exited, code=exited, status=1/FAILURE
Feb  9 13:27:12 gwosc-zfs1 systemd[1]: zfs-import-cache.service: Failed with result 'exit-code'.
Feb  9 13:27:12 gwosc-zfs1 systemd[1]: Failed to start Import ZFS pools by cache file.
Feb  9 13:27:12 gwosc-zfs1 systemd[1]: Reached target ZFS pool import target.
Feb  9 13:27:12 gwosc-zfs1 systemd[1]: Starting Mount ZFS filesystems...
Feb  9 13:27:12 gwosc-zfs1 kernel: end_device-4:2:8: add: handle(0x0023), sas_addr(0x5000cca29199fe8e)
Feb  9 13:27:12 gwosc-zfs1 kernel: scsi 4:0:11:0: Direct-Access     HGST     HUH721212AL5200  A925 PQ: 0 ANSI: 6
Feb  9 13:27:12 gwosc-zfs1 kernel: scsi 4:0:11:0: SSP: handle(0x0024), sas_addr(0x5000cca291995a4a), phy(9), device_name(0x5000cca291995a4b)
Feb  9 13:27:12 gwosc-zfs1 kernel: scsi 4:0:11:0: enclosure logical id (0x5000ccab0405b000), slot(18)
Feb  9 13:27:12 gwosc-zfs1 kernel: scsi 4:0:11:0: enclosure level(0x0000), connector name(     )
Feb  9 13:27:12 gwosc-zfs1 kernel: scsi 4:0:11:0: qdepth(254), tagged(1), scsi_level(7), cmd_que(1)
Feb  9 13:27:12 gwosc-zfs1 systemd[1]: Started Mount ZFS filesystems.
Feb  9 13:27:12 gwosc-zfs1 systemd[1]: Reached target Local File Systems.
Feb  9 13:27:12 gwosc-zfs1 systemd[1]: Starting Import network configuration from initramfs...
Feb  9 13:27:12 gwosc-zfs1 systemd[1]: Starting Tell Plymouth To Write Out Runtime Data...
Feb  9 13:27:12 gwosc-zfs1 systemd[1]: Starting Restore /run/initramfs on shutdown...
Feb  9 13:27:12 gwosc-zfs1 systemd[1]: Started Restore /run/initramfs on shutdown.
Feb  9 13:27:12 gwosc-zfs1 kernel: end_device-4:2:9: add: handle(0x0024), sas_addr(0x5000cca291995a4a)
Feb  9 13:27:12 gwosc-zfs1 kernel: scsi 4:0:12:0: Direct-Access     HGST     HUH721212AL5200  A925 PQ: 0 ANSI: 6
Feb  9 13:27:12 gwosc-zfs1 kernel: scsi 4:0:12:0: SSP: handle(0x0025), sas_addr(0x5000cca291990c7e), phy(10), device_name(0x5000cca291990c7f)
Feb  9 13:27:12 gwosc-zfs1 kernel: scsi 4:0:12:0: enclosure logical id (0x5000ccab0405b000), slot(19)
Feb  9 13:27:12 gwosc-zfs1 kernel: scsi 4:0:12:0: enclosure level(0x0000), connector name(     )
Feb  9 13:27:12 gwosc-zfs1 kernel: scsi 4:0:12:0: qdepth(254), tagged(1), scsi_level(7), cmd_que(1)
Feb  9 13:27:12 gwosc-zfs1 systemd[1]: Started Import network configuration from initramfs.
Feb  9 13:27:12 gwosc-zfs1 systemd[1]: Starting Create Volatile Files and Directories...
stuartthebruce commented 3 years ago

I have now seen this on a second CentOS 8.3 + ZFS 2.0.2 system with a similarly configured 60-drive external SAS JBOD pool.

stuartthebruce commented 3 years ago

Perhaps this is a bug in one of the Requires= or After= dependencies of zfs-import-cache, or perhaps additional dependencies should be added?

[root@gwosc-zfs1 ~]# cat /etc/systemd/system/zfs-import.target.wants/zfs-import-cache.service
[Unit]
Description=Import ZFS pools by cache file
Documentation=man:zpool(8)
DefaultDependencies=no
Requires=systemd-udev-settle.service
After=systemd-udev-settle.service
After=cryptsetup.target
After=multipathd.target
After=systemd-remount-fs.service
Before=zfs-import.target
ConditionPathExists=/etc/zfs/zpool.cache
ConditionPathIsDirectory=/sys/module/zfs

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -c /etc/zfs/zpool.cache -aN

[Install]
WantedBy=zfs-import.target
stuartthebruce commented 3 years ago

Looks like this might be a duplicate of #10891?

stuartthebruce commented 3 years ago

As recommended at https://zfsonlinux.topicbox.com/groups/zfs-discuss/Td5ea0587e058439b-M40eec800f31527397ca91aa2, the following works as a temporary workaround:

ExecStartPre=/usr/bin/sleep 30
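For reference, one way to apply this without editing the packaged unit file is a systemd drop-in (a minimal sketch; the 30-second value is just the delay suggested above, not a tuned number):

systemctl edit zfs-import-cache.service

Then add:

[Service]
ExecStartPre=/usr/bin/sleep 30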

zimme commented 3 years ago

I'm also seeing this on my Ubuntu Focal server. 10+ disks are connected via an HBA. Adding a sleep timeout works for me too.

sea36 commented 3 years ago

On Fedora 33, with two small (2-disk) LUKS-encrypted arrays, I've had to work through a couple of similar problems...

  1. The zfs kernel module was getting loaded by udev, but the zfs-import-cache service can run before that is triggered.

echo "zfs" > /etc/modules-load.d/zfs.conf

systemctl edit zfs-import-cache.service

Add:

[Unit]
Requires=systemd-modules-load.service
After=systemd-modules-load.service
  2. The zfs-import-cache service can run before the disks are unlocked/ready.

systemctl edit zfs-import-cache.service

Add:

[Service]
ExecStartPre=/usr/bin/sleep 5
stuartthebruce commented 3 years ago

This is still an issue (arguably worse) with ZFS 2.1.0. In particular, a Rocky Linux 8.4 test zpool with 91 multipathd-managed FC LUNs generates the following error at boot with ZFS 2.0.5 (without the above unit file edit):

[root@zfsbackup1 ~]# systemctl status zfs-import-cache
● zfs-import-cache.service - Import ZFS pools by cache file
   Loaded: loaded (/usr/lib/systemd/system/zfs-import-cache.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2021-07-06 16:41:48 PDT; 1min 33s ago
     Docs: man:zpool(8)
  Process: 14276 ExecStart=/sbin/zpool import -c /etc/zfs/zpool.cache -aN (code=exited, status=1/FAILURE)
 Main PID: 14276 (code=exited, status=1/FAILURE)

Jul 06 16:41:45 zfsbackup1 zpool[14276]: cannot import 'temp': no such pool or dataset
Jul 06 16:41:48 zfsbackup1 zpool[14276]: cannot import 'temp': no such pool or dataset
Jul 06 16:41:48 zfsbackup1 zpool[14276]:         Destroy and re-create the pool from
Jul 06 16:41:48 zfsbackup1 zpool[14276]:         a backup source.
Jul 06 16:41:48 zfsbackup1 zpool[14276]: cachefile import failed, retrying
Jul 06 16:41:48 zfsbackup1 zpool[14276]:         Destroy and re-create the pool from
Jul 06 16:41:48 zfsbackup1 zpool[14276]:         a backup source.
Jul 06 16:41:48 zfsbackup1 systemd[1]: zfs-import-cache.service: Main process exited, code=exited, status=1/>
Jul 06 16:41:48 zfsbackup1 systemd[1]: zfs-import-cache.service: Failed with result 'exit-code'.
Jul 06 16:41:48 zfsbackup1 systemd[1]: Failed to start Import ZFS pools by cache file.

and after patching to ZFS 2.1.0, a reboot spun its wheels for 3+ hours before I gave up:

[attached screenshot: PastedGraphic-1]

Note, this system is able to successfully run zfs-import-cache at boot with both ZFS 2.0.5 and 2.1.0 using the above unit file edit.

behlendorf commented 3 years ago

Thanks for confirming this is still an issue with 2.1.

Mmako975 commented 3 years ago

Only recently (with OpenZFS 2.1.0-1 on Linux) I have found that if, by some mishap, a ZFS pool is lost or missing during systemd boot, I get stuck on the message "A start job is running for Import ZFS pools by cache file (1h 23min 22s / no limit)" and am unable to boot.

I still have not edited any files, but as soon as I forget to run zpool export before a reboot and the pool is not present, I am unable to start up my machine.

(One detail is that I am using ZFS with external USB drives, so sometimes I forget to export one of the USB pools before rebooting, which leads to this issue.)

stuartthebruce commented 2 years ago

Any chance of accepting a one-word change from After=multipathd.target to After=multipathd.service in /etc/systemd/system/zfs-import.target.wants/zfs-import-cache.service for Linux systems? I have confirmed that this is still needed on multiple Rocky Linux 8 systems running ZFS 2.1.5 with a large number of multipath SAS drives (60) to keep zpool import from starting before multipathd is done at boot time, and it doesn't seem to cause any problems for smaller systems. I am still a systemd newbie and there may be a better solution, but simply changing .target to .service works on my systems, and it would be nice not to have to patch this unit file after every ZFS upgrade.

The downside to not patching this is that some users may be running pools whose multipath-capable devices imported successfully but without the path redundancy they expect, which they will only discover the hard way when the single path that zpool import grabbed fails.
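Until such a change is accepted upstream, the same ordering can be expressed locally with a drop-in so the shipped unit does not need to be patched after each upgrade (a sketch based on the change proposed above; After= is additive, so this merges with the existing dependencies):

systemctl edit zfs-import-cache.service

Then add:

[Unit]
After=multipathd.service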

varl commented 1 year ago

Tried a lot of different variants from here and around the internet. Finally I found a workaround by adding a custom service unit for the pool, in addition to adding zfs to /etc/modules-load.d/zfs.conf.

/etc/systemd/system/zfs-import-storage.service:

[Unit]
DefaultDependencies=no
Before=zfs-import-scan.service
Before=zfs-import-cache.service
After=systemd-modules-load.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -N -o cachefile=none storage
# Work-around to preserve zpool cache:
ExecStartPre=-/bin/mv /etc/zfs/zpool.cache /etc/zfs/preboot_zpool.cache
ExecStartPost=-/bin/mv /etc/zfs/preboot_zpool.cache /etc/zfs/zpool.cache

[Install]
WantedBy=zfs-import.target

Then:

echo "zfs" > /etc/modules-load.d/zfs.conf
systemctl enable zfs-import-scan.service
systemctl enable zfs-import-cache.service
systemctl enable zfs-import-storage.service

After that the pool is imported and ready on boot, although I still get these messages:

Apr 01 17:52:38 host systemd[1]: Dependency failed for Install ZFS kernel module.
Apr 01 17:52:38 host systemd[1]: Dependency failed for Import ZFS pools by device scanning.
Apr 01 17:52:38 host systemd[1]: Dependency failed for Import ZFS pools by cache file.

I can live with that for now.

sbbeachvball commented 1 year ago

I used the systemd service above as a reference, but took a slightly different approach, more similar to a service I created years ago to work around a race condition between bind mounts and ZFS. I have only tested on one system (Fedora 38), which uses zfs-import-cache, but it worked there. This service really does nothing other than cause systemd to reorder the services involved.

cat <<EOF > /etc/systemd/system/zfs-import-race-condition-fix.service
[Unit]
DefaultDependencies=no
Before=zfs-import-scan.service
Before=zfs-import-cache.service
After=systemd-modules-load.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/cat /dev/null

[Install]
WantedBy=zfs-import.target
EOF

systemctl enable zfs-import-race-condition-fix

echo "zfs" > /etc/modules-load.d/zfs.conf

The system I am testing on is a Fedora-38 box. I do not see the errors mentioned above, but am seeing warnings related to a deprecated service called by the zfs-import-* services.

Jun 17 09:55:16 myhost udevadm[980]: systemd-udev-settle.service is deprecated. Please fix zfs-import-cache.service, zfs-import-scan.service not to pull it in.
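To see what is still pulling in the deprecated settle service on a given system, systemd can list the reverse dependencies (a quick diagnostic, not a fix):

systemctl list-dependencies --reverse systemd-udev-settle.service
systemctl show -p Requires,After zfs-import-cache.service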

varl commented 1 year ago

@sbbeachvball wrote:

The system I am testing on is a Fedora-38 box. I do not see the errors mentioned above, but am seeing warnings related to a deprecated service called by the zfs-import-* services.

Very nice, I just tried this and it works on a Debian 12 machine as well. Everything is imported on boot without the hacks for cache restoration.

Thanks for sharing!

pansila commented 7 months ago

I ran into a similar issue in Debian 12 on an SBC, but worked around it with a different solution:

  1. $ echo zfs | sudo tee -a /etc/modules-load.d/modules.conf
  2. zfs-import-cache.service is disabled by default, so I had to enable it manually, but it is pulled in via the zfs-import target, which is somehow masked on my distribution and I can't even unmask it. As a result, the symlink created in /etc/systemd/system/zfs-import.target.wants never runs at all. I had to rebuild the dependency chain under /etc/systemd/system/zfs.target.wants by creating the symlink manually:
    $ sudo ln -s /lib/systemd/system/zfs-import-cache.service /etc/systemd/system/zfs.target.wants/zfs-import-cache.service
  3. Add a wait for the zfs module being loaded to zfs-import-cache.service (see the drop-in sketch after this list):
    [Unit]
    ...
    After=systemd-modules-load.service
    ...
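For reference, a drop-in file is one way to add that ordering without editing the packaged unit; a minimal sketch (the file name wait-for-zfs-module.conf is arbitrary):

mkdir -p /etc/systemd/system/zfs-import-cache.service.d
cat <<EOF > /etc/systemd/system/zfs-import-cache.service.d/wait-for-zfs-module.conf
[Unit]
After=systemd-modules-load.service
EOF
systemctl daemon-reload
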
ianb-mp commented 4 months ago

@sbbeachvball's suggested fix didn't work for me unfortunately. I had to insert a sleep 30 instead (as mentioned by @stuartthebruce).

In my case, the ZFS pool is located on a block device that corresponds to a RAID array (megaraid_sas). I tried including an After= line referencing the systemd device unit for that block device (as per advice here), but that didn't make any difference.
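For anyone else trying the device-unit approach, the ordering being described would look roughly like the sketch below; note it reportedly did not help in this case, and dev-sda.device is only a placeholder for whatever unit name systemd actually generates for the array:

systemctl list-units --type=device

systemctl edit zfs-import-cache.service

Then add:

[Unit]
# Replace dev-sda.device with the device unit from the listing above
Wants=dev-sda.device
After=dev-sda.device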