openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Erroneous 'ZFS label hostid mismatch' warning in latest master 0ad85ed #2153

Closed thalamus closed 10 years ago

thalamus commented 10 years ago

After updating master from 98fad86 to 0ad85ed, on the first reboot after performing a scrub to correct the disk format errata, the pool failed to import, stating that it was in use by another system. It wasn't, and never has been, in use by another system, so I forced the import and all was fine after that, except for the erroneous warning below in zpool status, which has persisted ever since.

The pool has had no further problems importing automatically on boot since the initial issue.

Aeolus /home/thalamus # zpool status
  pool: storage
 state: ONLINE
   see: http://zfsonlinux.org/msg/ZFS-8000-EY
  scan: scrub repaired 0 in 1h45m with 0 errors on Sun Mar  2 01:45:28 2014
config:

    NAME                                            STATE     READ WRITE CKSUM
    storage                                         ONLINE       0     0     0
      raidz1-0                                      ONLINE       0     0     0
        ata-Hitachi_HDS721010CLA332_JP2940HZ22H5YC  ONLINE       0     0     0
        ata-Hitachi_HDS721010CLA332_JP2940HZ2375RC  ONLINE       0     0     0
        ata-ST1000DM003-9YN162_S1D0ETX3             ONLINE       0     0     0
    logs
      ata-OCZ-VERTEX2_OCZ-H0O0I81ZDZ55LSQM-part1    ONLINE       0     0     0
    cache
      ata-OCZ-VERTEX2_OCZ-H0O0I81ZDZ55LSQM-part2    ONLINE       0     0     0

errors: No known data errors

I attempted exporting and reimporting the pool, but the warning is still present.

Aeolus /home/thalamus # zfs umount -a
Aeolus /home/thalamus # zpool export storage
Aeolus /home/thalamus # zpool import
   pool: storage
     id: 16622208931620915975
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

    storage                                         ONLINE
      raidz1-0                                      ONLINE
        ata-Hitachi_HDS721010CLA332_JP2940HZ22H5YC  ONLINE
        ata-Hitachi_HDS721010CLA332_JP2940HZ2375RC  ONLINE
        ata-ST1000DM003-9YN162_S1D0ETX3             ONLINE
    cache
      ata-OCZ-VERTEX2_OCZ-H0O0I81ZDZ55LSQM-part2
    logs
      ata-OCZ-VERTEX2_OCZ-H0O0I81ZDZ55LSQM-part1    ONLINE
Aeolus /home/thalamus # zpool import storage
Aeolus /home/thalamus # zpool status
  pool: storage
 state: ONLINE
   see: http://zfsonlinux.org/msg/ZFS-8000-EY
  scan: scrub repaired 0 in 1h45m with 0 errors on Sun Mar  2 01:45:28 2014
config:

    NAME                                            STATE     READ WRITE CKSUM
    storage                                         ONLINE       0     0     0
      raidz1-0                                      ONLINE       0     0     0
        ata-Hitachi_HDS721010CLA332_JP2940HZ22H5YC  ONLINE       0     0     0
        ata-Hitachi_HDS721010CLA332_JP2940HZ2375RC  ONLINE       0     0     0
        ata-ST1000DM003-9YN162_S1D0ETX3             ONLINE       0     0     0
    logs
      ata-OCZ-VERTEX2_OCZ-H0O0I81ZDZ55LSQM-part1    ONLINE       0     0     0
    cache
      ata-OCZ-VERTEX2_OCZ-H0O0I81ZDZ55LSQM-part2    ONLINE       0     0     0

errors: No known data errors
Aeolus /home/thalamus # 
behlendorf commented 10 years ago

Is it possible the hostid for your system changed? The error indicates that the id stored in the label doesn't match the one for your system. Aside from that, it's harmless.
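One way to check this is to compare the system-side hostid with the one recorded in the pool label. This is a sketch: `hostid` is a coreutils command, while the `zdb` commands (commented out) require the ZFS utilities, and the pool/device names are just the ones from the output above.

```shell
# Print this system's current hostid (8 hex digits, as gethostid(3) reports it).
sys_hostid=$(hostid)
echo "system hostid: ${sys_hostid}"

# The hostid recorded on the pool side can be dumped with zdb, e.g.
# (illustrative; needs the ZFS utilities and an accessible pool/device):
#   zdb -C storage | grep hostid
#   zdb -l /dev/disk/by-id/ata-Hitachi_HDS721010CLA332_JP2940HZ22H5YC | grep hostid
```

If the two values differ, the warning is expected; if they match, something else reset the label.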

behlendorf commented 10 years ago

I was able to reproduce this in a Fedora VM with the latest code. It appears to be a side effect of the recent systemd integration. Because the ZFS modules may now be loaded before the network is brought up, the hostid may not be set. However, this behavior is racy, since it's entirely possible the ZFS module will be loaded after the network is configured.

This artificial dependency on the hostid, and in turn on the network, which we inherited from Illumos, has always been a problem. The cleanest fix is to shed this dependency entirely by implementing #745, but that's a fairly large development item.

Shorter term, we could ensure that the zpool import always occurs after the network has been brought up. This isn't the most desirable solution, but it is arguably correct until we can shake this dependency.
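The short-term ordering idea could be expressed as a systemd drop-in along these lines. This is only a sketch: the unit name `zfs-import-cache.service` is an assumption about the unit layout, and the actual import unit may be named differently.

```ini
# Hypothetical drop-in, e.g.
# /etc/systemd/system/zfs-import-cache.service.d/after-network.conf
# (the unit name is an assumption; adjust to the actual import unit)
[Unit]
After=network.target
Wants=network.target
```

The trade-off, as noted below, is that this pushes pool import much later into the boot.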

@Lalufu any thoughts? You might not have noticed this in your initial testing because it's unlikely to occur if there are a large number of drives or they are slow to settle.

Lalufu commented 10 years ago

The way the systemd units are structured, the import of all pools happens quite early in the boot process.

The ZFS module is loaded by systemd-modules-load.service, which runs very early in the boot process. Shortly afterwards, the import and mount of ZFS pools and filesystems is done, before local-fs.target is reached. This was done so that ZFS file systems are treated the same as those mentioned in /etc/fstab with respect to when they become available to the rest of the system.

Network-related services are started as part of basic.target, which comes quite a bit later in the boot process. So for a normal boot, the import will always happen at a time when there is no network available.

The whole dependency tree can be seen with systemctl list-dependencies --before zfs-mount

behlendorf commented 10 years ago

Since I'd really like to avoid adding a dependency on something which happens so late in the boot, I'm inclined to disable the hostid check when the hostid is derived from a loopback device. This would resolve a whole class of problems where pools fail to import cleanly.

aarcane commented 10 years ago

Whatever happened to generating a random hostid file as part of the ZFS install? I thought there had been consensus on that solution a long time ago, since there was never any consistently available hostid, and ssh had already set a precedent for doing this.
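The random-hostid idea could be sketched like this. This is an illustration, not an installer script: it writes to a temporary path rather than /etc/hostid, and relies on the fact that glibc's gethostid(3) reads /etc/hostid as a 32-bit binary value, so four random bytes suffice.

```shell
# Sketch: generate a random 4-byte hostid file, as an installer might.
# (Writing to a temp path here instead of /etc/hostid.)
hostid_file=$(mktemp)

# gethostid(3) on glibc reads /etc/hostid as a 32-bit binary value.
head -c4 /dev/urandom > "${hostid_file}"

# Show the resulting id as one hex word (on this machine, the same value
# gethostid(3) would read back from the file).
od -An -tx4 "${hostid_file}" | tr -d ' '
```

A real installer would additionally want to avoid overwriting an existing /etc/hostid and to skip the (unlikely) all-zero value, since a zero hostid typically means "not configured".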

behlendorf commented 10 years ago

If there was consensus then I missed it, and no one ever posted patches. That sounds like a reasonable solution, and if something like ssh already needs to do this for some use case, so much the better. Can you reference the prior thread?

behlendorf commented 10 years ago

This issue was resolved by zfsonlinux/spl@acf0ade362cb8b26d67770114ee6fa17816e6b65. Unless an /etc/hostid file has been created for the node, the hostid check will be disabled.