
`zpool status` output with 2.2.0 userspace and 2.1.13 kernel-modules adds `(non-allocating)` to vdev lines #15472

siv0 opened this issue 10 months ago

siv0 commented 10 months ago

System information

Type                  Version/Name
Distribution Name     Proxmox 8.0 (Debian 12 based)
Distribution Version  8.0/12.1
Kernel Version        6.2.16-19-pve
Architecture          amd64
OpenZFS Version       zfs-2.2.0-pve2 (userspace), zfs-kmod-2.1.13-pve1 (kernelspace)

Describe the problem you're observing

First of all: I currently think this is a purely cosmetic problem, without any impact on the ability to import, operate, or otherwise work with systems running the 2.2.0 userspace together with the 2.1.13 kernelspace - but I thought I'd open an issue just to be on the safe side.

I read https://github.com/openzfs/zfs/blob/master/RELEASES.md as indicating that backward compatibility should be preserved across a change from 2.1.x to 2.2.y. If this is not a valid assumption, sorry for the noise - feel free to close the issue!

When running the zfs 2.2.0 userspace utilities with a kernel that still has the 2.1.13 modules loaded, zpool status adds (non-allocating) next to the disk name of a single-disk pool (I did not verify this with other layouts):

# zpool version
zfs-2.2.0-pve2
zfs-kmod-2.1.13-pve1
# zpool status
  pool: localzpool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:10 with 0 errors on Tue Oct  4 12:12:25 2022
config:

        NAME                                    STATE     READ WRITE CKSUM
        localzpool                              ONLINE       0     0     0
          scsi-0QEMU_QEMU_HARDDISK_drive-scsi2  ONLINE       0     0     0  (non-allocating)

errors: No known data errors

I strongly assume that this is not related to our (Proxmox) packaging, but I did not explicitly verify this elsewhere. For a bit more context, this was reported in our community forum: https://forum.proxmox.com/threads/opt-in-linux-6-5-kernel-with-zfs-2-2-for-proxmox-ve-8-available-on-test.135635/post-600411

A quick search in the commit history might point to 2a673e76a928cca4df7794cdcaa02e0be149c4da ("Vdev Properties Feature") as the initial commit that introduced the output (I did not check if/how this could be adapted).

Describe how to reproduce the problem

Run a kernel with the 2.1.x modules loaded, then run zpool status with the 2.2.0 userspace utilities.

Include any warning/errors/backtraces from the system logs

see above

Thanks for all the work you put into OpenZFS!

stuartthebruce commented 10 months ago

I have also seen this independently of Proxmox: https://zfsonlinux.topicbox.com/groups/zfs-discuss/T5674d5f404027036-Ma76127fe69a3d4632022cf43

seanjnkns commented 10 months ago

I experienced the same issue myself when I first updated one of my systems; rebuilding the initramfs and rebooting corrected it. So it's not just cosmetic: I couldn't even upgrade zfs/zpool until I had done that.

siv0 commented 10 months ago

> I experienced the same issue myself when I first updated one of my systems; rebuilding the initramfs and rebooting corrected it. So it's not just cosmetic: I couldn't even upgrade zfs/zpool until I had done that.

I would not expect to be able to upgrade a zpool while still running the old kernel modules: the upgrade needs to record on the pool that the new features are available, and the old kernel module does not know they exist.

ThomasLamprecht commented 9 months ago

Related to #13412, but then it'd seem like the VDEV_STAT_VALID macro isn't really working here.

FYI: for the uint64_t_field_count, i.e., the vsc argument at the call site in zpool_main.c's print_status_config, the 2.1.13 module reports 46 - exactly the number of uint64_t-sized fields in the 2.1.x struct shown below.

And (offsetof(vdev_stat_t, vs_noalloc) + sizeof (((vdev_stat_t *)NULL)->vs_noalloc)) / sizeof (uint64_t) also evaluates to 46, so that check already returns true for 2.1.13 even though vdev properties are not supported there yet.
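
For reference, a minimal, self-contained sketch of why the check passes. The STAT_VALID macro paraphrases VDEV_STAT_VALID from include/sys/fs/zfs.h (generalized to take the struct type); the trimmed stand-in structs and their pad sizes are assumptions for illustration, not the real definitions:

        #include <stdio.h>
        #include <stddef.h>
        #include <stdint.h>

        /* Paraphrase of VDEV_STAT_VALID: a field is considered valid if the
         * uint64_t array handed over by the kernel is long enough to fully
         * contain it. */
        #define STAT_VALID(type, field, uint64_t_field_count) \
                (((uint64_t_field_count) * sizeof (uint64_t)) >= \
                (offsetof(type, field) + sizeof (((type *)NULL)->field)))

        /* Trimmed stand-ins (hypothetical names); pad[43] collapses the
         * snipped fields so the totals line up with the real structs. */
        typedef struct { uint64_t pad[43]; uint64_t vs_logical_ashift;
            uint64_t vs_physical_ashift; uint64_t vs_pspace; } stat_21_t;
        typedef struct { uint64_t pad[43]; uint64_t vs_logical_ashift;
            uint64_t vs_physical_ashift; uint64_t vs_noalloc;
            uint64_t vs_pspace; } stat_22_t;

        int main(void)
        {
                /* A 2.1.13 module hands user space this many uint64_t's: */
                size_t vsc = sizeof (stat_21_t) / sizeof (uint64_t);  /* 46 */
                printf("vsc from 2.1.x module: %zu\n", vsc);
                /* 2.2.0 user space asks whether vs_noalloc fits in them: */
                printf("vs_noalloc \"valid\": %d\n",
                    STAT_VALID(stat_22_t, vs_noalloc, vsc));          /* 1 */
                return 0;
        }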

The reason for this seems to be that the patch adding the vs_pspace field was backported, but the one adding vs_noalloc was not. By itself that is not a problem, but in 2.2 vs_noalloc was added before vs_pspace, so the struct layouts of 2.1.13 and 2.2.0 do NOT match anymore...

I.e., the struct looks like the following at the end for ZFS 2.1.x:

typedef struct vdev_stat {
        hrtime_t        vs_timestamp;           /* time since vdev load */
        // snip
        uint64_t        vs_logical_ashift;      /* vdev_logical_ashift  */
        uint64_t        vs_physical_ashift;     /* vdev_physical_ashift */
        uint64_t        vs_pspace;              /* physical capacity */
} vdev_stat_t;

And like the following on ZFS 2.2.x:

typedef struct vdev_stat {
        hrtime_t        vs_timestamp;           /* time since vdev load */
        // snip
        uint64_t        vs_logical_ashift;      /* vdev_logical_ashift  */
        uint64_t        vs_physical_ashift;     /* vdev_physical_ashift */
        uint64_t        vs_noalloc;             /* allocations halted?  */
        uint64_t        vs_pspace;              /* physical capacity */
} vdev_stat_t;

The result is that 2.2.x user-space tooling interprets the vs_pspace field from the 2.1.x kernel module as the vs_noalloc field - and since the physical capacity is non-zero, zpool status concludes that allocations are halted and prints (non-allocating).
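
The aliasing can be stated directly in code. A compile-time sketch (the struct names are made up and pad[45] is a placeholder standing in for all fields before the slot in question):

        #include <stddef.h>
        #include <stdint.h>

        /* Tail of each layout, with everything before the contested slot
         * collapsed into pad[45]. */
        typedef struct { uint64_t pad[45]; uint64_t vs_pspace; } tail_21_t;
        typedef struct { uint64_t pad[45]; uint64_t vs_noalloc;
            uint64_t vs_pspace; } tail_22_t;

        /* The 8 bytes a 2.1.x module fills in as vs_pspace are the exact
         * bytes that 2.2.x user space reads back as vs_noalloc. */
        _Static_assert(offsetof(tail_21_t, vs_pspace) ==
            offsetof(tail_22_t, vs_noalloc), "vs_pspace aliases vs_noalloc");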

The original PR #11711 that added vs_noalloc correctly appended the new field at the end, as did PR #13106 that added vs_pspace, but backporting the latter without the former broke the layout.

Either the backport should have added an 8-byte padding field before vs_pspace, or the 2.2 struct should have re-ordered those two fields. Either way, this is really not ideal, and while it is mostly cosmetic, if users or tools make decisions based on that difference the resulting effects could be problematic.
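
For illustration, the padded variant of the 2.1.x backport could have looked like this (hypothetical, not what actually shipped; vs_rsvd is a made-up name for the reserved slot):

typedef struct vdev_stat {
        hrtime_t        vs_timestamp;           /* time since vdev load */
        // snip
        uint64_t        vs_logical_ashift;      /* vdev_logical_ashift  */
        uint64_t        vs_physical_ashift;     /* vdev_physical_ashift */
        uint64_t        vs_rsvd;                /* pad: 2.2's vs_noalloc slot */
        uint64_t        vs_pspace;              /* physical capacity */
} vdev_stat_t;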

ThomasLamprecht commented 9 months ago

For now, at Proxmox we will apply a local stop-gap "fix" by tightening the check in the print_status_config function to also verify via VDEV_STAT_VALID that vs_pspace is present, as then we can be sure that the struct layouts of the kernel and user space actually match again.
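
Sketched against the 2.2 code, the tightened condition in print_status_config would look roughly like this (paraphrased; the exact upstream lines may differ):

        /* Only trust vs_noalloc if vs_pspace is also within the stats the
         * kernel returned - that is only true once the module uses the 2.2
         * layout, where vs_noalloc precedes vs_pspace. */
        if (VDEV_STAT_VALID(vs_noalloc, vsc) &&
            VDEV_STAT_VALID(vs_pspace, vsc) && vs->vs_noalloc)
                (void) printf(gettext(" (non-allocating)"));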

Not sure if that'd be acceptable for upstream too. But there aren't many other options IMO: with this mismatch present in two released versions the cat is out of the bag, so making yet another change to the struct in question (e.g., adding the padding to the 2.1 series now) won't help...

the-c-drive commented 8 months ago

I had this issue and was directed here. I'm on Debian 12 and pulled OpenZFS 2.2.2 from bookworm-backports.

I had a version mismatch: zfs-2.2.2-3~bpo12+1 with zfs-kmod-2.1.14-1~bpo12+1.

I had the (non-allocating) message in my zpool status output, so I ran dpkg-reconfigure zfs-dkms to rebuild the kernel modules.

Once that was complete, I rebooted and checked the versions again: zfs-2.2.2-3~bpo12+1 zfs-kmod-2.2.2-3~bpo12+1.

With matching versions, I then ran zpool upgrade (pool name) and I appear to be good.