openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Cannot online device when autoexpand=on is set on the pool #8449

Open fhriley opened 5 years ago

fhriley commented 5 years ago

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version 18.04
Linux Kernel 4.15.0-45-generic
Architecture x86_64
ZFS Version 0.7.12-1ubuntu5~18.04.york0
SPL Version 0.7.12-1ubuntu2~18.04.york0

Describe the problem you're observing

I wanted to move some disks to different bays, so I offlined the devices, moved them, and then attempted to online them. The online fails with the following error:

root@nas:~# zpool online zfs /dev/disk/by-bay/bay4-part2
cannot online /dev/disk/by-bay/bay4-part2: cannot relabel '/dev/disk/by-bay/bay4-part2': unable to read disk capacity

I traced the source of the error to here: https://github.com/zfsonlinux/zfs/blob/a769fb53a17e7f8b6375b25cd23f3ff491b631cb/lib/libzfs/libzfs_pool.c#L2687

zpool_relabel_disk is called because of the check at https://github.com/zfsonlinux/zfs/blob/a769fb53a17e7f8b6375b25cd23f3ff491b631cb/lib/libzfs/libzfs_pool.c#L2766, which only enters that block if I passed the -e option to zpool online or the pool has autoexpand turned on. So I turned off autoexpand on my pool and, sure enough, the online worked.
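
To make the logic concrete, here is a minimal, self-contained model of that branch. It is not the actual libzfs code; vdev_online() and relabel_disk() below are illustrative stand-ins for zpool_vdev_online() and zpool_relabel_disk():

#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-in for zpool_relabel_disk(): on this FreeBSD-created
 * partition layout it fails with "unable to read disk capacity". */
static bool relabel_disk(const char *path)
{
    fprintf(stderr, "cannot relabel '%s': unable to read disk capacity\n", path);
    return false;
}

/* Models the branch linked above: the relabel is only attempted when -e was
 * passed or autoexpand is on, and a relabel failure aborts the whole online. */
static bool vdev_online(const char *path, bool expand_flag, bool autoexpand)
{
    if (expand_flag || autoexpand) {
        if (!relabel_disk(path))
            return false; /* online fails even though the device is healthy */
    }
    printf("%s brought online\n", path);
    return true;
}

int main(void)
{
    const char *dev = "/dev/disk/by-bay/bay4-part2";

    vdev_online(dev, false, true);  /* autoexpand=on:  online fails   */
    vdev_online(dev, false, false); /* autoexpand=off: online succeeds */
    return 0;
}

With the relabel failure out of the picture (autoexpand=off), the online goes through, which matches what I see on the real pool.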

Describe how to reproduce the problem

Set up a pool with a mirror of disks with extra space at the end and autoexpand turned on. Below is an example from one of the disks in my pool. There are 49 sectors at the end that are not used. This pool was originally created on FreeNAS.

root@nas:~# fdisk -l /dev/disk/by-bay/bay4
Disk /dev/disk/by-bay/bay4: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 3E647600-297E-11E8-8ADA-0CC47AABF094

Device                    Start         End     Sectors  Size Type
/dev/disk/by-bay/bay4p1     128     4194431     4194304    2G FreeBSD swap
/dev/disk/by-bay/bay4p2 4194432 15628053119 15623858688  7.3T FreeBSD ZFS

My pool looks like this:

root@nas:~# zpool status -P zfs
  pool: zfs
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(5) for details.
  scan: resilvered 5.72M in 0h0m with 0 errors on Sat Feb 23 07:01:12 2019
config:

    NAME                                                                                                                         STATE     READ WRITE CKSUM
    zfs                                                                                                                          ONLINE       0     0     0
      mirror-0                                                                                                                   ONLINE       0     0     0
        /dev/disk/by-bay/bay4-part2                                                                                              ONLINE       0     0     0
        /dev/disk/by-bay/bay8-part2                                                                                              ONLINE       0     0     0
      mirror-2                                                                                                                   ONLINE       0     0     0
        /dev/disk/by-bay/bay7-part2                                                                                              ONLINE       0     0     0
        /dev/disk/by-bay/bay3-part2                                                                                              ONLINE       0     0     0
      mirror-3                                                                                                                   ONLINE       0     0     0
        /dev/disk/by-bay/bay6-part2                                                                                              ONLINE       0     0     0
        /dev/disk/by-bay/bay2-part2                                                                                              ONLINE       0     0     0
      mirror-4                                                                                                                   ONLINE       0     0     0
        /dev/disk/by-bay/bay1-part2                                                                                              ONLINE       0     0     0
        /dev/disk/by-bay/bay5-part2                                                                                              ONLINE       0     0     0
    logs
      /dev/disk/by-id/nvme-nvme.8086-50484d42373433353030415132383043474e-494e54454c2053534450454431443238304741-00000001-part1  ONLINE       0     0     0

errors: No known data errors
root@nas:~# zpool get autoexpand zfs
NAME  PROPERTY    VALUE   SOURCE
zfs   autoexpand  on      local

Then, offline a device and try to online it:

root@nas:~# zpool offline zfs /dev/disk/by-bay/bay4-part2
root@nas:~# zpool online zfs /dev/disk/by-bay/bay4-part2
cannot online /dev/disk/by-bay/bay4-part2: cannot relabel '/dev/disk/by-bay/bay4-part2': unable to read disk capacity

Then, turn off autoexpand and online the device:

root@nas:~# zpool set autoexpand=off zfs
root@nas:~# zpool online zfs /dev/disk/by-bay/bay4-part2

Include any warning/errors/backtraces from the system logs

jspuij commented 4 years ago

Thanks @fhriley, I had the same issue (also a pool imported from FreeNAS), and this solved it.

behlendorf commented 4 years ago

Thanks for clearly documenting this compatibility issue. The core issue here is that since the pool was created on FreeBSD, the partition layout differs slightly from what's expected on Linux. This difference prevents ZFS from auto-expanding the device, since it can't safely re-partition it to use the additional space. Unfortunately, it also prevents the device from being brought online at all.

At a minimum, we could improve this error message. And in the name of compatibility, it would be desirable to allow the device to be brought online but not auto-expanded. Additional compatibility code would need to be added to verify the FreeBSD partition layout before auto-expand would work on this device.
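
To illustrate the second point, here is a minimal sketch of what that behavior could look like (illustrative C only, not a patch against the real zpool_vdev_online() path): fail hard only when -e was explicitly requested, and when the relabel is attempted solely because autoexpand=on, warn and bring the device online without expanding it.

#include <stdbool.h>
#include <stdio.h>

/* Same illustrative stand-in for zpool_relabel_disk() as in the sketch above. */
static bool relabel_disk(const char *path)
{
    fprintf(stderr, "cannot relabel '%s': unable to read disk capacity\n", path);
    return false;
}

/* Hypothetical behavior, not actual OpenZFS code: keep the hard error only
 * for an explicit 'zpool online -e'; when the relabel is attempted solely
 * because autoexpand=on, warn, skip the expansion, and still online the vdev. */
static bool vdev_online(const char *path, bool expand_flag, bool autoexpand)
{
    if (expand_flag || autoexpand) {
        if (!relabel_disk(path)) {
            if (expand_flag)
                return false; /* user explicitly asked to expand */
            fprintf(stderr,
                "warning: cannot expand '%s'; bringing it online without expansion\n",
                path);
        }
    }
    printf("%s brought online\n", path);
    return true;
}

int main(void)
{
    /* autoexpand=on, no -e: the device still comes online, expansion is skipped */
    return vdev_online("/dev/disk/by-bay/bay4-part2", false, true) ? 0 : 1;
}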

fhriley commented 3 years ago

@behlendorf I discovered today that this issue still exists:

root@nas:~# zpool online zfs /dev/disk/by-bay/bay8-part1
cannot online /dev/disk/by-bay/bay8-part1: cannot relabel '/dev/disk/by-bay/bay8-part1': unable to read disk capacity
root@nas:~# zpool set autoexpand=off zfs
root@nas:~# zpool online zfs /dev/disk/by-bay/bay8-part1
root@nas:~# zpool set autoexpand=on zfs
root@nas:~# zfs --version
zfs-2.0.4-0york2~20.04
zfs-kmod-2.0.4-0york2~20.04

It's been a few years since I created this pool, but I'm fairly certain the partitions on these disks were created by OpenZFS. This is what they look like:

root@nas:~# fdisk -l /dev/disk/by-bay/bay8
Disk /dev/disk/by-bay/bay8: 7.28 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: WDC WD80EMAZ-00W
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 36D3290F-8510-E048-98D0-D6C851CB1EDE

Device                        Start         End     Sectors  Size Type
/dev/disk/by-bay/bay8p1        2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/disk/by-bay/bay8p9 15628036096 15628052479       16384    8M Solaris reserved 1

thenickdude commented 2 years ago

Still an issue on 2.1.5. My raidz1 has two 3TB disks and one 6TB disk, and the one I offlined was a 3TB disk, so maybe that creates the situation:

$ zpool offline primary-data 2934056603744768573
$ zpool status -v primary-data
  pool: primary-data
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
    Sufficient replicas exist for the pool to continue functioning in a
    degraded state.
action: Online the device using 'zpool online' or replace the device with
    'zpool replace'.
  scan: scrub repaired 0B in 05:33:14 with 0 errors on Sun Sep 11 05:57:32 2022
config:

    NAME                                 STATE     READ WRITE CKSUM
    primary-data                         DEGRADED     0     0     0
      raidz1-0                           DEGRADED     0     0     0
        ata-ST3000NC002-1DY166_Z1F43FSN  ONLINE       0     0     0
        ata-ST3000NC002-1DY166_Z1F43GCJ  OFFLINE      0     0     0
        ata-ST6000VN001-2BB186_ZCT44FA0  ONLINE       0     0     0
$ zpool online primary-data /dev/disk/by-id/ata-ST3000NC002-1DY166_Z1F43GCJ-part2
cannot online /dev/disk/by-id/ata-ST3000NC002-1DY166_Z1F43GCJ-part2: cannot relabel '/dev/disk/by-id/ata-ST3000NC002-1DY166_Z1F43GCJ-part2': unable to read disk capacity
$ zpool set autoexpand=off primary-data
$ zpool online primary-data /dev/disk/by-id/ata-ST3000NC002-1DY166_Z1F43GCJ-part2
$ zpool status primary-data
  pool: primary-data
 state: ONLINE
  scan: resilvered 264K in 00:00:01 with 0 errors on Thu Sep 15 19:52:10 2022
config:

    NAME                                 STATE     READ WRITE CKSUM
    primary-data                         ONLINE       0     0     0
      raidz1-0                           ONLINE       0     0     0
        ata-ST3000NC002-1DY166_Z1F43FSN  ONLINE       0     0     0
        ata-ST3000NC002-1DY166_Z1F43GCJ  ONLINE       0     0     0
        ata-ST6000VN001-2BB186_ZCT44FA0  ONLINE       0     0     0

errors: No known data errors

The disk's partition table is this:

Disk /dev/disk/by-id/ata-ST3000NC002-1DY166_Z1F43GCJ: 2.73 TiB, 3000592982016 bytes, 5860533168 sectors
Disk model: ST3000NC002-1DY1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 40D59642-8198-11E3-B6AE-D850E64929A5

Device                                                  Start        End    Sectors  Size Type
/dev/disk/by-id/ata-ST3000NC002-1DY166_Z1F43GCJ-part1     128    4194431    4194304    2G FreeBSD swap
/dev/disk/by-id/ata-ST3000NC002-1DY166_Z1F43GCJ-part2 4194432 5860533127 5856338696  2.7T FreeBSD ZFS

And no, I cannot online the whole disk, only the partition, so that isn't the cause:

$ zpool online primary-data /dev/disk/by-id/ata-ST3000NC002-1DY166_Z1F43GCJ
cannot online /dev/disk/by-id/ata-ST3000NC002-1DY166_Z1F43GCJ: no such device in pool
rincebrain commented 8 months ago

This is absolutely still broken in 2.1.12 and 2.2.2; someone just ran into it today on IRC.

(I see that it was marked closed because the original framing wasn't filed as a bug, but IMO, at a minimum, the fact that online neither works at all nor tells you why it failed is absolutely a bug.)

shuther commented 1 month ago

I confirm that the problem is still there, at least with the latest ZFS packaged for Ubuntu:

zfs-2.1.5-1ubuntu6~22.04.4
zfs-kmod-2.1.5-1ubuntu6~22.04.1

The pool was definitely created using Linux/OpenZFS.