openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs
Other
10.34k stars 1.72k forks source link

zpool add of a busy disk returns a misleading error #15966

Open glebius opened 4 months ago

glebius commented 4 months ago

Observing on FreeBSD 15.0-CURRENT, but my code reading says on Linux it is likely to be the same.

> zpool add tank cache ada0
cannot add to 'tank': no such pool or dataset

However, both the tank exists and ada0 also exists. As you see the error message is really unhelpful.

The problem is that ada0 is used, either by ZFS or by any other filesystem or swap, or whatever. The call path that leads to this is the following:

#0  open (path=0x7fffffffbc30 "/dev/ada0", flags=0) at /usr/src/FreeBSD/lib/libsys/open.c:46
#1  0x000000080109b4ba in g_device_path_open (devpath=devpath@entry=0x7fffffffbc30 "/dev/ada0", fdp=fdp@entry=0x7fffffffbaec, dowrite=dowrite@entry=0) at /usr/src/FreeBSD/lib/libgeom/geom_util.c:283
#2  0x000000080109b442 in g_open (name=name@entry=0x7fffffffbc30 "/dev/ada0", dowrite=dowrite@entry=0) at /usr/src/FreeBSD/lib/libgeom/geom_util.c:55
#3  0x000000080113035f in zfs_dev_is_whole_disk (dev_name=dev_name@entry=0x7fffffffbc30 "/dev/ada0") at /usr/src/FreeBSD/sys/contrib/openzfs/lib/libzutil/os/freebsd/zutil_device_path_os.c:83
#4  0x000000000104f104 in is_shorthand_path (arg=0x801e08028 "ada0", path=0x7fffffffbc30 "/dev/ada0", path_size=1024, statbuf=0x7fffffffbb50, wholedisk=<optimized out>) at /usr/src/FreeBSD/sys/contrib/openzfs/cmd/zpool/zpool_vdev.c:192
#5  make_leaf_vdev (props=props@entry=0x0, arg=0x801e08028 "ada0", is_primary=is_primary@entry=B_TRUE) at /usr/src/FreeBSD/sys/contrib/openzfs/cmd/zpool/zpool_vdev.c:324
#6  0x000000000104cf4c in construct_spec (props=0x0, argc=2, argv=0x801e1a078) at /usr/src/FreeBSD/sys/contrib/openzfs/cmd/zpool/zpool_vdev.c:1604
#7  0x000000000104e5e8 in make_root_vdev (zhp=zhp@entry=0x801e5a000, props=0x0, force=force@entry=0, check_rep=1, replacing=replacing@entry=B_FALSE, dryrun=dryrun@entry=B_FALSE, argc=2, argv=0x801e1a078) at /usr/src/FreeBSD/sys/contrib/openzfs/cmd/zpool/zpool_vdev.c:1867
#8  0x00000000010388be in zpool_do_add (argc=2, argv=0x801e1a078) at /usr/src/FreeBSD/sys/contrib/openzfs/cmd/zpool/zpool_main.c:1122
#9  0x00000000010373bd in main (argc=5, argv=0x7fffffffd918) at /usr/src/FreeBSD/sys/contrib/openzfs/cmd/zpool/zpool_main.c:11549

This open() will return EPERM. This error code is not covered by libzfs zpool_add(), neither by zpool_standard_error(), nor by zfs_common_error(). However, covering this error code in these functions won't help :( The problem is that zfs_dev_is_whole_disk() is boolean - it doesn't propagate the error up the stack. Checked its Linux twin - it is the same.

So, the right fix is to propagate this EPERM all the way up to zpool_do_add(), which isn't trivial. I can write the code, but first want to get some directions from OpenZFS developers on how exactly you want this to be done.

amotin commented 4 months ago

I think B_FALSE return of zfs_dev_is_whole_disk() in case of disk opening failure is wrong on any OS. It should be able to return error. Since on FreeBSD ZFS never partitions its disks, on FreeBSD should always return either B_TRUE or error, in the last case addition should fail with respective status.