stuartthebruce opened 5 years ago
What does "multipath -l -v1" show?
Maybe the device names are being represented under /dev somewhere; can you have a look at how they appear there?
Just to see whether this is a multipath or ZFS issue.
That doesn't show any trailing characters. For example, on an SL7.7 system running ZFS 0.7.13:
[root@node810 ~]# zpool status jbod2-node810-data1
pool: jbod2-node810-data1
state: ONLINE
scan: scrub repaired 0B in 36h29m with 0 errors on Wed May 8 22:18:55 2019
config:
NAME STATE READ WRITE CKSUM
jbod2-node810-data1 ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
35000cca2530aa110 ONLINE 0 0 0
35000cca2530aa424 ONLINE 0 0 0
35000cca2530aacb4 ONLINE 0 0 0
35000cca2530e297c1 ONLINE 0 0 0
35000cca2530f661c1 ONLINE 0 0 0
35000cca253100b68 ONLINE 0 0 0
35000cca253123c08 ONLINE 0 0 0
35000cca253158878 ONLINE 0 0 0
errors: No known data errors
compared to,
[root@node810 ~]# multipath -l -v1 | grep -C5 35000cca2530e297c
35000cca2531d7500
35000cca2531e6404
35000cca2530a6f3c
35000cca2530a1140
35000cca2531e868c
35000cca2530e297c
35000cca253032d58
35000cca2530925bc
35000cca253178e50
35000cca2530a63f4
35000cca2531e5f6c
And here is what I find for one of the above WWNs under /dev:
[root@node810 ~]# find /dev -name "*35000cca2530e297c*"
/dev/disk/by-id/scsi-35000cca2530e297c
/dev/disk/by-id/dm-uuid-part9-mpath-35000cca2530e297c
/dev/disk/by-id/dm-name-35000cca2530e297c9
/dev/disk/by-id/dm-uuid-part1-mpath-35000cca2530e297c
/dev/disk/by-id/dm-name-35000cca2530e297c1
/dev/disk/by-id/dm-uuid-mpath-35000cca2530e297c
/dev/disk/by-id/dm-name-35000cca2530e297c
/dev/mapper/35000cca2530e297c9
/dev/mapper/35000cca2530e297c1
/dev/mapper/35000cca2530e297c
where the extra "1" presumably comes from the partition table:
[root@node810 ~]# fdisk -l /dev/mapper/35000cca2530e297c
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.
Disk /dev/mapper/35000cca2530e297c: 12000.1 GB, 12000138625024 bytes, 23437770752 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk label type: gpt
Disk identifier: 17DE1481-B0B3-DF4B-88F3-82E2413936D8
# Start End Size Type Name
1 2048 23437752319 10.9T Solaris /usr & zfs-16f1ad70fd6ed32f
9 23437752320 23437768703 8M Solaris reserve
FWIW, I updated to SL7.7 and ZFS 0.8.3, and it shows the same behavior of sometimes displaying the partition number for WWNs that end in a "c":
[root@node806 ~]# uname -a
Linux node806 3.10.0-1062.7.1.el7.x86_64 #1 SMP Thu Dec 5 14:45:00 CST 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@node806 ~]# rpm -q zfs
zfs-0.8.3-1.el7.x86_64
[root@node806 ~]# zpool status
pool: jbod1-node806-data1
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
jbod1-node806-data1 ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
35000cca253077224 ONLINE 0 0 0
35000cca253077640 ONLINE 0 0 0
35000cca25308c90c1 ONLINE 0 0 0
35000cca25308e49c1 ONLINE 0 0 0
35000cca25308e95c1 ONLINE 0 0 0
35000cca2530c2410 ONLINE 0 0 0
35000cca2530ca5ac1 ONLINE 0 0 0
35000cca2530e04d8 ONLINE 0 0 0
35000cca2530e8568 ONLINE 0 0 0
35000cca2530e8598 ONLINE 0 0 0
errors: No known data errors
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
Hi,
I'm seeing this with:
[root@storage-seq-1 ~]# rpm -qa | grep zfs | sort
kmod-zfs-2.1.13-1.el8.x86_64
libzfs5-2.1.13-1.el8.x86_64
zfs-2.1.13-1.el8.x86_64
[root@storage-seq-1 ~]# uname -a
Linux storage-seq-1 4.18.0-477.27.1.el8_8.x86_64 #1 SMP Wed Sep 20 15:55:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
[root@storage-seq-1 ~]# cat /etc/rocky-release
Rocky Linux release 8.8 (Green Obsidian)
Example:
pool: jbodpool
state: ONLINE
scan: resilvered 1.82T in 04:39:13 with 0 errors on Fri Mar 22 18:21:28 2024
config:
NAME STATE READ WRITE CKSUM
jbodpool ONLINE 0 0 0
draid3:8d:102c:6s-0 ONLINE 0 0 0
35000cca2be7567b8 ONLINE 0 0 0
35000cca2be4f696c1 ONLINE 0 0 0
35000cca2be0f2454 ONLINE 0 0 0
35000cca2be1eb180 ONLINE 0 0 0
35000cca2be760f1c1 ONLINE 0 0 0
35000cca2be760f44 ONLINE 0 0 0
35000cca2be4f6b44 ONLINE 0 0 0
35000cca2be1ed5a0 ONLINE 0 0 0
35000cca2be1def80 ONLINE 0 0 0
35000cca2be1d4760 ONLINE 0 0 0
35000cca2be1dff68 ONLINE 0 0 0
35000cca2be1e5708 ONLINE 0 0 0
35000cca2be1e9644 ONLINE 0 0 0
35000cca2be1e2980 ONLINE 0 0 0
35000cca2be4f773c1 ONLINE 0 0 0
35000cca2be4f74a4 ONLINE 0 0 0
35000cca2be760dac1 ONLINE 0 0 0
35000cca2be756724 ONLINE 0 0 0
35000cca2be0f66bc1 ONLINE 0 0 0
35000cca2be02ab7c1 ONLINE 0 0 0
35000cca2be5f506c1 ONLINE 0 0 0
Example from /dev/mapper showing how WWNs that end in "c" get a "1" and "9" appended for the partitions, rather than a "p1" and "p9":
lrwxrwxrwx 1 root root 9 Mar 22 13:41 35000cca2be7b2f9c -> ../dm-849
lrwxrwxrwx 1 root root 9 Mar 22 13:42 35000cca2be7b2f9c1 -> ../dm-853
lrwxrwxrwx 1 root root 9 Mar 22 13:40 35000cca2be7b2f9c9 -> ../dm-854
lrwxrwxrwx 1 root root 8 Mar 22 13:40 35000cca2be7c21c4 -> ../dm-17
lrwxrwxrwx 1 root root 8 Mar 22 13:42 35000cca2be7c21c4p1 -> ../dm-24
lrwxrwxrwx 1 root root 8 Mar 22 13:40 35000cca2be7c21c4p9 -> ../dm-28
I don't actually understand why those "1" and "9" mappings are even present. On two similar servers they don't show up. The difference in my case is that the server with this problem had the pool set up before multipath was enabled.
More testing: if I add "skip_kpartx yes" to my /etc/multipath.conf, then multipathd no longer creates the mapped partition devices. However, zpool import now fails with all devices being UNAVAIL:
[root@storage-seq-1 ~]# zpool import -d /dev/mapper
pool: jbodpool
id: 11481882034934482336
state: UNAVAIL
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
config:
jbodpool UNAVAIL insufficient replicas
draid3:8d:102c:6s-0 UNAVAIL insufficient replicas
35000cca2be7567b8 UNAVAIL
35000cca2be4f696c1 UNAVAIL
35000cca2be0f2454 UNAVAIL
35000cca2be1eb180 UNAVAIL
35000cca2be760f1c1 UNAVAIL
35000cca2be760f44 UNAVAIL
35000cca2be4f6b44 UNAVAIL
35000cca2be1ed5a0 UNAVAIL
35000cca2be1def80 UNAVAIL
35000cca2be1d4760 UNAVAIL
35000cca2be1dff68 UNAVAIL
35000cca2be1e5708 UNAVAIL
35000cca2be1e9644 UNAVAIL
35000cca2be1e2980 UNAVAIL
35000cca2be4f773c1 UNAVAIL
35000cca2be4f74a4 UNAVAIL
35000cca2be760dac1 UNAVAIL
35000cca2be756724 UNAVAIL
35000cca2be0f66bc1 UNAVAIL
35000cca2be02ab7c1 UNAVAIL
35000cca2be5f506c1 UNAVAIL
35000cca2be1e31d8 UNAVAIL
...
The pool should have 306 drives, but only one of the three draid3 vdevs shows up, and within it only a single drive appears as ONLINE.
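For reference, the skip_kpartx experiment used a fragment along these lines; placing it in the defaults section is an assumption on my part (per the multipath.conf man page it can also go in a devices or multipaths stanza):

```
defaults {
    # Assumed placement: tell multipathd not to run kpartx, i.e. not to
    # create partition mappings for multipath devices.
    skip_kpartx yes
}
```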
Yes, because it's the partition devices that are in the pool, it just hides the "-part1".
Why do the pools I created directly on multipath mapped devices not do this? Does ZFS do something different when given a multipath device?
I'm not aware of ZFS having any multipath-specific code.
I'm not sure I understand the question.
As I noted above, I have numerous similar servers where the pool was created after multipath was enabled, none of those exhibit this behavior nor does multipath map devices for partitions. This problematic server had the pool created before multipath was enabled, and now maps the 1 and 9 partitions as devices.
I thought that by adding skip_kpartx and preventing multipath from mapping partitions to devices, I'd force the non-mapped-partition behavior, but that just results in a pool that can't be imported.
So, in light of the observed behavior, I'm asking: does ZFS do anything differently when creating a new pool on multipath mapped devices than it does with regular single-path devices?
And I think I answered my own question; a disk in a pool created on multipath mapped devices results in:
[root@storage-odb2-1 ~]# blkid /dev/mapper/35000cca2a601d650
/dev/mapper/35000cca2a601d650: LABEL="datapool" UUID="12922485268461784626" UUID_SUB="6113558819263140911" TYPE="zfs_member"
[root@storage-odb2-1 ~]# gdisk -l /dev/mapper/35000cca2a601d650
GPT fdisk (gdisk) version 1.0.3
Warning: Partition table header claims that the size of partition table
entries is 0 bytes, but this program supports only 128-byte entries.
Adjusting accordingly, but partition table may be garbage.
Warning: Partition table header claims that the size of partition table
entries is 0 bytes, but this program supports only 128-byte entries.
Adjusting accordingly, but partition table may be garbage.
Partition table scan:
MBR: not present
BSD: not present
APM: not present
GPT: not present
A pool created on single path devices with multipath enabled afterwards has this for the raw mapped device:
[root@storage-seq-1 ~]# blkid /dev/mapper/35000cca2be1ed890
/dev/mapper/35000cca2be1ed890: PTUUID="c445b2c8-c715-7943-ba9b-8fc7ee0f16b3" PTTYPE="gpt"
[root@storage-seq-1 ~]# gdisk -l /dev/mapper/35000cca2be1ed890
GPT fdisk (gdisk) version 1.0.3
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present
Found valid GPT with protective MBR; using GPT.
Disk /dev/mapper/35000cca2be1ed890: 42970644480 sectors, 20.0 TiB
Sector size (logical/physical): 512/4096 bytes
Disk identifier (GUID): C445B2C8-C715-7943-BA9B-8FC7EE0F16B3
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 42970644446
Partitions will be aligned on 2048-sector boundaries
Total free space is 4029 sectors (2.0 MiB)
Number Start (sector) End (sector) Size Code Name
1 2048 42970626047 20.0 TiB BF01 zfs-69c2fcb0ef85e748
9 42970626048 42970642431 8.0 MiB BF07
This seems to indicate that when creating a pool on multipath mapped devices, the disk isn't partitioned? I'm also wondering now if I take the drives in a pool created on multipath devices to a different server without multipath, will the pool import successfully?
It'd import fine.
AIUI, on Linux, if it recognizes that the thing it's using as a "disk" is a whole "disk", it will partition it, then hide the partition in the status output. (On FreeBSD, it just uses whatever you give it, up to and including an entire unpartitioned disk.)
So I would assume it's treating the multipath device as "not a whole real disk" for this purpose, and not partitioning it when you use it for create. Then for import, it's finding the ZFS metadata on the "partition" device, since it doesn't know anything about the relationship between the arbitrarily named devices.
Why the naming is different, you'd have to examine the multipath code to figure out, I'd guess.
Thanks @rincebrain , I guess that makes sense to me. Looks like a backup/destroy pool/re-create pool/restore is my next step to get this server to behave like my other multipath servers/pools.
If that's your simplest option, maybe.
You could also do something like drop one or two disks, let the DHSs rebuild, then replace them with their whole multipathed versions, possibly using zpool labelclear on them so it doesn't complain that they were once in the pool in the interim.
Or just remake the pool.
But either way, hitting that with 100+ disks in the pool is going to be cumbersome. :(
Another thing to note: in the many reboots done while troubleshooting this, occasionally one or a handful of devices will fail to get the partition mapped by multipathd, and the pool will show them as UNAVAIL. A "zpool replace poolname OLDID /dev/mapper/WWNAME" will replace the drive with itself and use the full device without partitioning, as happens when creating a pool on multipath mapped devices. I suppose that, given enough reboots, or by just detaching and replacing the partitioned devices with themselves, I would eventually get the pool to be like any other pool created on multipath mapped devices, and it would import correctly without the need to map partitions.
Presumably related to this code: https://github.com/openzfs/zfs/blob/master/lib/libzutil/os/linux/zutil_device_path_os.c#L47-L71 that just adds a "1" in some cases.
This seems to be where the suffix strangeness comes from. For the 1200 drives we have, the WWN-based multipath names all end in either 0-9 or "c". I'm guessing this is a property of all WWNs? Curious as to why it matters to ZFS in this bit of code, though. Is this a rule meant for some other type of device? Also, does ZFS somehow trigger the mapping of these, since it seems to also set the names?
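From reading the linked code, zfs_append_partition appears to pick the suffix by looking at the last character of the device name, which happens to match how kpartx names the partition mappings. A sketch of my understanding (not the actual implementation):

```shell
#!/bin/sh
# Rough sketch of the suffix rule in zfs_append_partition()
# (lib/libzutil/os/linux/zutil_device_path_os.c), as I read it:
#   - paths under /dev/disk get "-part1"
#   - otherwise a name ending in a digit gets "p1"
#   - anything else (e.g. a WWN ending in "c") gets a bare "1"
append_partition() {
    name=$1
    case "$name" in
        /dev/disk/*) echo "${name}-part1" ;;
        *[0-9])      echo "${name}p1" ;;
        *)           echo "${name}1" ;;
    esac
}

append_partition 35000cca2be7c21c4   # -> 35000cca2be7c21c4p1
append_partition 35000cca2be7b2f9c   # -> 35000cca2be7b2f9c1
```

Which is exactly the 1/9 vs p1/p9 split seen in the /dev/mapper listing above.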
From Wikipedia (https://en.wikipedia.org/wiki/World_Wide_Name):
Each WWN is an 8- or 16-byte number, the length and format of which is determined by the most significant four bits, which are referred to as an NAA (Network Address Authority). The remainder of the value is derived from an IEEE OUI (or from Company Id (CID)) and vendor-supplied information. Each format defines a different way to arrange and/or interpret these components.
My pattern of all WWNs ending in [0-9] or "c" must be a Western Digital/HGST format.
Hello,
I have dug into git a little bit, and these are some commits from 14 years ago:
https://github.com/openzfs/zfs/commit/a2c6816c34952eb6dad51248d31172189fba9126 - Support shorthand names with zpool remove
https://github.com/openzfs/zfs/commit/83c62c939938ca5915a61022208a31c4ab3faa1c - Strip partition from device name for whole disks
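The second commit's zfs_strip_partition seems to explain why the "1" stays visible in zpool status: as I read it, it recognizes a "-part<N>" suffix, or a "p<N>" preceded by a digit (plus sd/hd/vd/xvd-style names), but not a bare digit after a letter. A hedged sketch of that rule, from my reading rather than the code's exact logic:

```shell
#!/bin/sh
# Rough sketch (not the real implementation) of the stripping rule used
# when zpool status hides partition suffixes for whole-disk vdevs:
#   - "-part<N>" style suffixes are stripped
#   - "p<N>" is stripped only when preceded by a digit
#   - a bare "1" after a letter is NOT recognized, so a WWN ending in
#     "c" keeps its trailing "1" in the status output
strip_partition() {
    name=$1
    case "$name" in
        *-part[0-9]*)  echo "${name%-part*}" ;;
        *[0-9]p[0-9]*) echo "${name%p[0-9]*}" ;;
        *)             echo "$name" ;;
    esac
}

strip_partition 35000cca2be7c21c4p1   # -> 35000cca2be7c21c4
strip_partition 35000cca2530e297c1    # -> 35000cca2530e297c1 (unchanged)
```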
After poking this pool for several days, I think this isn't really a "bug", just a really confusing "feature". I don't understand why devices get treated differently between multipath and non-multipath, I would have assumed always using the device without partitioning would be better.
The TL;DR here is that if you want to have a pool on multipath devices, create it on multipath devices. In my case I did not do that because I was waiting on an order of SAS cables that would allow me to cable for multipath, and thought I'd get a head start getting data onto the system while waiting. Lesson learned: don't do that; or if I ever do need to do it again, I'll figure out how to force everything to be mapped by multipath, even if only with single paths.
From my perspective (user who knows nothing about ZFS internals) this can be closed as I don't see anything to fix here other than it'd have been nice to read this in some docs at some point.
System information
Describe the problem you're observing
After enabling multipath with "user_friendly_names no" in multipath.conf, devices whose WWN ends in a "c" now show an extra "1" in the output of zpool status.
Before enabling multipath,
After enabling multipath,
Describe how to reproduce the problem
On a system with an existing zpool enable multipath via,
Include any warning/errors/backtraces from the system logs
Example: