Closed: jinleiw closed this issue 3 years ago
This is not an error in multipath-tools; it looks like a kernel issue. dm-multipath should fail over to another PG, but instead it passes the error up to the filesystem, which shouldn't happen. What kernel are you running?
Could you enable SCSI logging before pulling the cable please?
# sysctl -w dev.scsi.logging_level=8192
Also, please run multipathd with "-v3" (or set verbosity 3 in multipath.conf) and provide the output.
It is a Fedora-based distro (RHEL, CentOS, Oracle, ...), and your config is totally wrong for this IBM array.
Do:
# save old configs
mv /etc/multipath.conf /etc/_multipath.conf-$(date +%s)
cp -a /etc/multipath/wwids /etc/multipath/_wwids-$(date +%s)
# reconfig mp
mpathconf --enable --user_friendly_names n
multipath -W
systemctl enable multipathd.service
If IBM/2145 is NOT present in the default config:
# multipath -t
you must add this to /etc/multipath.conf:
devices {
    device {
        vendor "IBM"
        product "^2145"
        path_grouping_policy "group_by_prio"
        prio "alua"
        failback "immediate"
        no_path_retry "queue"
    }
}
And then:
# recreate initrd, and reboot the system
dracut -f
init 6
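The check for IBM/2145 in the built-in config can be scripted. Below is a minimal sketch, assuming the dump has one keyword per line as in the output of multipath -t; the helper name has_device is hypothetical and not part of multipath-tools.

```shell
#!/bin/sh
# Hypothetical helper (not part of multipath-tools): succeed if the config
# dump read from stdin contains a device section whose vendor line holds
# the quoted vendor string and whose product line contains the product string.
# Assumes one keyword per line, as printed by "multipath -t".
has_device() {
    awk -v vendor="$1" -v product="$2" '
        # Start of a device section ("devices {" does not match this pattern).
        /^[ \t]*device[ \t]*[{]/ { in_dev = 1; v = 0; p = 0; next }
        in_dev && /^[ \t]*vendor[ \t]/  { if (index($0, "\"" vendor "\"")) v = 1 }
        in_dev && /^[ \t]*product[ \t]/ { if (index($0, product)) p = 1 }
        # End of the device section: remember a hit if both lines matched.
        in_dev && /^[ \t]*[}]/ { if (v && p) found = 1; in_dev = 0 }
        END { exit(found ? 0 : 1) }
    '
}
```

On a real host it could be used as: multipath -t | has_device IBM 2145 && echo "built-in entry present"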
If IBM/2145 is NOT present in the default config:
# multipath -t
you must add this to /etc/multipath.conf:
devices {
    device {
        vendor "IBM"
        product "^2145"
        path_grouping_policy "group_by_prio"
        prio "alua"
        failback "immediate"
        no_path_retry "queue"
    }
}
^^ that seems wrong, see https://www.ibm.com/docs/en/flashsystem-v9000/8.2.x?topic=system-settings-linux-hosts and more importantly https://www.ibm.com/docs/en/flashsystem-v9000/8.2.x?topic=htrlos-attachment-requirements-hosts-that-are-running-linux-operating-system.
You have provided the same link twice, and the link is unrelated to multipath.conf settings.
I can confirm that @xosevp's sample matches the default config built into multipath-tools for IBM 2145. @jinleiw's setting for "V7000" is ineffective. "V7000" may be the marketing name of your device, but what matters here is the product name that the device reports to the host in the SCSI INQUIRY response, which is 2145. It's a very unfortunate habit of hardware vendors to sell products under names that are totally unrelated to the actual technical product name.
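To illustrate why a "V7000" entry has no effect, here is a small sketch. It assumes the product value in multipath.conf is treated as a regular expression matched against the product string the device reports; the helper name product_matches is hypothetical, and grep -E only stands in for the matching that multipath-tools does internally.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the product regex from multipath.conf ($1)
# matches the product string reported by the device ($2).
# grep -E is a stand-in here, not the code path multipath-tools uses.
product_matches() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# The array reports "2145", so only the first pattern applies to it.
product_matches '^2145' '2145' && echo '"^2145" matches'
product_matches 'V7000' '2145' || echo '"V7000" does not match'
```

On a running system the reported strings can be read from sysfs, for example /sys/block/sdX/device/vendor and /sys/block/sdX/device/model, where sdX is one of the path devices.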
If IBM/2145 is NOT present in the default config:
That's quite unlikely, as the configuration for IBM 2145 has been unchanged in our code since 2016 (0.6.4).
It is a Fedora based distro
@jinleiw / @jirib, in general, if you have issues with the multipath versions shipped with your distribution, please use your distribution's support facilities rather than this upstream issue tracker.
I only wanted to point out that IBM recommends different values than the defaults in multipath-tools. (I updated the first link.) BTW, I don't think it makes sense to update multipath.conf for every HW vendor's recommendations. People using such HW should first read the HW vendor's documentation, not just depend on mostly sane defaults.
Well, except for no_path_retry, the settings are the same (some are missing above, but the defaults match IBM's recommendations). no_path_retry is a setting that mostly depends on data center preferences. Yet it's interesting that they use 5 for every distro.
@xosevp, would you say that we should update our defaults?
BTW, I don't think it makes sense to update multipath.conf for every HW vendor's recommendations.
At the very least it's needed for installation on multipath root disks, or on system rescue DVD-ROMs/ISOs. And array docs very often disappear from the net.
People using such HW should first read the HW vendor's documentation, not just depend on mostly sane defaults.
A lot of vendors' docs are out of date and sometimes totally wrong. The defaults in multipath-tools are based on vendors' recommendations, and provide a stable and performant setup.
A lot of vendors' docs are out of date and sometimes totally wrong. The defaults in multipath-tools are based on vendors' recommendations, and provide a stable and performant setup.
I agree. However, seeing a numeric value for no_path_retry, I wonder if the vendor has spent some extra effort to determine a value that matches their hardware. It could be something like "typical time required for a storage node reboot" (even 25s seems a little low for that).
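The 25s figure follows from a simple product. As a rough sketch, assuming a numeric no_path_retry counts retries spaced one polling_interval apart and that polling_interval is left at 5 seconds (the function name queue_seconds is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: approximate how long I/O stays queued after all
# paths fail, given no_path_retry ($1) and polling_interval in seconds ($2).
# Assumption: one retry per polling interval; real timing may differ.
queue_seconds() {
    echo $(( $1 * $2 ))
}

queue_seconds 5 5   # no_path_retry 5 with a 5s polling interval
```

With no_path_retry set to "queue" instead, there is no such limit and I/O is queued until a path comes back.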
The OP's problem is a kernel issue, the rest is discussion about HW defaults, for which no ideal solution exists (IBM 2145 covers a wide range of devices with likely different characteristics). I suggest closing this issue.
Closing.
I use multipath for my storage array.
There are 4 paths to the storage:
I unplugged one fibre cable to test failover, but the filesystem became read-only, and I found this error message:
My conf is:
Maybe there is a bug, or is my conf wrong?