opensvc / multipath-tools

Strange `multipath -ll` output #12

Open jirib opened 3 years ago

jirib commented 3 years ago

I got this output (multipath-tools-0.8.2+18.9ff73e7-lp152.2.1.x86, an openSUSE Leap package on SLES 15 SP2; do not ask me why, it's not my system):

How should I interpret the output for e.g. 36006016022f0440013d6296107dfe3c7?

# /sbin/multipath -ll
Aug 30 05:09:18 | 65:16: cannot find block device
Aug 30 05:09:18 | 65:16: Empty device name
Aug 30 05:09:18 | 65:16: Empty device name
Aug 30 05:09:18 | 8:16: cannot find block device
Aug 30 05:09:18 | 8:16: Empty device name
Aug 30 05:09:18 | 8:16: Empty device name
Aug 30 05:09:18 | 8:128: cannot find block device
Aug 30 05:09:18 | 8:128: Empty device name
Aug 30 05:09:18 | 8:128: Empty device name
Aug 30 05:09:18 | 8:240: cannot find block device
Aug 30 05:09:18 | 8:240: Empty device name
Aug 30 05:09:18 | 8:240: Empty device name
36006016040203100d2fba8d4190de711 dm-6 DGC,VRAID
size=10G features='0' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:1 sdg 8:96   active ready running
| `- 2:0:4:1 sdc 8:32   active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:5:1 sds 65:32  active ready running
  `- 2:0:5:1 sdu 65:64  active ready running
36006016022f04400e3e62561e0efe3a6 dm-12 ##,##
size=2.0G features='0' hwhandler='1 alua' wp=rw
36006016022f0440013d6296107dfe3c7 dm-11 ##,##
size=30G features='0' hwhandler='1 emc' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| |- #:#:#:# -   #:#    failed undef unknown
| `- #:#:#:# -   #:#    failed undef unknown
`-+- policy='service-time 0' prio=0 status=enabled
  |- #:#:#:# -   #:#    failed undef unknown
  `- #:#:#:# -   #:#    failed undef unknown
36006016022f0440035e225611ec1ff12 dm-14 ##,##
size=2.0G features='0' hwhandler='1 alua' wp=rw
36006016015f23e005d8cf247ff6fea11 dm-5 DGC,VRAID
size=300G features='0' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:2:0 sde 8:64   active ready running
| `- 2:0:1:0 sdm 8:192  active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:3:0 sdf 8:80   active ready running
  `- 2:0:2:0 sdn 8:208  active ready running
3600508b1001c3ace5200ce6cfbd2e788 dm-0 HP,LOGICAL VOLUME
size=279G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 0:1:0:0 sda 8:0    active ready running
36006016022f04400c0de256106664a4b dm-13 ##,##
size=2.0G features='0' hwhandler='1 alua' wp=rw

Is this a bug, or a consequence of a bizarre kernel/multipath-tools combination?

bmarzins commented 3 years ago

These look like multipath devices whose path devices have been removed from the system. Multipathd will try to reload the devices as the paths are removed. When all the paths are removed, multipathd will try to remove the multipath device. If a device is in use, however, multipathd will not be able to remove it, and it only tries once automatically. Listings like this:

36006016022f04400e3e62561e0efe3a6 dm-12 ##,##
size=2.0G features='0' hwhandler='1 alua' wp=rw

look like they come from a multipath device that has no paths but was not removed, because it was in use when multipathd tried (assuming that multipathd is actually running). Listings like this:

36006016022f0440013d6296107dfe3c7 dm-11 ##,##
size=30G features='0' hwhandler='1 emc' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| |- #:#:#:# -   #:#    failed undef unknown
| `- #:#:#:# -   #:#    failed undef unknown
`-+- policy='service-time 0' prio=0 status=enabled
  |- #:#:#:# -   #:#    failed undef unknown
  `- #:#:#:# -   #:#    failed undef unknown

look like the kernel was unable, for some reason, to reload the device, or multipathd wasn't running to try the reload. The second option seems more likely, since multipathd should have tried to reload the device when the last path was removed, and reloading a multipath device with no paths is very unlikely to fail.
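
To make the two cases above easier to spot in a long listing, here is a minimal Python sketch that sorts maps from captured `multipath -ll` text into three groups. It is not part of multipath-tools; the function name `classify_maps` and the labels "has paths", "no paths" and "stale paths" are made up for illustration, and the regular expressions assume the output layout shown in this issue.

```python
import re

# Map header, e.g. "3600... dm-12 ##,##" or "alias (3600...) dm-12 ...".
MAP_HEADER = re.compile(r"^(?P<name>\S+)\s+(?:\(\S+\)\s+)?dm-\d+\s")
# Path line: a "|- " or "`- " branch followed by the H:C:T:L column.
# Path group lines ("|-+- policy=...") do not match because of the "+".
PATH_LINE = re.compile(r"[|`]- (?P<hctl>\S+)\s+\S+\s+\S+\s+\S")


def classify_maps(output):
    """Label each map in `multipath -ll` text (labels are hypothetical)."""
    paths_by_map = {}
    current = None
    for line in output.splitlines():
        header = MAP_HEADER.match(line)
        if header:
            current = header.group("name")
            paths_by_map[current] = []
            continue
        path = PATH_LINE.search(line)
        if path and current is not None:
            paths_by_map[current].append(path.group("hctl"))

    labels = {}
    for name, hctls in paths_by_map.items():
        if not hctls:
            labels[name] = "no paths"      # map left behind without paths
        elif all(h == "#:#:#:#" for h in hctls):
            labels[name] = "stale paths"   # table still lists removed paths
        else:
            labels[name] = "has paths"
    return labels


# Three maps copied from the listing in this issue.
SAMPLE = """\
Aug 30 05:09:18 | 65:16: cannot find block device
36006016040203100d2fba8d4190de711 dm-6 DGC,VRAID
size=10G features='0' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:1:1 sdg 8:96   active ready running
| `- 2:0:4:1 sdc 8:32   active ready running
36006016022f04400e3e62561e0efe3a6 dm-12 ##,##
size=2.0G features='0' hwhandler='1 alua' wp=rw
36006016022f0440013d6296107dfe3c7 dm-11 ##,##
size=30G features='0' hwhandler='1 emc' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| |- #:#:#:# -   #:#    failed undef unknown
| `- #:#:#:# -   #:#    failed undef unknown
"""

if __name__ == "__main__":
    for name, label in classify_maps(SAMPLE).items():
        print(name, label)
```

In this sketch, "no paths" corresponds to the first kind of listing (a map that could not be removed) and "stale paths" to the second kind (a map that was never reloaded after its paths disappeared).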

mwilck commented 3 years ago

Ack to everything @bmarzins said - there are maps in the system that have no valid devices and couldn't be flushed, most probably because they were in use directly or indirectly.
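
One way to check the "indirectly in use" case is to look at what is stacked on top of the map. The sketch below is only an illustration: the helper name `dm_holders` and its `sysfs_root` parameter are hypothetical, and it assumes the usual sysfs layout where `/sys/block/<dev>/holders` lists the block devices built on `<dev>`. It does not detect mounted filesystems or processes holding the device open; the open count reported by `dmsetup info` is the more complete signal for those.

```python
import os


def dm_holders(dm_name, sysfs_root="/sys"):
    """Return block devices stacked on dm_name (e.g. 'dm-11'), sorted.

    Hypothetical helper: it only reads <sysfs_root>/block/<dm_name>/holders
    and returns an empty list if that directory does not exist.
    """
    holders_dir = os.path.join(sysfs_root, "block", dm_name, "holders")
    try:
        return sorted(os.listdir(holders_dir))
    except FileNotFoundError:
        return []


if __name__ == "__main__":
    # dm-11 and dm-12 are two of the leftover maps from the listing above.
    for dm in ("dm-11", "dm-12"):
        print(dm, dm_holders(dm))
```

A non-empty result (for example an LVM logical volume or a partition mapping) would be one explanation for why the map could not be flushed.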

@jirib, check whether reloading the multipath configuration with `multipathd reconfigure` or reloading the device configuration (e.g. with `rescan-scsi-bus.sh -a -r`) fixes the situation. multipathd logs taken with verbosity 3 might also shed some light. Unless you can provide more material proving that multipath is at fault here, I suggest closing this issue.

mwilck commented 3 years ago

@jirib, final call for additional information...

jirib commented 3 years ago

> @jirib, final call for additional information...

It's OK. My main concern is that I could not find any information that would explain such output. Maybe I missed a description in a man page or in other documentation?

mwilck commented 3 years ago

Right, describing this in more detail would be a todo item. Wanna give it a shot? :wink:

mwilck commented 3 years ago

Labelled as documentation - writing this would be a good starting point for someone acquainting himself with multipath-tools.