opensvc / multipath-tools

multipathd crash when stopping #1

Closed hexiaowen closed 3 years ago

hexiaowen commented 3 years ago

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x0000ffff87d9e81c in __GI_abort () at abort.c:79
#2  0x0000ffff87dd7818 in __libc_message (action=action@entry=do_abort,
    fmt=fmt@entry=0xffff87e97888 "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x0000ffff87dddf6c in malloc_printerr (
    str=str@entry=0xffff87e950d0 "free(): invalid pointer") at malloc.c:5389
#4  0x0000ffff87ddf780 in _int_free (av=0xffff87ed7a58 , p=0xffff80000070,
    have_lock=0) at malloc.c:4172
#5  0x0000ffff880f55a8 in internal_hashmap_clear (h=h@entry=0xffff80027980,
    default_free_key=<optimized out>, default_free_value=<optimized out>)
    at ../src/basic/hashmap.c:902
#6  0x0000ffff880f56a0 in internal_hashmap_free (h=,
    default_free_key=<optimized out>, default_free_value=<optimized out>,
    default_free_value=<optimized out>, default_free_key=<optimized out>, h=<optimized out>)
    at ../src/basic/hashmap.c:874
#7  0x0000ffff880f582c in ordered_hashmap_free_free_free () at ../src/basic/hashmap.h:118
#8  device_free (device=0xffff80027820) at ../src/libsystemd/sd-device/sd-device.c:68
#9  sd_device_unref (p=) at ../src/libsystemd/sd-device/sd-device.c:78
#10 0x0000ffff88100978 in sd_device_unrefp () at ../src/systemd/sd-device.h:118
#11 device_new_from_nulstr (len=, nulstr=0xffff877f93d0 "",
    ret=<synthetic pointer>) at ../src/libsystemd/sd-device/device-private.c:448
#12 device_monitor_receive_device (m=0xffff80000b20, ret=ret@entry=0xffff877fb388)
    at ../src/libsystemd/sd-device/device-monitor.c:447
#13 0x0000ffff881028a4 in udev_monitor_receive_sd_device (ret=0xffff877fb388,
    udev_monitor=0xffff80000c70) at ../src/libudev/libudev-monitor.c:207
#14 udev_monitor_receive_device (udev_monitor=0xffff80000c70,
    udev_monitor@entry=0xffff877fb3a0) at ../src/libudev/libudev-monitor.c:253
#15 0x0000ffff881a3478 in uevent_listen (udev=0xffff877fbf40) at uevent.c:853
#16 0x0000aaaadc524514 in ueventloop (ap=0xffffc4134bd0) at main.c:1518
#17 0x0000ffff880827ac in start_thread (arg=0xffff8821e380) at pthread_create.c:486
#18 0x0000ffff87e3c47c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

multipathd has produced almost the same call stack twice. We suspected the udev API at first. However, hashmap is a common data structure in systemd, and systemd itself has never produced this call stack. Can someone help me?

To reproduce, run the kill -9 multipathd command repeatedly, then restart the system and check whether multipathd works normally.

hexiaowen commented 3 years ago

There's a strange phenomenon here. In frame 11, nulstr=0xffff877f93d0 "", but in frame 12:

x/32bs (uint8_t*) &buf.raw[bufpos]
0xffff877f9360: "ACTION"
0xffff877f9367: "change"
0xffff877f936e: "DEVPATH"
0xffff877f9376: "/devices/virtual/block/dm-69"
0xffff877f9393: "SUBSYSTEM"
0xffff877f939d: "block"
0xffff877f93a3: "DM_COOKIE"
0xffff877f93ad: "23068672"
0xffff877f93b6: "DEVNAME"
0xffff877f93be: "/dev/dm-69"
0xffff877f93c9: "DEVTYPE"
0xffff877f93d1: "disk"
0xffff877f93d6: "SEQNUM"
0xffff877f93dd: "14437"
0xffff877f93e3: "USEC_INITIALIZED"
0xffff877f93f4: "8213096220"
0xffff877f93ff: "MAJOR"
0xffff877f9405: "253"
0xffff877f9409: "MINOR"
0xffff877f940f: "69"
0xffff877f9412: "DM_UDEV_DISABLE_LIBRARY_FALLBACK_FLAG"
0xffff877f9438: "1"
0xffff877f943a: "DM_UDEV_PRIMARY_SOURCE_FLAG"
0xffff877f9456: "1"
0xffff877f9458: "DM_SUBSYSTEM_UDEV_FLAG0"
0xffff877f9470: "1"
0xffff877f9472: "DM_ACTIVATION"
0xffff877f9480: "0"
0xffff877f9482: "DM_NAME"
0xffff877f948a: "36e02861100592fcc99ad3c3800000195"
0xffff877f94ac: "DM_UUID"
0xffff877f94b4: "mpath-36e02861100592fcc99ad3c3800000195"
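
For readers unfamiliar with this layout: the dump above is a run of NUL-terminated strings packed back to back, which is what `x/32bs` steps through. Going by the addresses shown, "DEVTYPE" starts at 0xffff877f93c9 and is 7 characters long, so 0xffff877f93d0 is its terminating NUL byte; that would explain why gdb prints `nulstr` in frame 11 as an empty string. Below is a minimal sketch in C of walking such a buffer. The function name `count_packed_strings` is hypothetical and is not taken from systemd or multipath-tools.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the NUL-terminated strings packed back to back in buf[0..len).
 * Stops early at an empty string or at a tail without a terminator.
 * Hypothetical helper for illustration only. */
static size_t count_packed_strings(const char *buf, size_t len)
{
	size_t pos = 0, n = 0;

	assert(buf != NULL);

	while (pos < len) {
		/* Find the terminator of the string starting at pos. */
		const char *end = memchr(buf + pos, '\0', len - pos);

		if (end == NULL || end == buf + pos)
			break; /* unterminated tail, or empty string */
		n++;
		pos = (size_t)(end - buf) + 1; /* skip past the NUL */
	}
	return n;
}
```

A pointer that lands on one of the NUL bytes, as `nulstr` appears to here, would make such a walk stop immediately with a count of zero.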

mwilck commented 3 years ago

As noted on dm-devel, could you check if it helps to disable pthread_cancel() while calling udev_monitor_receive_device()?

I don't think libudev is generally safe to be used in multithreaded programs. We're not aware of any issues, but this might be one.
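
A minimal sketch of what such a test could look like, assuming POSIX `pthread_setcancelstate()` is used to disable cancellation around the call. The names `receive_fn`, `call_uncancelable` and `fake_receive` are hypothetical; `fake_receive` merely stands in for `udev_monitor_receive_device()` so the example builds without libudev. This is not the actual multipath-tools change.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical callback type standing in for a call such as
 * udev_monitor_receive_device(). */
typedef void *(*receive_fn)(void *arg);

/* Run fn(arg) with cancellation disabled for the calling thread, then
 * restore the previous cancel state. A cancel request that arrives in
 * between stays pending until cancellation is enabled again. */
static void *call_uncancelable(receive_fn fn, void *arg)
{
	int old_state, ignored, rc;
	void *result;

	rc = pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
	assert(rc == 0); /* can only fail for an invalid state value */

	result = fn(arg);

	rc = pthread_setcancelstate(old_state, &ignored);
	assert(rc == 0);
	(void)rc; /* silence unused warning when NDEBUG is defined */

	return result;
}

/* Hypothetical stand-in for the real receive call: it records the
 * cancel state in effect while it runs and returns its argument. */
static void *fake_receive(void *state_out)
{
	int cur, ignored;

	pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cur);
	pthread_setcancelstate(cur, &ignored);
	*(int *)state_out = cur;
	return state_out;
}
```

In the uevent listener, the wrapped call would be the one at frame 15 of the backtrace; the rest of the loop would remain cancelable.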

lixiaokeng commented 3 years ago

Disabling pthread_cancel() while calling udev_monitor_receive_device() helps. Please provide a patch. Thanks.

mwilck commented 3 years ago

This is a major change in multipath-tools, and can't be rushed. I've been sick lately and have not been able to work on it. Please check whether you can fix the issue in OpenEuler simply by not building libudev and libsystemd with -fexceptions.

lixiaokeng commented 3 years ago

This is fixed by not using -fexceptions. Thanks!
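
Some background on why this flag is relevant: frame 10 of the backtrace shows sd_device_unrefp() being called from device_new_from_nulstr(), which looks like a scope-exit cleanup handler. GCC documents that handlers registered with `__attribute__((cleanup))` also run during stack unwinding when the code is built with -fexceptions, and glibc implements pthread_cancel() via forced unwinding. That this is the exact failure path in this issue is an assumption, not something confirmed in the thread. The sketch below only shows the ordinary scope-exit behaviour of the attribute (a GCC/Clang extension); `free_charp`, `use_buffer` and `cleanup_calls` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

static int cleanup_calls; /* counts how often the handler has run */

/* Cleanup handler in the style of sd_device_unrefp(): it receives a
 * pointer to the annotated variable, not the variable's value. */
static void free_charp(char **p)
{
	assert(p != NULL);
	cleanup_calls++;
	free(*p); /* free(NULL) is a no-op, so a failed malloc is fine */
}

/* The buffer is released automatically when buf goes out of scope. */
static int use_buffer(void)
{
	__attribute__((cleanup(free_charp))) char *buf = malloc(16);

	if (buf == NULL)
		return -1;
	buf[0] = '\0';
	return 0; /* free_charp(&buf) runs as the function returns */
}
```

Without -fexceptions the handler still runs at normal scope exit; only the unwinding case is affected.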

mwilck commented 3 years ago

FTR, there was another issue, fixed with https://github.com/openSUSE/multipath-tools/commit/38ffd890aaeace8a6909f5685d3394e8cfe3b975 from https://github.com/openSUSE/multipath-tools/tree/queue.

mwilck commented 3 years ago

I believe this issue can be closed.

mwilck commented 3 years ago

@cvaroqui, would you mind closing this issue?