opensvc / multipath-tools


0.8.7: testing the marginal_path_double_failed_time parameter generated a coredump #73

Closed Guanqinm closed 9 months ago

Guanqinm commented 1 year ago

Hi, we have run into a problem. While testing the marginal path parameters, a coredump was generated, shown below:

(gdb) bt
#0  __pthread_kill_implementation (threadid=281473695391360, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x0000ffffb5073304 in __pthread_kill_internal (signo=<optimized out>, threadid=<optimized out>) at pthread_kill.c:78
#2  0x0000ffffb502ed7c in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x0000ffffb501cd2c in __GI_abort () at abort.c:79
#4  0x0000ffffb50673ec in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0xffffb5141e20 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#5  0x0000ffffb507d26c in malloc_printerr (str=str@entry=0xffffb513f0c8 "corrupted double-linked list") at malloc.c:5671
#6  0x0000ffffb507db44 in unlink_chunk (p=p@entry=0xffffa00012d0, av=0xffffa0000030) at malloc.c:1637
#7  0x0000ffffb507ec58 in _int_free (av=0xffffa0000030, p=0xffffa00012d0, have_lock=<optimized out>) at malloc.c:4609
#8  0x0000ffffb5081370 in __GI___libc_free (mem=<optimized out>) at malloc.c:3393
#9  0x0000ffffb5081424 in tcache_thread_shutdown () at malloc.c:3229
#10 __malloc_arena_thread_freeres () at arena.c:1010
#11 0x0000ffffb50833e8 in __libc_thread_freeres () at thread-freeres.c:44
#12 0x0000ffffb50716a8 in start_thread (arg=0x0) at pthread_create.c:457
#13 0x0000ffffb50d7d5c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79

(gdb) f 6
#6  0x0000ffffb507db44 in unlink_chunk (p=p@entry=0xffffa00012d0, av=0xffffa0000030) at malloc.c:1637
1637            malloc_printerr ("corrupted double-linked list");
(gdb) p p->fd
$8 = (struct malloc_chunk *) 0xffffa0001ff0
(gdb) p p->fd->fd
$9 = (struct malloc_chunk *) 0x0
(gdb) x/36xg 0xffffa0001ff0
0xffffa0001ff0: 0x0000000000000000      0x0000000000002c91
0xffffa0002000: 0x0000000000000000      0x0000000000000000
0xffffa0002010: 0x0000000000000000      0x0000000000000000
0xffffa0002020: 0x0000000000000000      0x0000000000000000
0xffffa0002030: 0x0000000000000000      0x0000000000000000
0xffffa0002040: 0x0000000000000000      0x0000000000000000
0xffffa0002050: 0x0000000000000000      0x0000000000000000
0xffffa0002060: 0x0000000000000000      0x0000000000000000
0xffffa0002070: 0x0000000000000000      0x0000000000000000
0xffffa0002080: 0x0000000000000000      0x0000000000000000
0xffffa0002090: 0x0000000000000000      0x0000000000000000
0xffffa00020a0: 0x0000000000000000      0x0000000000000000
0xffffa00020b0: 0x0000000000000000      0x0000000000000000
0xffffa00020c0: 0x0000000000000000      0x0000000000000000
0xffffa00020d0: 0x0000000000000000      0x0000000000000000
0xffffa00020e0: 0x0000000000000000      0x0000000000000000
0xffffa00020f0: 0x0000000000000000      0x0000000000000000
0xffffa0002100: 0x0000000000000000      0x0000000000000000
(gdb)

The logic of the test case is to check whether the path recovers after it has been set to marginal and marginal_path_double_failed_time has elapsed, and then to stop the multipathd daemon.

When we examined the thread exit, we saw that malloc_printerr() had detected corruption in the doubly linked list of freed chunks: memory that was freed during the process's execution had been overwritten with zeroes, which triggers the abort. We don't know how to analyze this problem, and we don't know where the memory is being written abnormally. Can you tell us how to analyze and solve it?

mwilck commented 1 year ago

malloc corruption is always difficult to debug. The easiest way is to run multipathd under valgrind, but that may be too slow to reproduce the error (just try).

If that doesn't work, you can try using the compiler flag -fsanitize=address, which may give you additional insight when the crash occurs.

mwilck commented 1 year ago

But first you may want to update multipath-tools and see if the error is still present. 0.8.7 is 2 years old.

lixiaokeng commented 12 months ago

This happens when multipathd exits. Judging by the LWP, I guess it was when uvloop exited and released its resources that the content at the chunk->fd address was found to be abnormal.

An 8912-byte area (0xffffa0002000 to 0xffffa0003f28) at p->fd is all zeroes.

Is there some operation like memset(p, 0, xxx) (where xxx might be 4096, 8192, or some other size) that could lead to this?

lixiaokeng commented 12 months ago

But first you may want to update multipath-tools and see if the error is still present. 0.8.7 is 2 years old.

It only occurred once on 0.8.7. We are now testing with ASan, but there is no progress yet.

mwilck commented 12 months ago

Is there some operation like memset(p, 0, xxx) (where xxx might be 4096, 8192, or some other size) that could lead to this?

It's possible, if the memset cleared an area outside the memory allocated for p.
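For illustration, a minimal, self-contained C sketch (not multipath-tools code) of how a memset past the end of an allocation can zero the neighbouring chunk's header and produce exactly this kind of glibc abort:

#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *p = malloc(64);
    char *q = malloc(64);    /* a neighbouring chunk in the same arena */

    if (!p || !q)
        return 1;

    /* Deliberate bug, for illustration: clearing 4096 bytes from a
     * 64-byte allocation zeroes q's chunk header and free-list links. */
    memset(p, 0, 4096);

    free(q);    /* glibc may now abort, e.g. with
                 * "corrupted double-linked list" */
    free(p);
    return 0;
}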

lixiaokeng commented 11 months ago

Here is some new info. This coredump occurred while multipathd was stopping, after free_io_err_stat_path().

mwilck commented 11 months ago

Sorry, I can't parse your comment. Are you saying the core dump happened after the call to free_io_err_pathvec() in stop_io_err_stat_thread()?

I've just reviewed the locking in that code once more, and I found no obvious race condition; all accesses to io_err_path are properly protected by the io_err_pathvec_lock.

Question: Does this crash always occur? Does it only occur if there are actual marginal path events?

lixiaokeng commented 11 months ago

multipath.conf:

defaults {
        marginal_path_double_failed_time 30
        marginal_path_err_sample_time 120
        marginal_path_err_rate_threshold 10
        marginal_path_err_recheck_gap_time 600
}

test.sh

while true
do
        begin="$(date +'%Y-%m-%d %H:%M:%S')"
        echo $begin
        multipathd fail path /dev/sdc
        sleep 0.5
        multipathd reinstate path /dev/sdc
        sleep 0.5
        multipathd fail path /dev/sdc
        sleep 0.5
        multipathd reinstate path /dev/sdc
        journalctl --since="${begin}" | grep "path is now marginal"
        while [ $? -ne 0 ]
        do
                sleep 0.1
                journalctl --since="${begin}" | grep "path is now marginal"
        done
        systemctl restart multipathd
        sleep 3
done

This problem can be reproduced by this test.sh.

lixiaokeng commented 11 months ago

Sorry, I can't parse your comment. Are you saying the core dump happened after the call to free_io_err_pathvec() in stop_io_err_stat_thread()?

I've just reviewed the locking in that code once more, and I found no obvious race condition; all accesses to io_err_path are properly protected by the io_err_pathvec_lock.

Yes, I don't see any race condition either.

Question: Does this crash always occur? Does it only occur if there are actual marginal path events?

Yes, it only occurs if there are actual marginal path events.

I added a print of p->dio_ctx_array->buf in free_io_err_stat_path() before and after the free(). I found that after the buf is freed, its content is malloc_chunk->fd; but in the coredump the buf is zero.

lixiaokeng commented 11 months ago


I think the reason is that io_cancel() fails and buf is then written after the free by the in-flight I/O in the kernel (a use-after-free). Do you think it would be a good idea to move pthread_testcancel() after io_getevents() in process_async_ios_event()?
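For illustration, a hedged sketch of that proposed ordering (simplified; poll_async_ios() is a hypothetical helper, and the real process_async_ios_event() in libmultipath/io_err_stat.c looks different):

#include <libaio.h>
#include <pthread.h>

/* Hypothetical helper, not the real multipath-tools function. */
int poll_async_ios(io_context_t ctx)
{
    struct io_event events[128];
    int n;

    /* Reap completed requests first, so their buffers are known to be
     * out of the kernel's hands ... */
    n = io_getevents(ctx, 1, 128, events, NULL);

    /* ... and only then offer a cancellation point, instead of being
     * cancelled while requests may still be in flight. */
    pthread_testcancel();

    return n;
}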

mwilck commented 11 months ago

AFAIR, io_cancel() always fails; even if the request can be cancelled (which is almost never the case), it returns -EINPROGRESS. It's possible for a user space application to drop the aio buffers before the aio finishes. In that case, anything can happen; the kernel will write to free()d memory, overwriting malloc meta data.

IMO the bottom line is that you can't free aio memory before the kernel has completed the requests.
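As a minimal, self-contained illustration of that rule (my sketch, not multipath-tools code; /dev/sdc is a placeholder device, build with -laio):

#define _GNU_SOURCE
#include <libaio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    io_context_t ctx = 0;
    struct iocb cb, *cbs[1] = { &cb };
    struct io_event ev;
    void *buf = NULL;
    int fd = open("/dev/sdc", O_RDONLY | O_DIRECT);

    if (fd < 0 || io_setup(1, &ctx) < 0 ||
        posix_memalign(&buf, 4096, 4096) != 0)
        return 1;

    io_prep_pread(&cb, fd, buf, 4096, 0);
    if (io_submit(ctx, 1, cbs) != 1)
        return 1;

    /* io_cancel() nearly always fails (e.g. -EINPROGRESS): the
     * request stays in flight inside the kernel. */
    if (io_cancel(ctx, &cb, &ev) != 0) {
        /* Freeing buf here would be the bug from this issue: the
         * kernel could still DMA into the freed chunk and clobber
         * malloc metadata. Wait for completion instead: */
        while (io_getevents(ctx, 1, 1, &ev, NULL) < 1)
            ;
    }

    free(buf);    /* safe: no request references buf anymore */
    io_destroy(ctx);
    close(fd);
    return 0;
}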

Wrt pthread_testcancel(), it might be a good idea to comment out calls to this function in the io_err_stat code altogether for testing.

mwilck commented 11 months ago

Please check the patch from this draft PR. It's against my latest code base, but you should be able to apply it on 0.8.7, too.

lixiaokeng commented 11 months ago

I reviewed the code and I think it can solve the problem. But the National Day holiday is coming; I will test it after the holiday.

Guanqinm commented 11 months ago

We have verified the patch from this draft PR in the environment and confirmed that the patch can solve the problem.

mwilck commented 11 months ago

Thanks, I'll make an official submission asap.

mwilck commented 11 months ago

The directio checker has the same problem (even though we've seen no crashes there). I made a patch for it, too, but now the unit tests are failing :-/ I need to have another look.

mwilck commented 11 months ago

@Guanqinm, @lixiaokeng: I just realize that there might be a simpler solution to this. Please revert the previous patch and just try this instead:

diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
index c474c34..1982915 100644
--- a/libmultipath/io_err_stat.c
+++ b/libmultipath/io_err_stat.c
@@ -751,6 +751,6 @@ void stop_io_err_stat_thread(void)
                pthread_cancel(io_err_stat_thr);

        pthread_join(io_err_stat_thr, NULL);
-       free_io_err_pathvec();
        io_destroy(ioctx);
+       free_io_err_pathvec();
 }

The io_destroy(2) man page says:

"The io_destroy() system call will attempt to cancel all outstanding asynchronous I/O operations against ctx_id, will block on the completion of all operations that could not be canceled, and will destroy the ctx_id."

Thus after calling io_destroy(), it should be impossible that any outstanding I/O completes and overwrites application memory.
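A hedged, self-contained sketch of that teardown order (mine, not the actual stop_io_err_stat_thread(); /dev/sdc is a placeholder, build with -laio):

#define _GNU_SOURCE
#include <libaio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    io_context_t ctx = 0;
    struct iocb cb, *cbs[1] = { &cb };
    void *buf = NULL;
    int fd = open("/dev/sdc", O_RDONLY | O_DIRECT);

    if (fd < 0 || io_setup(1, &ctx) < 0 ||
        posix_memalign(&buf, 4096, 4096) != 0)
        return 1;

    io_prep_pread(&cb, fd, buf, 4096, 0);
    io_submit(ctx, 1, cbs);

    /* io_destroy() cancels what it can and blocks until every request
     * that could not be cancelled has completed ... */
    io_destroy(ctx);

    /* ... so from here on, no in-flight I/O can write to buf and
     * freeing it is safe -- the same ordering as the patch above. */
    free(buf);
    close(fd);
    return 0;
}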

mwilck commented 11 months ago

Pulling in also @bmarzins, as he has done most of the aio work for the directio checker. I have experimented a bit.

Guanqinm commented 11 months ago

We have verified the patch and confirmed that the patch can solve this issue. @mwilck

diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
index c474c34..1982915 100644
--- a/libmultipath/io_err_stat.c
+++ b/libmultipath/io_err_stat.c
@@ -751,6 +751,6 @@ void stop_io_err_stat_thread(void)
                pthread_cancel(io_err_stat_thr);

        pthread_join(io_err_stat_thr, NULL);
-       free_io_err_pathvec();
        io_destroy(ioctx);
+       free_io_err_pathvec();
 }

mwilck commented 11 months ago

Thanks! The trivial patch is not correct by itself, because the code would still re-use iocbs before they have completed. But it clarifies things.
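To illustrate the remaining hazard (my sketch, not multipath-tools code; /dev/sdc is a placeholder, build with -laio):

#define _GNU_SOURCE
#include <libaio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    io_context_t ctx = 0;
    struct iocb cb, *cbs[1] = { &cb };
    struct io_event evs[2];
    void *buf = NULL;
    int fd = open("/dev/sdc", O_RDONLY | O_DIRECT);

    if (fd < 0 || io_setup(2, &ctx) < 0 ||
        posix_memalign(&buf, 4096, 4096) != 0)
        return 1;

    io_prep_pread(&cb, fd, buf, 4096, 0);
    io_submit(ctx, 1, cbs);        /* request 1: cb is now in flight */

    /* Hazard (deliberate, for illustration): re-preparing and
     * resubmitting the same iocb while request 1 is still running.
     * Both completions then report the same obj pointer, so they can
     * no longer be told apart, and the two reads race over buf.
     * Correct code reaps request 1 with io_getevents() first. */
    io_prep_pread(&cb, fd, buf, 4096, 4096);
    io_submit(ctx, 1, cbs);

    io_getevents(ctx, 2, 2, evs, NULL);
    io_destroy(ctx);
    free(buf);
    close(fd);
    return 0;
}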

Guanqinm commented 11 months ago

Hello, may I ask whether the final fix is the same as the previous patch (this draft PR)?

mwilck commented 11 months ago

I haven't finished this yet. For now you can use the "draft PR", or a combination of both patches.

mwilck commented 10 months ago

I can't finish my bigger project right now. I have sent a small patch set based on the patches from this issue to dm-devel.

lixiaokeng commented 10 months ago

libmultipath: reduce log level of directio messages
libmultipath: directio: don't reset ct->running after io_cancel()
libmultipath: io_err_stat: don't free aio memory before completion
libmultipath: io_err_stat: call io_destroy() before free_io_err_pathvec()
multipath-tools: systemd: require modprobe@dm_multipath.service
libmpathutil: remove systemd_service_enabled()

Are these patches intended to fix this issue?

mwilck commented 10 months ago

Yes. It's the same patches that you verified here. I just added the directio patches which fix the same issue for the directio checker.

mwilck commented 10 months ago

You're of course very welcome to test that patch set in your environment.

mwilck commented 10 months ago

@lixiaokeng @Guanqinm: as you saw I posted an updated patch series to the dm-devel ML yesterday. Test / feedback in your test bed would be highly appreciated.

lixiaokeng commented 10 months ago

@lixiaokeng @Guanqinm: as you saw I posted an updated patch series to the dm-devel ML yesterday. Test / feedback in your test bed would be highly appreciated.

I provided a test.sh above; have you run it? We have tested "libmultipath: io_err_stat: don't free aio memory before completion" and "libmultipath: io_err_stat: call io_destroy() before free_io_err_pathvec()"; there was no coredump. I think our test can't exercise the other patches.