musl-libc: segfault when link-local address is listed as a client

jsarenik commented 3 years ago

As suggested in #157, I am opening this fresh issue. It affects musl libc only.

A minimal config file example:

interface eth0
{
   AdvSendAdvert on;
   prefix 2001:470:1f1b:365::/64
   {
     AdvOnLink on;
     AdvAutonomous on;
   };
   clients
   {
     fe80::aa20:66ff:fe3f:1909; # this is the radvd server's own link-local address,
     # left over when the config was reused from another host;
     # on glibc-based systems it does not cause a segfault
   };
};

By mistake, the config file contains the link-local address of the current host. It was left there when the config file was copied from another host that previously ran radvd.

This is what happens:

# radvd -n -C /etc/radvd.conf -d5 -m stderr_clean
version 2.19 started
eth0 interface definition ok
config file, /etc/radvd.conf, syntax ok
...
eth0 linklocal address: fe80::aa20:66ff:fe3f:1909
eth0 address: OTHERADDRESS
eth0 address: fe80::aa20:66ff:fe3f:1909
eth0 is ready
sending RA to fe80::aa20:66ff:fe3f:1909 on eth0 (fe80::aa20:66ff:fe3f:1909), 3 options (using 64/1210 bytes)
eth0 next scheduled RA in 16 second(s)
polling for 16 second(s), next iface is eth0
Exiting, privsep_read_loop had readn return 0 bytes
Exiting, privsep_read_loop is complete.
Segmentation fault

robbat2 commented 3 years ago

Can you do a debug build and put the core dump through gdb to get a backtrace?

jsarenik commented 3 years ago

@robbat2 Yes. I will write here as soon as it is ready.

jsarenik commented 3 years ago

@robbat2 Oops, do I need any options when running ./configure? So far I see only addresses in the core:

# gdb --core=core.10985 --batch                                    
[New LWP 10985]
Core was generated by `/usr/local/sbin/radvd -n -C /etc/radvd.conf -d5 -m stderr_clean'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f1ea3b10edb in ?? ()

This is the current HEAD (b368cb9), compiled with -g (the default).

EDIT: When using --core, --exec must also be set for gdb to show meaningful output (function names and source lines).

jsarenik commented 3 years ago

$ file /usr/local/sbin/radvd
/usr/local/sbin/radvd: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-x86_64.so.1, with debug_info, not stripped

jsarenik commented 3 years ago

My bad :) Works now:

# gdb --core=core.10985 --exec=/usr/local/sbin/radvd --batch       
[New LWP 10985]
Core was generated by `/usr/local/sbin/radvd -n -C /etc/radvd.conf -d5 -m stderr_clean'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f1ea3b10edb in memchr (src=src@entry=0xffffffffd078ef90, c=c@entry=0, n=n@entry=2147483647) at src/string/memchr.c:17
17  src/string/memchr.c: No such file or directory.

jsarenik commented 3 years ago

And here is the backtrace:

(gdb) bt
#0  0x00007f1ea3b10edb in memchr (src=src@entry=0xffffffffd078ef90, 
    c=c@entry=0, n=n@entry=2147483647) at src/string/memchr.c:17
#1  0x00007f1ea3b1198c in strnlen (
    s=s@entry=0xffffffffd078ef90 <error: Cannot access memory at address 0xffffffffd078ef90>, n=n@entry=2147483647) at src/string/strnlen.c:5
#2  0x00007f1ea3b0d6dc in printf_core (f=f@entry=0x7fffd078e8d8, 
    fmt=fmt@entry=0x562c994290fa "%s recvmsg len=%d", 
    ap=ap@entry=0x7fffd078e750, nl_arg=nl_arg@entry=0x7fffd078e7e0, 
    nl_type=<optimized out>) at src/stdio/vfprintf.c:594
#3  0x00007f1ea3b0daf1 in vfprintf (f=f@entry=0x7fffd078e8d8, 
    fmt=0x562c994290fa "%s recvmsg len=%d", ap=<optimized out>)
    at src/stdio/vfprintf.c:683
#4  0x00007f1ea3b0fbda in vsnprintf (s=<optimized out>, n=<optimized out>, 
    fmt=<optimized out>, ap=<optimized out>) at src/stdio/vsnprintf.c:54
#5  0x0000562c9941e5fa in ?? ()
#6  0x0000000000002000 in ?? ()
#7  0x00007f1ea3b52cfc in ?? () from /lib/ld-musl-x86_64.so.1
#8  0x0000000000000000 in ?? ()

richfelker commented 3 years ago

From the backtrace, this is obviously the result of calling a function without a declaration, which would have been caught with -Werror=implicit-function-declaration. The invalid pointer is a pointer that has been truncated to 32 bits, with its top bit (bit 31) sign-extended to 64 bits. So it looks like you're missing #include <net/if.h>.

And indeed, here is your bug:

https://github.com/radvd-project/radvd/blob/b368cb98da5da44154994de573fea24b7c7858fc/includes.h#L79-L84

It explicitly does the wrong thing on Linux systems that do not use glibc.

richfelker commented 3 years ago

I can't find anything in the tracker about why the above wrong change was made, so to fix this you'll probably have to figure out what prompted that and solve it the right way. Ping me if it's not clear what to do and I'll be happy to look at it.

rpodgorny commented 3 years ago

Well, following git blame leads to commit 46883f8a1a02fe42040dd8e48aec0ed871545d4d, which has this message:

Allow building on musl

Signed-off-by: Matthew Thode <mthode@mthode.org>

rpodgorny commented 3 years ago

So it seems Matthew made it compile on musl but did not test it thoroughly at runtime.

jpds commented 3 years ago

I just patched out those if/else lines on my Gentoo/musl install, leaving #include <net/if.h> in place, and radvd compiles and starts up fine.

Will see if radvd segfaults in the coming days.

robbat2 commented 3 years ago

@prometheanfire ^^ this was your musl change, can you review and submit alternate options?

radvd-project / radvd

musl-libc: segfault when link-local address is listed as a client #158