ocochard / BSDRP

BSD Router Project
https://bsdrp.net

Kernel panic with multiple BGP neighbors #28

Closed ocochard closed 5 years ago

ocochard commented 5 years ago

Two users have reported this problem:

(kgdb) bt
#0  __curthread () at ./machine/pcpu.h:230
#1  doadump (textdump=<optimized out>) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/kern_shutdown.c:371
#2  0xffffffff80406d6b in db_dump (dummy=<optimized out>, dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>)
    at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/ddb/db_command.c:574
#3  0xffffffff80406b39 in db_command (last_cmdp=<optimized out>, cmd_table=<optimized out>, dopager=1)
    at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/ddb/db_command.c:481
#4  0xffffffff804068b4 in db_command_loop () at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/ddb/db_command.c:534
#5  0xffffffff80409aff in db_trap (type=<optimized out>, code=<optimized out>)
    at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/ddb/db_main.c:252
#6  0xffffffff809ef774 in kdb_trap (type=3, code=0, tf=0xfffffe0076d97120)
    at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/subr_kdb.c:693
#7  0xffffffff80de35dc in trap (frame=0xfffffe0076d97120) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/amd64/amd64/trap.c:619
#8  <signal handler called>
#9  kdb_enter (why=0xffffffff8105668c "panic", msg=<optimized out>) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/subr_kdb.c:479
#10 0xffffffff809a6611 in vpanic (fmt=<optimized out>, ap=0xfffffe0076d97290)
    at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/kern_shutdown.c:866
#11 0xffffffff809a6433 in panic (fmt=0xffffffff81930338 <gdb_consdev> "\240\246\067\201\377\377\377\377\001")
    at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/kern_shutdown.c:804
#12 0xffffffff80de3a84 in trap_fatal (frame=0xfffffe0076d97490, eva=112)
    at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/amd64/amd64/trap.c:946
#13 0xffffffff80de3ae9 in trap_pfault (frame=0xfffffe0076d97490, usermode=0)
    at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/amd64/amd64/trap.c:765
#14 0xffffffff80de30ef in trap (frame=0xfffffe0076d97490) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/amd64/amd64/trap.c:441
#15 <signal handler called>
#16 rt_notifydelete (rt=0x0, info=<optimized out>) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/net/route.c:1251
#17 rtrequest1_fib (req=<optimized out>, info=0xfffffe0076d97700, ret_nrt=0xfffffe0076d977b8, fibnum=<optimized out>)
    at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/net/route.c:1566
#18 0xffffffff80ace58a in route_output (m=<optimized out>, so=<optimized out>)
    at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/net/rtsock.c:723
#19 0xffffffff80a3aa6a in sosend_generic (so=0xfffff800063b5000, addr=0x0, uio=0xfffffe0076d97a50, top=0xfffff8000649d400, control=0x0,
    flags=<optimized out>, td=0xfffff80006e5e580) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/kern/uipc_socket.c:1582

So, if I understood correctly, the interesting part is here:

#15 <signal handler called>
#16 rt_notifydelete (rt=0x0, info=<optimized out>) at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/net/route.c:1251
#17 rtrequest1_fib (req=<optimized out>, info=0xfffffe0076d97700, ret_nrt=0xfffffe0076d977b8, fibnum=<optimized out>)
    at /usr/local/BSDRP/BSDRPstable/FreeBSD/src/sys/net/route.c:1566

So rtrequest1_fib() called rt_notifydelete() with rt=0x0 (NULL), and this triggered the panic. But that should not be possible, given this code in rtrequest1_fib():

1560                 RIB_WLOCK(rnh);
1561                 rt = rt_unlinkrte(rnh, info, &error);
1562                 RIB_WUNLOCK(rnh);
1563                 if (error != 0)
1564                         return (error);
1565
1566                 rt_notifydelete(rt, info);

The if (error != 0) check should catch the case where rt_unlinkrte() returns an error (and therefore a NULL pointer). Is it possible that this pointer was changed to NULL (destroyed?) between the RIB_WUNLOCK(rnh) and the rt_notifydelete() call?
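
To illustrate the question above, here is a minimal userland sketch of one way a caller can end up with a NULL pointer even though the error check passed: the callee returns NULL on some path without setting the error. This is only an assumption for illustration, not a confirmed description of what rt_unlinkrte() does. All demo_* names, the struct, and the lookup_ok/mpath_miss flags are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a routing entry; not the kernel's struct rtentry. */
struct demo_rtentry {
    int notified;
};

/*
 * Hypothetical unlink helper. When lookup_ok is 0 it reports ESRCH.
 * When mpath_miss is non-zero it returns NULL but leaves *error at 0,
 * which is the inconsistent result the backtrace seems to suggest.
 */
static struct demo_rtentry *
demo_unlinkrte(struct demo_rtentry *rt, int lookup_ok, int mpath_miss, int *error)
{
    *error = 0;
    if (!lookup_ok) {
        *error = ESRCH;
        return NULL;
    }
    if (mpath_miss)
        return NULL;            /* NULL with *error == 0 */
    return rt;
}

/* Dereferences rt, so a NULL argument would crash here. */
static void
demo_notifydelete(struct demo_rtentry *rt)
{
    assert(rt != NULL);         /* userland stand-in for a kernel assertion */
    rt->notified = 1;
}

/* Caller that checks both the error code and the returned pointer. */
static int
demo_delete(struct demo_rtentry *rt, int lookup_ok, int mpath_miss)
{
    int error;
    struct demo_rtentry *found;

    found = demo_unlinkrte(rt, lookup_ok, mpath_miss, &error);
    if (error != 0)
        return error;
    if (found == NULL)          /* extra guard: NULL despite error == 0 */
        return ESRCH;

    demo_notifydelete(found);
    return 0;
}
```

With only the error check, the mpath_miss case would reach demo_notifydelete() with a NULL pointer; the extra pointer check turns it into an ESRCH return instead.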

Need to test by upgrading this FreeBSD-stable to r345764, which fixes some locking.

ocochard commented 5 years ago

The problem was identified by melifaro@FreeBSD.org: it is caused by the kernel option "options RADIX_MPATH" (multipath routing). A new version without this option will be released.

ocochard commented 5 years ago

The RADIX_MPATH kernel option has been removed as of BSDRP 1.93 (2019/05/30).