mfspeer closed this issue 9 years ago.
Here's the stack trace from my crash:
[New process 93495 ]
#0 0x0806d498 in add_jp_entry (pim_nbr=0x808c248, holdtime=210,
group=33620448, grp_msklen=32 ' ', source=1685262346, src_msklen=32 ' ',
addr_flags=0, join_prune=2 '\002') at pim_proto.c:2138
warning: Source file is more recent than executable.
2138 break;
(gdb) print *pim_nbr
$1 = {next = 0x0, prev = 0x106, address = 134587968, vifi = 0, timer = 0,
build_jp_message = 0x1}
Current language: auto; currently minimal
(gdb) $c
Undefined command: "$c". Try "help".
(gdb) where
#0 0x0806d498 in add_jp_entry (pim_nbr=0x808c248, holdtime=210,
group=33620448, grp_msklen=32 ' ', source=1685262346, src_msklen=32 ' ',
addr_flags=0, join_prune=2 '\002') at pim_proto.c:2138
#1 0x080643ba in age_routes () at timer.c:713
#2 0x0805a64d in timer (i=0x0) at main.c:675
#3 0x0806014f in age_callout_queue (elapsed_time=0) at callout.c:94
#4 0x0805a5e7 in main (argc=0, argv=0x8047e50) at main.c:638
(gdb)
This looks more like memory corruption, but I could be wrong. The prev field of pim_nbr
seems to be corrupted (0x106 is not a plausible pointer value).
Interesting. Did you have switches between the pimd routers, or were they directly connected? How did you cause the link down? Was it the sending-side or the receiving-side pimd that crashed? Did the multicast flow recover through the upper part of the triangle (through the RP) before the crash, or how quickly did the crash occur?
It was the receiving side that crashed, operating with shortest-path tree switchover enabled.
Really hope 69a5e34 fixes this bug once and for all! (If not, please reopen issue #22.)
Thanks for all the help debugging it!
I downloaded and compiled 2.2.0 and set it up in a triangle topology of three routers, with the top node of the triangle configured as the RP. I start traffic and wait for it to switch to the shortest-path tree (the first-hop and last-hop routers are the left- and right-hand nodes of the triangle, respectively). I then simulate a link-down event between those two nodes and get the same crash and stack trace previously reported for this issue: