kannankvs commented 4 years ago

Description
Zebra crash is seen upon deleting mvrf. From the GDB traces, the route node does not have valid data and the zebra process is crashed while trying to access NULL/Invalid rnode->table pointer. Various methods have been experimented to recreate the issue using linux commands in a script (without using management vrf commands) to isolate the problem. From those methods, it is clear that crash is observed with simple linux scripts as well.

Create VRF - Creation cgroup , eth0 ,mgmt vrf link
Linux commands (ran in script): cgcreate -g l3mdev:mgmt cgset -r l3mdev.master-device=mgmt mgmt ip link add dev mgmt type vrf table 5000 ip link set mgmt up ip link set dev eth0 master mgmt

Delete VRF - Delete eth0,mgmt vrf link
ip link set dev eth0 nomaster ip link delete dev mgmt

Steps to reproduce the issue:
Method1 config vrf add mgmt
config vrf del mgmt root@sonic-z9264f-03:/var/core# ls -l -rw-rw-rw- 1 root root 3509004 Nov 3 12:41 zebra.1572784872.36.core.gz

Method2 As explained earlier, using those linux commands in two scripts. Script1 for creating mvrf and Script2 for deleting mvrf.

Describe the results you received:
On debugging, got the following callback trace in zebra where rn->table is NULL/Invalid. GDB back trace:

0 0x00007f5f7062bfff in raise () from /lib/x86_64-linux-gnu/libc.so.6

[Current thread is 1 (Thread 0x7f5f71a7ed40 (LWP 43))] (gdb) bt

0 0x00007f5f7062bfff in raise () from /lib/x86_64-linux-gnu/libc.so.6

1 0x00007f5f7062d42a in abort () from /lib/x86_64-linux-gnu/libc.so.6

2 0x00007f5f7167939f in core_handler (signo=11, siginfo=0x7ffd2896ceb0,

context=<optimized out>) at lib/sigevent.c:228

3 4 srcdest_rnode_table (rn=) at ./lib/srcdest_table.h:90

5 rib_dest_table (dest=0x557be2369110) at zebra/rib.h:471

6 rib_dest_vrf (dest=0x557be2369110) at zebra/rib.h:479

7 netlink_route_info_fill (re=0x0, dest=0x557be2369110, cmd=25,

ri=0x7ffd2896d230) at zebra/zebra_fpm_netlink.c:294

8 zfpm_netlink_encode_route (cmd=25, dest=dest@entry=0x557be2369110,

re=re@entry=0x0, in_buf=in_buf@entry=0x557be2307c64 "0", in_buf_len=8188)
at zebra/zebra_fpm_netlink.c:572

9 0x00007f5f6e5ad8ed in zfpm_encode_route (msg_type=,

in_buf_len=<optimized out>, in_buf=0x557be2307c64 "0", re=<optimized out>, 
dest=0x557be2369110) at zebra/zebra_fpm.c:887

10 zfpm_build_route_updates () at zebra/zebra_fpm.c:990

11 zfpm_build_updates () at zebra/zebra_fpm.c:1149

12 zfpm_write_cb (thread=0x7ffd289714c0) at zebra/zebra_fpm.c:1187

13 0x00007f5f71686a50 in thread_call (thread=thread@entry=0x7ffd289714c0)

at lib/thread.c:1531

14 0x00007f5f71656fa8 in frr_run (master=0x557be212c9c0) at lib/libfrr.c:1054

15 0x0000557be038c9b3 in main (argc=9, argv=0x7ffd289718c8)

Describe the results you expected: Expected zebra not to crash.

Additional information you deem important (e.g. issue happens only occasionally):
When the linux script is not giving the crash, few additional linux commands in create and delete will help sometimes. Additional Commands in Create script: cgcreate -g l3mdev:mgmt cgset -r l3mdev.master-device=mgmt mgmt

Additional Commands in Delete script: cgdelete -g l3mdev:mgmt

**Output of `show version`:**

Any master image.

prsunny commented 4 years ago

Fixed as part of https://github.com/Azure/sonic-buildimage/pull/3763 Please use the latest VS image

kannankvs commented 4 years ago

@prsunny : Before raising this gitissue, we had integrated this PR in our code and found that the crash exists even with this fix. Once if we see a successful Jenkins image, we will retest it using Jenkins image and update you.

chitra-raghavan commented 4 years ago

Zebra crash is seen in #136 as well. root@sonic-z9264f-03:~# show mgmt

ManagementVRF : Enabled

Management VRF interfaces in Linux:
137: mgmt: <NOARP,MASTER,UP,LOWER_UP> mtu 65536 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ba:b0:2d:ed:d8:32 brd ff:ff:ff:ff:ff:ff promiscuity 0
    vrf table 5000 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master mgmt state UP mode DEFAULT group default qlen 1000
    link/ether 20:04:0f:3e:ff:49 brd ff:ff:ff:ff:ff:ff
139: lo-m: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master mgmt state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether de:23:bd:5c:53:6f brd ff:ff:ff:ff:ff:ff
root@sonic-z9264f-03:~#
root@sonic-z9264f-03:~# ls -l /var/core
total 0
root@sonic-z9264f-03:~#
root@sonic-z9264f-03:~#
root@sonic-z9264f-03:~# config vrf del mgmt
root@sonic-z9264f-03:~#
root@sonic-z9264f-03:~#
root@sonic-z9264f-03:/var/core# ls -l
total 3376
-rw-rw-rw- 1 root root 3454852 Nov 25 07:25 zebra.1574666723.39.core.gz
root@sonic-z9264f-03:/var/core#

root@sonic-z9264f-03:~# show ver

SONiC Software Version: SONiC.HEAD.136-dirty-20191124.063021
Distribution: Debian 9.11
Kernel: 4.9.0-9-2-amd64
Build commit: 45e13b99
Build date: Sun Nov 24 06:44:25 UTC 2019
Built by: johnar@jenkins-worker-11

Platform: x86_64-dellemc_z9264f_c3538-r0
HwSKU: DellEMC-Z9264f-C8D112
ASIC: broadcom
Serial Number: CN030K8MDND0093E0004
Uptime: 07:22:43 up 2 min,  1 user,  load average: 2.76, 1.91, 0.79

Docker images:
REPOSITORY                 TAG                              IMAGE ID            SIZE
docker-syncd-brcm          HEAD.136-dirty-20191124.063021   7b8636f11261        392MB
docker-syncd-brcm          latest                           7b8636f11261        392MB
docker-platform-monitor    HEAD.136-dirty-20191124.063021   e8677506e2e9        329MB
docker-platform-monitor    latest                           e8677506e2e9        329MB
docker-fpm-frr             HEAD.136-dirty-20191124.063021   29cf3340ac1e        321MB
docker-fpm-frr             latest                           29cf3340ac1e        321MB
docker-lldp-sv2            HEAD.136-dirty-20191124.063021   4858cbda2d0a        299MB
docker-lldp-sv2            latest                           4858cbda2d0a        299MB
docker-dhcp-relay          HEAD.136-dirty-20191124.063021   71d699dde1c0        289MB
docker-dhcp-relay          latest                           71d699dde1c0        289MB
docker-database            HEAD.136-dirty-20191124.063021   9ab3fbf18464        281MB
docker-database            latest                           9ab3fbf18464        281MB
docker-sflow               HEAD.136-dirty-20191124.063021   59426ace2a6a        305MB
docker-sflow               latest                           59426ace2a6a        305MB
docker-teamd               HEAD.136-dirty-20191124.063021   848067f09a7d        304MB
docker-teamd               latest                           848067f09a7d        304MB
docker-snmp-sv2            HEAD.136-dirty-20191124.063021   a20666e82d68        335MB
docker-snmp-sv2            latest                           a20666e82d68        335MB
docker-orchagent           HEAD.136-dirty-20191124.063021   f8ae3d721c51        322MB
docker-orchagent           latest                           f8ae3d721c51        322MB
docker-sonic-telemetry     HEAD.136-dirty-20191124.063021   e5cca007df02        304MB
docker-sonic-telemetry     latest                           e5cca007df02        304MB
docker-router-advertiser   HEAD.136-dirty-20191124.063021   d774696476f2        281MB
docker-router-advertiser   latest                           d774696476f2        281MB

kannankvs commented 4 years ago

When VRF is getting deleted, kernel sends a netlink for "vrf deletion" and it sends one other netlink for "link deletion" (the link whose name is same as vrf name). Looks like the zebra is not handling the deletion of vrf followed by deletion of the link.

prsunny commented 4 years ago

@tylerlinp , can you please check this?

ryan44guo commented 4 years ago

We reproduced it. It still came from the frr issue 5369, which tylerlinp brought out. The root cause is when vrf removed, In zebra, the remained routes in this vrf route table is freed without removing from the fpm_q. When we remove a vrf, kernel remove all routes about it, but only IPV6 routes send out DELROUTE message. By the way, after kernel 4.11, there is a "skip" option to do not send out DELROUTE as IPV4 routes. Our modification in #3763 can only make this DELROUTE work correctly. But in your case, a 0.0.0.0/0 route is created by kernel(from the frr‘s point), so no DELROUTE message here. Someone is working on this bug. It may be a big change, I think we should wait for frr's workaround. Maybe 2 way for us to avoid this problem temporarily.

Do not add 0.0.0.0/0 from kernel, in our design, static route should come from frr. Or remove this route from kernel manually before vrf delete.
Add patch to nexthop_active_check function, make kernel route which nexthop is this interface deleted if interface's vrf changed. This patch may failed if an interface leave a vrf and rejoin the same vrf in a short time(i.e. not more than 50ms), But I think we can ignore this condition in our SONiC.

tylerlinp commented 4 years ago

Yes, crash is because issue frr#5369(bug 1). But it is different #3763 that only fixed delroute failure(bug 2). There should be bug 3 in frr7.2, that is when downing interface the routes pointing to it didn't clear(I think frr7.1 maybe do right), such as default route here existing until vrf mgmt removed.

I found 2 other issues when reading scripts. a) in interfaces.j2, lo-m create/remove in lo up/down. I think it is wrong, it should do in mgmt and should do set master first to avoid vrf moving. b) in interfaces-config.sh, down eth0 and lo. I think it is useless and that has been done in systemctl restart networking.

kannankvs commented 4 years ago

@ryan44guo , @prsunny : Regarding temporary workaround/solution # 1 to remove this route from kernel manually, I am not sure what is missing. As per the current sequence of operations, following things are already happening. 1) Before mvrf is created, 0.0.0.0/0 is getting deleted from the routing table "default" (this is done using "ifdown --force eth0" that happens from interfaces-config.sh). It is getting added to the table "5000" after networking service restarts. 2) Similarly, before mvrf is deleted, 0.0.0.0/0 is getting deleted from table "5000". It is getting added to table "default" after networking service restarts. Let me know if any other information is required on this.

kannankvs commented 4 years ago

@tylerlinp : W.r.t. crash, can we assume that the crash will be fixed irrespective of the 2 other issues that you mentioned?

Regarding the 2 other issues, please see my inline replies. a) lo-m create/remove in lo up/down. I think it is wrong, it should do in mgmt and should do set master first to avoid vrf moving. <> lo-m is kept inside lo to keep the related interfaces together. Since lo-m is a dummy loopback interface, it is kept under lo. What issue do you foresee in this? Regarding "set master first to avoid vrf moving", can you please explain? When VRF is getting created, when networking restart happens, linux will bring UP the lo interface; while bringing up the "lo" interface, it will create the "lo-m" interface and sets the master as "mgmt". Do you see any issue in this? Similarly, when VRF is deleted, networking restart will bring down all interfaces including "lo". When "lo" goes down, it will delete the "lo-m". Do you see any issue in this? By the by, we had already validated that zebra crash is observed even without having "lo-m" in the system. We had provided the simple scripts to simulate the crash, where there is no "lo-m".

b) in interfaces-config.sh, down eth0 and lo. I think it is useless and that has been done in systemctl restart networking. <> This piece of code was already there in interfaces-config.sh even before mvrf was added. I assume that there will be some reasons/advantages in having them. When mvrf is present, this "ifdown eth0" will help to have deterministic behavior w.r.t. sequence of operations. i.e. It forces the ifdown for eth0 first. When we do "systemctl restart networking", I think that linux will internally do "ifdown" for all interfaces, but the sequence of interfaces may (not sure) depend on the sequence in which the interfaces are specified in /etc/network/interfaces file. Instead, if "ifdown eth0" happens before restarting networking, it will ensure that all commands given in "down" rules for eth0 will be executed first. For example, all those default route deletion commands will be executed first. This ensures that the required route cleanup happens before new set of interfaces are created using networking restart. Do you see any issue in having these lines? By the by, we had already confirmed that zebra crash happens even with simple scripts where the "interfacs-config.sh" is not used.

tylerlinp commented 4 years ago

@kannankvs Yes, the crash will be fixed irrespective of the 2 other issues that I mentioned. To avoid crash, we should assure to delete routes first then to down eth0. In interfaces.j2, change down to pre-down would work.

    # management port down rules
    pre-down ip {{ '-4' if prefix | ipv4 else '-6' }} route delete default via {{ MGMT_INTERFACE[(name, prefix)]['gwaddr'] }} dev eth0 table {{ vrf_table }}
    pre-down ip {{ '-4' if prefix | ipv4 else '-6' }} route delete {{ prefix | network }}/{{ prefix | prefixlen }} dev eth0 table {{ vrf_table }}
    pre-down ip {{ '-4' if prefix | ipv4 else '-6' }} rule delete from {{ prefix | ip }}/{{ '32' if prefix | ipv4 else '128' }} table {{ vrf_table }}
{% if (MGMT_VRF_CONFIG) and (MGMT_VRF_CONFIG['vrf_global']['mgmtVrfEnabled'] == "true") %}
    down cgdelete -g l3mdev:mgmt
{% endif %}
{% for route in MGMT_INTERFACE[(name, prefix)]['forced_mgmt_routes'] %}
    pre-down ip rule delete to {{ route }} table {{ vrf_table }}
{% endfor %}

others:

a) At least set master before add ip, or else lo-m add a global ip then move vrf.

    up ip link add lo-m type dummy
    up ip link set dev lo-m master mgmt
    up ip addr add 127.0.0.1/8 dev lo-m
    up ip link set lo-m up

I think lo-m should create with mgmt rather than lo up, lo-m and lo has no relation.

b) You are right. It maybe useful to ifdown eth0 first.

tylerlinp commented 4 years ago

c) What is the meaning to add a directly connected route in interfaces.j2? up ip {{ '-4' if prefix | ipv4 else '-6' }} route add {{ prefix | network }}/{{ prefix | prefixlen }} dev eth0 table {{ vrf_table }}

kannankvs commented 4 years ago

@tylerlinp : Couple of doubts. 1) w.r.t "pre-down" suggestion, do we need to do this if the zebra crash is fixed? I think that zebra should work even without this "pre-down" change and hence this change is not a mandatory change. If the fix for zebra crash is going to take longer time, we can do this "pre-down" change as a temporary workaround. Reason for my point is that "down" was the solution that already existing even before mvrf got implemented; it means that mvrf implementation is following the solution that already existed. If we need to make this "pre-down" change, we may need to get the approval from the original author of that code as well. What do you say? 2) w.r.t your suggestion about lo-m, is there an issue due to the order in which "master" is being set? With the existing order, what issue are you observing? Is the existing sequence creating any issue in the zebra? In principle, any user can execute these linux commands in any order and the applications need to handle them. If there are any issues with the existing sequence, we shall go ahead and change the sequence like you suggest. What do you say? 3) Regarding moving "lo-m" under "mgmt", do you see any issue in keeping it under "lo"?

NOTE: Now that the 201911 branch has been pulled out, let us try to do the changes that are mandatory for the release. If you think that any of the above changes are mandatory, request you to provide more reasons for the change which can be produced while raising PR.

kannankvs commented 4 years ago

@tylerlinp : regarding the "connected" routes, we followed the code that already existed before mvrf implementation. It was doing "ip -4 route add 100.104.47.0/24 dev eth0 table default", where it was adding this connected route in table "default". Kernel was already adding this route in table "main" (and "local"). But, this command is adding the route in another routing table "default" and it adds an ip rule in the end to lookup this table. I am not sure about the original reason behind this. We followed the existing code for management VRF also. May be @prsunny might be able to get an answer for this.

tylerlinp commented 4 years ago

@tylerlinp : Couple of doubts.

w.r.t "pre-down" suggestion, do we need to do this if the zebra crash is fixed? I think that zebra should work even without this "pre-down" change and hence this change is not a mandatory change. If the fix for zebra crash is going to take longer time, we can do this "pre-down" change as a temporary workaround. Reason for my point is that "down" was the solution that already existing even before mvrf got implemented; it means that mvrf implementation is following the solution that already existed. If we need to make this "pre-down" change, we may need to get the approval from the original author of that code as well. What do you say?

w.r.t your suggestion about lo-m, is there an issue due to the order in which "master" is being set? With the existing order, what issue are you observing? Is the existing sequence creating any issue in the zebra? In principle, any user can execute these linux commands in any order and the applications need to handle them. If there are any issues with the existing sequence, we shall go ahead and change the sequence like you suggest. What do you say?

Regarding moving "lo-m" under "mgmt", do you see any issue in keeping it under "lo"?

NOTE: Now that the 201911 branch has been pulled out, let us try to do the changes that are mandatory for the release. If you think that any of the above changes are mandatory, request you to provide more reasons for the change which can be produced while raising PR.

@kannankvs

I think "pre-down" is the right thing no matter what zebra does. For kernel self, it will clean routes in handling interface down. So deleting routes make sense only if before handling interface down.
It conflicts with global lo before moving to mgmt. So vrf workgroup reached an agreement, setting vrf with no ip addresses. Now we can see many meanningless calculation in zebra to handle this too.
Now vrf mgmt is a domain independent from global, but if lo is shutdown how does lo-m work?

kannankvs commented 4 years ago

@tylerlinp :

I still think that both pre-down and down are right. In any system, if users bring down the interface before deleting the default route that uses that interface, the application should handle it. Anyway, instead of debating further, we will go ahead and change it to pre-down to avoid the zebra crash.
To avoid the calculations in zebra, we will go ahead and change the order.
Since sonic exepts lo to be UP all the time, lo-m is working fine with existing flow. Anyway, we will go ahead and place the lo-m under "mgmt" along with other changes. Overall, we will make the above changes, test them and raise PR. Request you to review and approve accordingly. I assume that we can keep this git issue open until zebra crash is fixed in FRR. Since the pre-down is avoiding the zebra crash, the priority can be reduced on this issue.

ryan44guo commented 4 years ago

@kannankvs logically, pre-down and down both should be right, but that's only logically. In my opinion, linux kernel shall send out DELROUTE to application which registered netlink "route", cause it had send out ADDROUTE for it. But they said that's their design to reduce the number of messages, so the applications must register "link" when they only want the "route". You can consider that frr should do it in "link" messages, but frr make some bugs about it. In fact, if we do the remove route stuff in "down" hook, it do nothing, cause linux kernel had done it. So we can consider the remove route stuff (in script) is for the DELROUTE messages, that is helpful for the applications which do not register "link" or not do the "link" messages well (as frr). So we must do it in pre-down hook, that is the only way to send out DELROUTE messages for IPV4 route. If we consider that is kernel's bug(although they do not accept), that is a workaround about it. But only the "pre-down" is, the "down" can do nothing.

kannankvs commented 4 years ago

@ryan44guo : Got it. As mentioned earlier, we will go ahead and make the changes. Just to understand it better, if we have route delete command in "down", is it not getting executed?

ryan44guo commented 4 years ago

@kannankvs : I am not very sure about the "down" hook call is before or after the down command. So I tested it, it had not send out DELROUTE messages. So I think the "down" hook is called after the "down" command, but before post-down. Of course, it is executed, but for these route stuff, the routes had been deleted by kernel, so no action really do in these nonexistence routes deletion actions.

kannankvs commented 4 years ago

@ryan44guo , @tylerlinp, @prsunny : I changed the interfaces.j2 and raised the PR3853. Request you to kindly review and approve. I am assuming that we shall keep this gitissue open until the FRR code is fixed. If you know who might be fixing it, request you to kindly tag them here.

wendani commented 4 years ago

$ vtysh

Hello, this is FRRouting (version 7.2-sonic).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

# show ip route
zebra is not running

prsunny commented 4 years ago

Closing as #3853 is merged

prsunny commented 4 years ago

@wendani , is it a new issue? if so, can you raise an issue

pavel-shirshov commented 4 years ago

I'll make this open to track the proper fix of the issue.

rlhui commented 4 years ago

@prsunny will sync up with @pavel-shirshov

rlhui commented 4 years ago

was mentioned as seen in latest 201911

pavel-shirshov commented 4 years ago

I checked this. I found the issue with zebra on master. I see that zebra crashes when we issue /sbin/reboot. I asked @tahmed-dev to investigate further.

tahmed-dev commented 4 years ago

@kannankvs, is this issue still being seen? I just used the tip off master on my DUT and could not produces it by either method 1 mentioned in the issue details nor by issuing /sbin/reboot:

admin@str-s6000-acs-14:~$ sudo config vrf add mgmt
admin@str-s6000-acs-14:~$ sudo config vrf del mgmt
admin@str-s6000-acs-14:~$ ls -alrt /var/core
total 4
drwxr-xr-x 2 root root    3 Jun  9 16:23 .
drwxr-xr-x 1 root root 4096 Jun  9 23:58 ..

admin@str-s6000-acs-14:~$ show version

SONiC Software Version: SONiC.master.0-7525fea6
Distribution: Debian 10.4
Kernel: 4.19.0-6-amd64
Build commit: 7525fea6
Build date: Tue Jun  9 16:25:44 UTC 2020
Built by: taahme@taahme-vm0

sonic-net / sonic-buildimage

Zebra crash is observed when management VRF is disabled. Crash seen with simple linux scripts as well #3798

0 0x00007f5f7062bfff in raise () from /lib/x86_64-linux-gnu/libc.so.6

0 0x00007f5f7062bfff in raise () from /lib/x86_64-linux-gnu/libc.so.6

1 0x00007f5f7062d42a in abort () from /lib/x86_64-linux-gnu/libc.so.6

2 0x00007f5f7167939f in core_handler (signo=11, siginfo=0x7ffd2896ceb0,

3

4 srcdest_rnode_table (rn=) at ./lib/srcdest_table.h:90

5 rib_dest_table (dest=0x557be2369110) at zebra/rib.h:471

6 rib_dest_vrf (dest=0x557be2369110) at zebra/rib.h:479

7 netlink_route_info_fill (re=0x0, dest=0x557be2369110, cmd=25,

8 zfpm_netlink_encode_route (cmd=25, dest=dest@entry=0x557be2369110,

9 0x00007f5f6e5ad8ed in zfpm_encode_route (msg_type=,

10 zfpm_build_route_updates () at zebra/zebra_fpm.c:990

11 zfpm_build_updates () at zebra/zebra_fpm.c:1149

12 zfpm_write_cb (thread=0x7ffd289714c0) at zebra/zebra_fpm.c:1187

13 0x00007f5f71686a50 in thread_call (thread=thread@entry=0x7ffd289714c0)

14 0x00007f5f71656fa8 in frr_run (master=0x557be212c9c0) at lib/libfrr.c:1054

15 0x0000557be038c9b3 in main (argc=9, argv=0x7ffd289718c8)