sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

IPv6 routes not propagating from APPL_DB to ASIC_DB #5040

Open scottlaird opened 4 years ago

scottlaird commented 4 years ago

Description

Kernel IPv6 routes aren't consistently propagating from the APPL_DB to the ASIC_DB. This is seen on two different devices running recent(ish) Jenkins builds.

I'm using OSPFv3 (OSPF for IPv6) to propagate IPv6 routes. They're appearing in the kernel just fine:

$ ip -6 route
...
2001:470:e959:f101::/64 via fe80::2e60:cff:fec1:4245 dev Ethernet0 proto ospf metric 20 pref medium
2001:470:e959:f102::/64 via fe80::2e60:cff:fe8b:5af7 dev Ethernet4 proto ospf metric 20 pref medium
2001:470:e959:f103::/64 via fe80::2e60:cff:fec1:4121 dev Ethernet8 proto ospf metric 20 pref medium
2001:470:e959:f104::/64 via fe80::56ab:3aff:fe24:3259 dev Ethernet12 proto ospf metric 20 pref medium
2001:470:e959:f105::/64 via fe80::2e60:cff:fec1:4115 dev Ethernet16 proto ospf metric 20 pref medium
2001:470:e959:f106::/64 via fe80::2e60:cff:fec1:4081 dev Ethernet20 proto ospf metric 20 pref medium
2001:470:e959:f107::/64 via fe80::f652:14ff:fe09:540 dev Ethernet24 proto ospf metric 20 pref medium
...

And they're in the APPL DB:

$ redis-dump -d 0 -y | egrep "ROUTE.*e959:f10"
  "ROUTE_TABLE:2001:470:e959:f101::/64": {
  "ROUTE_TABLE:2001:470:e959:f102::/64": {
  "ROUTE_TABLE:2001:470:e959:f103::/64": {
  "ROUTE_TABLE:2001:470:e959:f104::/64": {
  "ROUTE_TABLE:2001:470:e959:f105::/64": {
  "ROUTE_TABLE:2001:470:e959:f106::/64": {
  "ROUTE_TABLE:2001:470:e959:f107::/64": {

But they're not in the ASIC DB:

$ redis-dump -d 1 -y | egrep "ROUTE.*e959:f10"
$

There is nothing useful in /var/log/syslog or /var/log/swss.

Looking deeper, there are only 4 IPv6 routes in the ASIC DB:

$ redis-dump -d 1 -y | egrep 'ROUTE.*"[0-9a-f:]+/[0-9]+'
  "ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY:{\"dest\":\"2001:470:e959:f005::5/128\",\"switch_id\":\"oid:0x21000000000000\",\"vr\":\"oid:0x3000000000022\"}": {
  "ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY:{\"dest\":\"::/0\",\"switch_id\":\"oid:0x21000000000000\",\"vr\":\"oid:0x3000000000022\"}": {
  "ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY:{\"dest\":\"fe80::/10\",\"switch_id\":\"oid:0x21000000000000\",\"vr\":\"oid:0x3000000000022\"}": {
  "ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY:{\"dest\":\"fe80::464c:a8ff:fecb:7e2c/128\",\"switch_id\":\"oid:0x21000000000000\",\"vr\":\"oid:0x3000000000022\"}": {

For comparison, ip -6 route show | wc gives 66 lines, although a few of those are ECMP routes.
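To see exactly which prefixes are stuck, the two dumps can be diffed. The following is a rough sketch, not a SONiC tool: it assumes the dumps look like the output above (one key per line, ASIC_DB keys carrying backslash-escaped JSON), and the function names are made up for illustration.

```shell
# Read `redis-dump -d 0 -y` output on stdin and print the ROUTE_TABLE prefixes.
appl_prefixes() {
    sed -n 's/^ *"ROUTE_TABLE:\(.*\)": {.*$/\1/p' | sort -u
}

# Read `redis-dump -d 1 -y` output on stdin and print the "dest" prefix of
# every SAI_OBJECT_TYPE_ROUTE_ENTRY key.
asic_prefixes() {
    sed -n 's/.*SAI_OBJECT_TYPE_ROUTE_ENTRY:{\\"dest\\":\\"\([^\\]*\)\\".*/\1/p' | sort -u
}

# Usage: missing_in_asic APPL_DUMP_FILE ASIC_DUMP_FILE
# Prints every prefix found in the APPL_DB dump but not in the ASIC_DB dump.
missing_in_asic() {
    asic_list=$(asic_prefixes < "$2")
    appl_prefixes < "$1" | while IFS= read -r prefix; do
        printf '%s\n' "$asic_list" | grep -qxF -- "$prefix" || printf '%s\n' "$prefix"
    done
}
```

For example, after saving redis-dump -d 0 -y to appl.txt and redis-dump -d 1 -y to asic.txt, running missing_in_asic appl.txt asic.txt lists the prefixes that never reached the ASIC DB. Note that this covers IPv4 prefixes too, since both families share ROUTE_TABLE.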

Oddly, the default route in the ASIC DB (::/0) isn't actually correct, either. Here's what's in the ASIC DB:

  "ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY:{\"dest\":\"::/0\",\"switch_id\":\"oid:0x21000000000000\",\"vr\":\"oid:0x3000000000022\"}": {
    "expireat": 1595657374.2020211, 
    "ttl": -0.001, 
    "type": "hash", 
    "value": {
      "SAI_ROUTE_ENTRY_ATTR_PACKET_ACTION": "SAI_PACKET_ACTION_DROP"
    }
  }, 

That's listed as SAI_PACKET_ACTION_DROP. However, the kernel has a default route:

default via fe80::9a03:9bff:fe77:95e6 dev Ethernet116 proto ospf metric 20 pref medium

The APPL DB version matches the kernel:

  "ROUTE_TABLE:::/0": {
    "expireat": 1595657634.3779612, 
    "ttl": -0.001, 
    "type": "hash", 
    "value": {
      "ifname": "Ethernet116", 
      "nexthop": "fe80::9a03:9bff:fe77:95e6"
    }
  }, 

I'm not sure where the default drop is coming from.

Steps to reproduce the issue:

  1. Manually add a non-local IPv6 route. ip route add 2001:470:e959:ffff::/64 via fe80::ba6a:97ff:fe8a:7168 dev Ethernet120
  2. Watch it make it into the APPL DB: redis-dump -d 0 -y | grep ffff
  3. Watch it not make it to the ASIC DB: redis-dump -d 1 -y | grep ffff
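Steps 2 and 3 can be wrapped in one small helper. This is only a sketch under the assumptions used in this report (APPL_DB is DB 0, ASIC_DB is DB 1, and redis-dump is installed); DUMP_CMD and check_propagation are hypothetical names, and the match is a plain substring search on the prefix.

```shell
# Command used to dump a redis DB. Overridable so the check can be dry-run
# against canned output; defaults to the redis-dump tool used above.
DUMP_CMD=${DUMP_CMD:-redis-dump}

# Usage: check_propagation PREFIX
# Reports whether PREFIX appears in DB 0 (APPL_DB) and DB 1 (ASIC_DB).
check_propagation() {
    prefix=$1
    for db in 0 1; do
        if $DUMP_CMD -d "$db" -y | grep -qF -- "$prefix"; then
            echo "db $db: present"
        else
            echo "db $db: missing"
        fi
    done
}
```

After adding the test route from step 1, check_propagation 2001:470:e959:ffff::/64 should print "present" for both DBs on a healthy system; on the affected devices DB 1 reports "missing".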

Describe the results you received:

No route for 2001:470:e959:ffff::/64 in the ASIC DB.

Describe the results you expected:

One route for 2001:470:e959:ffff::/64 in the ASIC DB.

Additional information you deem important (e.g. issue happens only occasionally):

Output of show version:

SONiC Software Version: SONiC.master.346-8ea03eed
Distribution: Debian 10.4
Kernel: 4.19.0-9-2-amd64
Build commit: 8ea03eed
Build date: Sun Jul 12 15:54:26 UTC 2020
Built by: johnar@jenkins-worker-4

Platform: x86_64-arista_7060_cx32s
HwSKU: Arista-7060CX-32S-C32
ASIC: broadcom
Serial Number: JPE16305312
Uptime: 06:06:15 up 2 days,  3:37,  1 user,  load average: 0.99, 0.79, 0.79

Docker images:
REPOSITORY                    TAG                   IMAGE ID            SIZE
docker-teamd                  latest                c7fc815e3d48        380MB
docker-teamd                  master.346-8ea03eed   c7fc815e3d48        380MB
docker-router-advertiser      latest                75cce46d08c1        350MB
docker-router-advertiser      master.346-8ea03eed   75cce46d08c1        350MB
docker-lldp                   latest                97cc65608d46        377MB
docker-lldp                   master.346-8ea03eed   97cc65608d46        377MB
docker-dhcp-relay             latest                7de199173c07        357MB
docker-dhcp-relay             master.346-8ea03eed   7de199173c07        357MB
docker-database               latest                fd5a608a6a1d        350MB
docker-database               master.346-8ea03eed   fd5a608a6a1d        350MB
docker-orchagent              latest                fd6344463e37        393MB
docker-orchagent              master.346-8ea03eed   fd6344463e37        393MB
docker-sonic-telemetry        latest                e80f9499208c        414MB
docker-sonic-telemetry        master.346-8ea03eed   e80f9499208c        414MB
docker-sonic-mgmt-framework   latest                264b3b468310        473MB
docker-sonic-mgmt-framework   master.346-8ea03eed   264b3b468310        473MB
docker-sflow                  latest                809ba25f3539        383MB
docker-sflow                  master.346-8ea03eed   809ba25f3539        383MB
docker-snmp                   latest                cf887db5be6a        390MB
docker-snmp                   master.346-8ea03eed   cf887db5be6a        390MB
docker-syncd-brcm             latest                7b2b853069dd        442MB
docker-syncd-brcm             master.346-8ea03eed   7b2b853069dd        442MB
docker-platform-monitor       latest                73f5850fe687        358MB
docker-platform-monitor       master.346-8ea03eed   73f5850fe687        358MB
docker-nat                    latest                a341ba9a6b96        317MB
docker-nat                    master.346-8ea03eed   a341ba9a6b96        317MB
docker-fpm-frr                latest                a10e5c731600        335MB
docker-fpm-frr                master.346-8ea03eed   a10e5c731600        335MB

Attach debug file sudo generate_dump:

sonic_dump_sw100_20200725_060652.tar.gz

scottlaird commented 4 years ago

Comments on my previous routes-not-propagating bug had questions about next-hop entries. I'm not sure how those are supposed to work with IPv6, but I don't see any v6 next hops in either DB.

$ redis-dump -d 0 -y
...
  "ROUTE_TABLE:2001:470:e959:eeee::/64": {
    "expireat": 1595699202.551358, 
    "ttl": -0.001, 
    "type": "hash", 
    "value": {
      "ifname": "Ethernet116", 
      "nexthop": "fe80::9a03:9bff:fe77:95e6"
    }
  }, 
...
$ redis-dump -d 0 -y | grep fe80::9a03:9bff:fe77:95e6
      "nexthop": "fe80::9a03:9bff:fe77:95e6"
      "nexthop": "fe80::9a03:9bff:fe77:95e6"
      "nexthop": "fe80::9a03:9bff:fe77:95e6"
      "nexthop": "fe80::9a03:9bff:fe77:95e6"
      "nexthop": "fe80::9a03:9bff:fe77:95e6"
      "nexthop": "fe80::9a03:9bff:fe77:95e6"
      "nexthop": "fe80::9a03:9bff:fe77:95e6"
      "nexthop": "fe80::9a03:9bff:fe77:95e6"
      "nexthop": "fe80::9a03:9bff:fe77:95e6"
      "nexthop": "fe80::9a03:9bff:fe77:95e6"
      "nexthop": "fe80::9a03:9bff:fe77:95e6"
$ redis-dump -d 1 -y | grep fe80::9a03:9bff:fe77:95e6
$

By comparison, my IPv4 route next-hops have NEIGH_TABLE entries in DB 0 and ASIC_STATE:SAI_OBJECT_TYPE_NEIGHBOR_ENTRY entries in DB 1. So perhaps this is an issue with fpmsyncd not generating neighbor entries for v6?

scottlaird commented 4 years ago

I turned swssloglevel for fpmsyncd up to DEBUG, and added a new route:

# ip route add 2001:470:e959:dddd::/64 via fe80::9a03:9bff:fe77:95e6 dev Ethernet116
# grep fpmsyncd /var/log/syslog
Jul 25 17:54:28.569127 sw100 DEBUG bgp#fpmsyncd: :> select: enter
Jul 25 17:55:04.534839 sw100 DEBUG bgp#fpmsyncd: :- onRouteMsg: Receive new route message dest ip prefix: 2001:470:e959:dddd::/64
Jul 25 17:55:04.534839 sw100 DEBUG bgp#fpmsyncd: :- onRouteMsg: RouteTable set msg: 2001:470:e959:dddd::/64 fe80::9a03:9bff:fe77:95e6 Ethernet116
Jul 25 17:55:04.534893 sw100 DEBUG bgp#fpmsyncd: :< select: exit
Jul 25 17:55:04.576124 sw100 DEBUG bgp#fpmsyncd: :- main: Pipeline flushed
Jul 25 17:55:04.576124 sw100 DEBUG bgp#fpmsyncd: :> select: enter

That's coming from https://github.com/Azure/sonic-swss/blob/master/fpmsyncd/routesync.cpp, and it looks okay. It has the right next hop and interface. The APPL DB shows the correct ROUTE_TABLE entry and no entry for the next hop.

I then turned up orchagent logging, and there are a bunch of these:

Jul 25 17:59:18.435766 sw100 INFO swss#orchagent: :- addRoute: Failed to get next hop fe80::9a03:9bff:fe77:95e6@Ethernet116 for 2001:470:e959:dddd::/64

So, it looks like orchagent is more or less doing the right thing, and something upstream (fpmsyncd or neighsyncd?) is screwing up.
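To get an overview of which next hops orchagent cannot resolve, the log lines can be summarized. A minimal sketch, assuming the message format shown above; unresolved_nexthops is a made-up helper name.

```shell
# Read syslog lines on stdin and print one "nexthop@interface prefix" pair
# for every distinct "Failed to get next hop" message from addRoute.
unresolved_nexthops() {
    sed -n 's/.*addRoute: Failed to get next hop \([^ ]*\) for \([^ ]*\).*/\1 \2/p' | sort -u
}
```

Usage would be something like grep orchagent /var/log/syslog | unresolved_nexthops. If every unresolved next hop starts with fe80:, that points at link-local neighbor handling rather than at a specific route.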

I ran redis-cli monitor and added yet another test route, and here's what showed up:

1595700332.809192 [0 unix:/var/run/redis/redis.sock] "EVALSHA" "6875900592cdd1621c6191fe038ec3b29775aa13" "4" "ROUTE_TABLE_CHANNEL" "ROUTE_TABLE_KEY_SET" "_ROUTE_TABLE:2001:470:e959:cccc::/64" "_ROUTE_TABLE:2001:470:e959:cccc::/64" "G" "2001:470:e959:cccc::/64" "nexthop" "fe80::9a03:9bff:fe77:95e6" "ifname" "Ethernet116"
1595700332.809306 [0 lua] "SADD" "ROUTE_TABLE_KEY_SET" "2001:470:e959:cccc::/64"
1595700332.809384 [0 lua] "HSET" "_ROUTE_TABLE:2001:470:e959:cccc::/64" "nexthop" "fe80::9a03:9bff:fe77:95e6"
1595700332.809437 [0 lua] "HSET" "_ROUTE_TABLE:2001:470:e959:cccc::/64" "ifname" "Ethernet116"
1595700332.809478 [0 lua] "PUBLISH" "ROUTE_TABLE_CHANNEL" "G"
1595700332.809687 [2 unix:/var/run/redis/redis.sock] "HGETALL" "COUNTERS:oid:0x150000000004cd"
1595700332.809730 [0 unix:/var/run/redis/redis.sock] "EVALSHA" "88270a7c5c90583e56425aca8af8a4b8c39fe757" "3" "ROUTE_TABLE_KEY_SET" "ROUTE_TABLE:" "ROUTE_TABLE_DEL_SET" "8192" "_"
1595700332.809777 [0 lua] "SPOP" "ROUTE_TABLE_KEY_SET" "8192"
1595700332.809849 [0 lua] "SREM" "ROUTE_TABLE_DEL_SET" "2001:470:e959:cccc::/64"
1595700332.809871 [0 lua] "HGETALL" "_ROUTE_TABLE:2001:470:e959:cccc::/64"
1595700332.809897 [0 lua] "HSET" "ROUTE_TABLE:2001:470:e959:cccc::/64" "nexthop" "fe80::9a03:9bff:fe77:95e6"
1595700332.809944 [0 lua] "HSET" "ROUTE_TABLE:2001:470:e959:cccc::/64" "ifname" "Ethernet116"
1595700332.809985 [0 lua] "DEL" "_ROUTE_TABLE:2001:470:e959:cccc::/64"

I think that's okay; I don't see any code in https://github.com/Azure/sonic-swss-common/blob/master/common/producerstatetable.cpp or any of the Lua that goes with it that knows about NEIGH_TABLE, or cares about v4 vs v6. So that probably all falls to neighsyncd.

I suspect that the problem is here: https://github.com/Azure/sonic-swss/blob/a9479e646649e67d28d4afba395ab16c8907e7c7/neighsyncd/neighsync.cpp#L76, where it explicitly ignores v6 link-local neighbors. OSPFv3 explicitly uses link-local neighbors (as per the RFC). The pull request that added that line (Azure/sonic-swss#1065) mentions "some current limitations with handling link-local neighbors" but doesn't provide any details or an issue link.

Does anyone have context on this?

scottlaird commented 4 years ago

That makes this a duplicate of Azure/sonic-utilities#430, which is ~1.5 years old. Is there a plan for how best to approach this?

yxieca commented 4 years ago

This issue might not be OSPF-specific.

It looks like your neighbors are IPv6 link-local addresses. Can you confirm that the issue only happens with link-local neighbors?

Tison-Liu commented 3 years ago

I hit the same problem. It happens because SONiC ignores link-local addresses when it processes the kernel's neighbor table, so neighbors with link-local addresses are never pushed down. As a result, a learned IPv6 route cannot find its next hop and fails to be installed.

    nl_addr2str(rtnl_neigh_get_dst(neigh), ipStr, MAX_ADDR_SIZE);
    /* Ignore IPv6 link-local addresses as neighbors */
    if (family == IPV6_NAME && IN6_IS_ADDR_LINKLOCAL(nl_addr_get_binary_addr(rtnl_neigh_get_dst(neigh))))
        return;
    /* Ignore IPv6 multicast link-local addresses as neighbors */
    if (family == IPV6_NAME && IN6_IS_ADDR_MC_LINKLOCAL(nl_addr_get_binary_addr(rtnl_neigh_get_dst(neigh))))
        return;
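To see which kernel neighbors the two checks above would drop, the same tests can be approximated in shell. This is a sketch only: it matches on the textual address rather than the binary form the C macros use, it assumes the usual ip -6 neigh show output with the address in the first column, and all function names are invented for illustration.

```shell
# True if the address is in fe80::/10 (first hextet fe80 through febf),
# roughly what IN6_IS_ADDR_LINKLOCAL tests.
is_ipv6_link_local() {
    _a=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    case $_a in
        fe[89ab][0-9a-f]:*) return 0 ;;
        *) return 1 ;;
    esac
}

# True if the address is multicast (ff00::/8) with link-local scope (2),
# roughly what IN6_IS_ADDR_MC_LINKLOCAL tests.
is_ipv6_mc_link_local() {
    _a=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    case $_a in
        ff[0-9a-f]2:*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read `ip -6 neigh show` output on stdin and print the neighbor addresses
# that either check would cause to be ignored.
skipped_neighbors() {
    while read -r neigh _rest; do
        if is_ipv6_link_local "$neigh" || is_ipv6_mc_link_local "$neigh"; then
            printf '%s\n' "$neigh"
        fi
    done
}
```

Running ip -6 neigh show | skipped_neighbors on an affected switch should list the OSPFv3 next hops, such as fe80::9a03:9bff:fe77:95e6, that orchagent then fails to resolve.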