sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC
Other
736 stars 1.42k forks source link

Routes not propagating from kernel to ASIC #4969

Open scottlaird opened 4 years ago

scottlaird commented 4 years ago

Description

I'm having problems with my Sonic switches failing to route traffic as expected.

Network layout (simplified):

A <=> B (Sonic) <=> C

All routes appear to be correct across A, B, and C. I'm able to ping A and C from B, and B from A and C. However, I cannot ping A from C or vice-versa. In the past, rebooting B has frequently fixed the problem, but new routes haven't consistently worked until rebooted.

I believe I'm hitting a bug in swss, where routes aren't being propagated from the kernel into the database correctly.

As an example, I'm currently having problems with a route to 10.0.0.101. The kernel knows about it:

sonic$ ip route show
...
10.0.0.101 via 10.0.2.2 dev Ethernet0 proto ospf metric 20 
...

However, I don't see a ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY for this in the DB:

sonic$ redis-dump -d 1 -y | fgrep TYPE_ROUTE_
  "ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY:{\"dest\":\"0.0.0.0/0\",\"switch_id\":\"oid:0x21000000000000\",\"vr\":\"oid:0x3000000000022\"}": {
  "ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY:{\"dest\":\"10.0.0.5/32\",\"switch_id\":\"oid:0x21000000000000\",\"vr\":\"oid:0x3000000000022\"}": {
  "ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY:{\"dest\":\"10.0.2.0/30\",\"switch_id\":\"oid:0x21000000000000\",\"vr\":\"oid:0x3000000000022\"}": {

The only entry in DB 1 that matches 10.0.0.101 is a block inside of an ASIC_STATE_KEY_VALUE_OP_QUEUE entry:

sonic$ redis-dump -d 1 -y | fgrep 10.0.0.101
      "[\"{\\\"dest\\\":\\\"10.22.204.184/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.2.0.0/16\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.18.157.213/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.1.97.0/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.1.52.0/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.14.208.90/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.1.32.0/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.4/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.252/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.26.253.74/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.8/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.0.105/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.0.104/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.182.52.28/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.1.0.0/24\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.0.7/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.1.22.0/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.0.106/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.0.103/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000614\",\"{\\\"dest\\\":\\\"10.0.0.101/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000613\",\"{\\\"dest\\\":\\\"10.0.1.0/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.56/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.0.107/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.16/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.248/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.1.8.0/24\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x5000000000617\",\"{\\\"dest\\\":\\\"10.0.0.4/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.0.102/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000616\",\"{\\\"dest\\\":\\\"10.0.0.6/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.12/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.24/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.20/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\"]", 

A quick glance at the orchagent code didn't immediately make it clear exactly what was going on here, but it looks like it's put some work to be done into a queue, and then never actually processes the queue and adds routes correctly.

Steps to reproduce the issue:

  1. Boot switch
  2. Add a bunch of routes at once (using OSPF, by manually starting ospfd in the bgp container in this case, but I doubt it's OSPF specific)
  3. Watch routes stop being added to the DB

Describe the results you received:

I got a giant ASIC_STATE_KEY_VALUE_OP_QUEUE blob.

Describe the results you expected:

I expected to see ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY records.

Additional information you deem important (e.g. issue happens only occasionally):

I have two switches, an Arista 7060CX and an Edge-core AS7326. I've seen intermittent behavior similar to this back at least to build 310, but didn't look at redis until today. It happens very consistently with the 7060CX on build 346.

Output of show version:

SONiC Software Version: SONiC.master.346-8ea03eed
Distribution: Debian 10.4
Kernel: 4.19.0-9-2-amd64
Build commit: 8ea03eed
Build date: Sun Jul 12 15:54:26 UTC 2020
Built by: johnar@jenkins-worker-4

Platform: x86_64-arista_7060_cx32s
HwSKU: Arista-7060CX-32S-C32
ASIC: broadcom
Serial Number: JPE16305312
Uptime: 20:47:57 up 26 min,  1 user,  load average: 1.46, 1.37, 1.16

Docker images:
REPOSITORY                    TAG                   IMAGE ID            SIZE
docker-teamd                  latest                c7fc815e3d48        380MB
docker-teamd                  master.346-8ea03eed   c7fc815e3d48        380MB
docker-router-advertiser      latest                75cce46d08c1        350MB
docker-router-advertiser      master.346-8ea03eed   75cce46d08c1        350MB
docker-lldp                   latest                97cc65608d46        377MB
docker-lldp                   master.346-8ea03eed   97cc65608d46        377MB
docker-dhcp-relay             latest                7de199173c07        357MB
docker-dhcp-relay             master.346-8ea03eed   7de199173c07        357MB
docker-database               latest                fd5a608a6a1d        350MB
docker-database               master.346-8ea03eed   fd5a608a6a1d        350MB
docker-orchagent              latest                fd6344463e37        393MB
docker-orchagent              master.346-8ea03eed   fd6344463e37        393MB
docker-sonic-telemetry        latest                e80f9499208c        414MB
docker-sonic-telemetry        master.346-8ea03eed   e80f9499208c        414MB
docker-sonic-mgmt-framework   latest                264b3b468310        473MB
docker-sonic-mgmt-framework   master.346-8ea03eed   264b3b468310        473MB
docker-sflow                  latest                809ba25f3539        383MB
docker-sflow                  master.346-8ea03eed   809ba25f3539        383MB
docker-snmp                   latest                cf887db5be6a        390MB
docker-snmp                   master.346-8ea03eed   cf887db5be6a        390MB
docker-syncd-brcm             latest                7b2b853069dd        442MB
docker-syncd-brcm             master.346-8ea03eed   7b2b853069dd        442MB
docker-platform-monitor       latest                73f5850fe687        358MB
docker-platform-monitor       master.346-8ea03eed   73f5850fe687        358MB
docker-nat                    latest                a341ba9a6b96        317MB
docker-nat                    master.346-8ea03eed   a341ba9a6b96        317MB
docker-fpm-frr                latest                a10e5c731600        335MB
docker-fpm-frr                master.346-8ea03eed   a10e5c731600        335MB

Attach debug file sudo generate_dump:

sonic_dump_sw100_20200714_205053.tar.gz

scottlaird commented 4 years ago

This doesn't appear to be #4907; show interface status still works fine.

prsunny commented 4 years ago

For the sonic dump, just want to add my observations:

  1. zebra is crashed
  2. Neighbor entry for 10.0.2.2 doesnt appear to be added _NEIGH_TABLE:Ethernet0:10.0.2.2
  3. There is a timeout for orchagent. Not sure if syncd is crashed. Jul 14 20:50:02.264545 sw100 ERR swss#orchagent: :- wait: SELECT operation result: TIMEOUT on getresponse
scottlaird commented 4 years ago

Interesting. Zebra crashed during a previous boot, but it seems to be running now:

$ ps aux | grep zebra
300       1800  0.0  0.1 783444 11464 pts/0    Sl   Jul14   0:21 /usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M fpm -M snmp

In any case, once a route makes it into the kernel, does Zebra running or not running actually impact orchagent?

WRT the Neighbor entry for 10.0.2.2, I see it in DB 0:

  "NEIGH_TABLE:Ethernet0:10.0.2.2": {
    "expireat": 1594848333.394223, 
    "ttl": -0.001, 
    "type": "hash", 
    "value": {
      "family": "IPv4", 
      "neigh": "2c:60:0c:c1:42:45"
    }
  }, 

but it's not in DB 1, except in ASIC_STATE_KEY_VALUE_OP_QUEUE (as above):

      "Screate", 
      "[\"SAI_NEXT_HOP_ATTR_TYPE\",\"SAI_NEXT_HOP_TYPE_IP\",\"SAI_NEXT_HOP_ATTR_IP\",\"10.0.2.2\",\"SAI_NEXT_HOP_ATTR_ROUTER_INTERFACE_ID\",\"oid:0x60000000005ed\"]", 
      "SAI_OBJECT_TYPE_NEXT_HOP:oid:0x4000000000613", 
      "Screate", 
      "[\"SAI_NEIGHBOR_ENTRY_ATTR_DST_MAC_ADDRESS\",\"2C:60:0C:C1:42:45\"]", 
      "SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{\"ip\":\"10.0.2.2\",\"rif\":\"oid:0x60000000005ed\",\"switch_id\":\"oid:0x21000000000000\"}", 

Looking in the swss container's /var/log/swss/swss.rec log, I see multiple references to 10.0.2.2's NEIGH_TABLE entry:

2020-07-15.18:53:14.944344|NEIGH_TABLE:eth0:10.1.8.254|SET|neigh:ac:1f:6b:c2:88:a0|family:IPv4
2020-07-15.19:15:16.374922|NEIGH_TABLE:Ethernet4:10.0.2.6|SET|neigh:2c:60:0c:8b:5a:f7|family:IPv4
2020-07-15.19:26:17.113755|NEIGH_TABLE:Ethernet0:10.0.2.2|SET|neigh:2c:60:0c:c1:42:45|family:IPv4
2020-07-15.19:26:17.113841|NEIGH_TABLE:eth0:10.1.8.254|SET|neigh:ac:1f:6b:c2:88:a0|family:IPv4
2020-07-15.19:59:19.272168|NEIGH_TABLE:eth0:10.1.8.254|SET|neigh:ac:1f:6b:c2:88:a0|family:IPv4
2020-07-15.19:59:19.272254|NEIGH_TABLE:Ethernet4:10.0.2.6|SET|neigh:2c:60:0c:8b:5a:f7|family:IPv4
2020-07-15.20:10:20.066900|NEIGH_TABLE:Ethernet0:10.0.2.2|SET|neigh:2c:60:0c:c1:42:45|family:IPv4
2020-07-15.20:21:20.743086|NEIGH_TABLE:Ethernet4:10.0.2.6|SET|neigh:2c:60:0c:8b:5a:f7|family:IPv4
2020-07-15.20:32:21.463003|NEIGH_TABLE:eth0:10.1.8.254|SET|neigh:ac:1f:6b:c2:88:a0|family:IPv4
2020-07-15.20:43:22.260162|NEIGH_TABLE:Ethernet4:10.0.2.6|SET|neigh:2c:60:0c:8b:5a:f7|family:IPv4
2020-07-15.20:54:23.145367|NEIGH_TABLE:Ethernet0:10.0.2.2|SET|neigh:2c:60:0c:c1:42:45|family:IPv4
2020-07-15.20:54:23.145457|NEIGH_TABLE:Ethernet4:10.0.2.6|SET|neigh:2c:60:0c:8b:5a:f7|family:IPv4
2020-07-15.21:05:23.849869|NEIGH_TABLE:eth0:10.1.8.254|SET|neigh:ac:1f:6b:c2:88:a0|family:IPv4

I assume that it's occurring because it's not actually making it into the ASIC DB?

There are a lot of orchagent entries in /var/log/syslog. There's a batch of these every 60s:

Jul 15 21:33:12.932150 sw100 ERR swss#orchagent: :- wait: SELECT operation result: TIMEOUT on getresponse
Jul 15 21:33:12.932150 sw100 ERR swss#orchagent: :- wait: failed to get response for getresponse
Jul 15 21:33:12.932593 sw100 ERR swss#orchagent: :- getResAvailableCounters: Failed to get switch attribute 54 , rv:-1

I installed strace onto the switch via .debs. Here's what orchagent is currently doing:

epoll_wait(217, [], 1, 60000)           = 0
sendto(10, "<11>Jul 15 21:32:12 orchagent: :"..., 87, MSG_NOSIGNAL, NULL, 0) = 87
sendto(10, "<11>Jul 15 21:32:12 orchagent: :"..., 78, MSG_NOSIGNAL, NULL, 0) = 78
close(217)                              = 0
sendto(10, "<11>Jul 15 21:32:12 orchagent: :"..., 100, MSG_NOSIGNAL, NULL, 0) = 100
write(16, "*9\r\n$7\r\nEVALSHA\r\n$40\r\nd171e04fd7"..., 268) = 268
read(16, "$-1\r\n", 16384)              = 5
epoll_create1(0)                        = 217
epoll_ctl(217, EPOLL_CTL_ADD, 17, {EPOLLIN, {u32=17, u64=17}}) = 0
epoll_wait(217, [], 1, 0)               = 0
epoll_wait(217, [], 1, 60000)           = 0
sendto(10, "<11>Jul 15 21:33:12 orchagent: :"..., 87, MSG_NOSIGNAL, NULL, 0) = 87
sendto(10, "<11>Jul 15 21:33:12 orchagent: :"..., 78, MSG_NOSIGNAL, NULL, 0) = 78
close(217)                              = 0
sendto(10, "<11>Jul 15 21:33:12 orchagent: :"..., 100, MSG_NOSIGNAL, NULL, 0) = 100
write(16, "*9\r\n$7\r\nEVALSHA\r\n$40\r\nd171e04fd7"..., 268) = 268
read(16, "$-1\r\n", 16384)              = 5
epoll_create1(0)                        = 217
epoll_ctl(217, EPOLL_CTL_ADD, 17, {EPOLLIN, {u32=17, u64=17}}) = 0
epoll_wait(217, [], 1, 0)               = 0

FD 10 is obviously syslog. FD 16 is probably /var/run/redis/redis.sock, but I don't see a filename in /proc/net/unix. So I wouldn't say that it's crashed, exactly, but it doesn't appear to be making any progress on anything. I ran redis-cli -d 1 monitor | grep EVALSHA and it looks like this is the query that orchagent sent:

1594849574.335416 [1 unix:/var/run/redis/redis.sock] "EVALSHA" "d171e04fd79e95ca2287f3b067c46ae76a82208b" "2" "ASIC_STATE_KEY_VALUE_OP_QUEUE" "ASIC_STATE_CHANNEL" "SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000" "[\"SAI_SWITCH_ATTR_AVAILABLE_NEXT_HOP_GROUP_ENTRY\",\"1623573616\"]" "Sget" "G"

Note that this happened after the call to syslog.

yxieca commented 4 years ago

@scottlaird : zebra is responsible for push routes via netlink msgs to appl_db. If zebra is not running, then the routes won't make to appl_db. Please make sure when you test. zebra is healthy.

venkatmahalingam commented 4 years ago

I see the route entry 10.0.0.101 in APPL_DB and also as per the latest comment, ASIC_DB has the NEIGHBOR entry, @scottlaird Can you check why the neighbor is not programmed to ASIC from Syncd.

"ROUTE_TABLE:10.0.0.101": { "expireat": 1594759892.050941, "ttl": -0.001, "type": "hash", "value": { "ifname": "Ethernet0", "nexthop": "10.0.2.2" } },

"SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{\"ip\":\"10.0.2.2\",\"rif\":\"oid:0x60000000005ed\",\"switch_id\":\"oid:0x21000000000000\"}",

scottlaird commented 4 years ago

@scottlaird : zebra is responsible for push routes via netlink msgs to appl_db. If zebra is not running, then the routes won't make to appl_db. Please make sure when you test. zebra is healthy.

I assume this happens in fpmsyncd?

Ingestion into appl_db looks like it's working. I manually added a new route to 10.99.99.98 that mirrored the missing 10.0.0.101 route, and it appeared in appl_db more or less immediately:

# ip route add 10.99.99.98 via 10.0.2.2
# redis-dump -d 0 -y 
...
  "_ROUTE_TABLE:10.99.99.98": {
    "expireat": 1595464061.8237002, 
    "ttl": -0.001, 
    "type": "hash", 
    "value": {
      "ifname": "Ethernet0", 
      "nexthop": "10.0.2.2"
    }
  }
}

Except, this is a _ROUTE_TABLE entry, not a ROUTE_TABLE entry, with a leading underscore. Neither my C++ nor my Redis is great, but I think that's coming from getStateHashPrefix(), and should be removed in producer_state_table_apply_view.lua. I spent way to much time trying to debug this, only to discover that it eventually did the right thing on its own, after more than 5 minutes:

# redis-dump -d 0 -y 
...
  "ROUTE_TABLE:10.99.99.98": {
    "expireat": 1595466464.552942, 
    "ttl": -0.001, 
    "type": "hash", 
    "value": {
      "ifname": "Ethernet0", 
      "nexthop": "10.0.2.2"
    }
  }, 
...

However, no SAI_OBJECT_TYPE_ROUTE_ENTRY for 10.99.99.98 exists in the ASIC DB. There is an entry under ASIC_STATE_KEY_VALUE_OP_QUEUE, though.

scottlaird commented 4 years ago

I see the route entry 10.0.0.101 in APPL_DB and also as per the latest comment, ASIC_DB has the NEIGHBOR entry, @scottlaird Can you check why the neighbor is not programmed to ASIC from Syncd.

"ROUTE_TABLE:10.0.0.101": { "expireat": 1594759892.050941, "ttl": -0.001, "type": "hash", "value": { "ifname": "Ethernet0", "nexthop": "10.0.2.2" } },

"SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{"ip":"10.0.2.2","rif":"oid:0x60000000005ed","switch_id":"oid:0x21000000000000"}",

I don't see a usable neighbor entry for 10.0.2.2 in the ASIC DB (1), only in the APPL DB (0):

$ redis-dump -d 0 -y | egrep -i 'neigh.*10.0.2.2'
  "NEIGH_TABLE:Ethernet0:10.0.2.2": {
$ redis-dump -d 1 -y | egrep -i 'neigh.*10.0.2.2'
      "SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{\"ip\":\"10.0.2.2\",\"rif\":\"oid:0x60000000005ed\",\"switch_id\":\"oid:0x21000000000000\"}", 

The entry from the ASIC DB is from ASIC_STATE_KEY_VALUE_OP_QUEUE.

From looking at my other switch, I think I should be seeing entries more like this:

  "ASIC_STATE:SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{\"ip\":\"10.0.1.2\",\"rif\":\"oid:0x60000000009df\",\"switch_id\":\"oid:0x21000000000000\"}": {
    "type": "hash", 
    "value": {
      "SAI_NEIGHBOR_ENTRY_ATTR_DST_MAC_ADDRESS": "54:AB:3A:BC:84:A1"
    }
  }, 

I have 0 of those on this switch:

$ redis-dump -d 1 -y | grep ASIC_STATE:SAI_OBJECT_TYPE_NEIGHBOR_ENTRY
$

I decided to try rebooting (again) to see if that helps at all. I raised the swssloglevel via swssloglevel -l DEBUG -c orchagent.

After rebooting the switch, ASIC_STATE_KEY_VALUE_OP_QUEUE is smaller (but still non-empty):

  "ASIC_STATE_KEY_VALUE_OP_QUEUE": {
    "expireat": 1595469340.9739242, 
    "ttl": -0.001, 
    "type": "list", 
    "value": [
      "Sbulkset", 
      "[\"{\\\"dest\\\":\\\"10.0.2.8/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x600000000060b\",\"{\\\"dest\\\":\\\"10.0.2.0/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x60000000005e9\",\"{\\\"dest\\\":\\\"10.0.2.4/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x60000000005fa\",\"{\\\"dest\\\":\\\"10.0.2.116/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x60000000005ee\"]", 
      "SAI_OBJECT_TYPE_ROUTE_ENTRY:4"
    ]
  }, 

I then fired up ospfd and ospf6d, and the queue grew a bit:

  "ASIC_STATE_KEY_VALUE_OP_QUEUE": {
    "expireat": 1595469447.839086, 
    "ttl": -0.001, 
    "type": "list", 
    "value": [
      "Sbulkset", 
      "[\"{\\\"dest\\\":\\\"0.0.0.0/0\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"0.0.0.0/0\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_PACKET_ACTION=SAI_PACKET_ACTION_FORWARD\"]", 
      "SAI_OBJECT_TYPE_ROUTE_ENTRY:2", 
      "Sbulkcreate", 
      "[\"{\\\"dest\\\":\\\"50.125.243.136/29\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\"]", 
      "SAI_OBJECT_TYPE_ROUTE_ENTRY:1", 
      "Sbulkcreate", 
      "[\"{\\\"dest\\\":\\\"10.22.204.184/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.2.0.0/16\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.18.157.213/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.1.97.0/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.1.52.0/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.14.208.90/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.1.32.0/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.4/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.1.11.0/24\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.252/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.26.253.74/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.8/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.0.105/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.0.104/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.182.52.28/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.1.0.0/24\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.0.7/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.1.22.0/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.0.106/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.0.103/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000616\",\"{\\\"dest\\\":\\\"10.0.0.101/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000614\",\"{\\\"dest\\\":\\\"10.0.1.0/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.56/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.0.107/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.16/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.248/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.1.8.0/24\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x5000000000617\",\"{\\\"dest\\\":\\\"10.0.0.4/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.0.102/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000613\",\"{\\\"dest\\\":\\\"10.0.0.6/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.12/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.24/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.20/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\"]", 
      "SAI_OBJECT_TYPE_ROUTE_ENTRY:33", 
      "Sbulkcreate", 
      "[\"oid:0x2d000000000618\",\"SAI_NEXT_HOP_GROUP_MEMBER_ATTR_NEXT_HOP_GROUP_ID=oid:0x5000000000617|SAI_NEXT_HOP_GROUP_MEMBER_ATTR_NEXT_HOP_ID=oid:0x4000000000614\",\"oid:0x2d000000000619\",\"SAI_NEXT_HOP_GROUP_MEMBER_ATTR_NEXT_HOP_GROUP_ID=oid:0x5000000000617|SAI_NEXT_HOP_GROUP_MEMBER_ATTR_NEXT_HOP_ID=oid:0x4000000000613\",\"oid:0x2d00000000061a\",\"SAI_NEXT_HOP_GROUP_MEMBER_ATTR_NEXT_HOP_GROUP_ID=oid:0x5000000000617|SAI_NEXT_HOP_GROUP_MEMBER_ATTR_NEXT_HOP_ID=oid:0x4000000000616\"]", 
      "SAI_OBJECT_TYPE_NEXT_HOP_GROUP_MEMBER:3", 
      "Screate", 
      "[\"SAI_NEXT_HOP_GROUP_ATTR_TYPE\",\"SAI_NEXT_HOP_GROUP_TYPE_DYNAMIC_UNORDERED_ECMP\"]", 
      "SAI_OBJECT_TYPE_NEXT_HOP_GROUP:oid:0x5000000000617", 
      "Sbulkset", 
      "[\"{\\\"dest\\\":\\\"10.0.2.8/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x600000000060b\",\"{\\\"dest\\\":\\\"10.0.2.4/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x60000000005fa\",\"{\\\"dest\\\":\\\"10.0.2.116/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x60000000005ee\"]", 
      "SAI_OBJECT_TYPE_ROUTE_ENTRY:3", 
      "Screate", 
      "[\"SAI_NEXT_HOP_ATTR_TYPE\",\"SAI_NEXT_HOP_TYPE_IP\",\"SAI_NEXT_HOP_ATTR_IP\",\"10.0.2.10\",\"SAI_NEXT_HOP_ATTR_ROUTER_INTERFACE_ID\",\"oid:0x600000000060b\"]", 
      "SAI_OBJECT_TYPE_NEXT_HOP:oid:0x4000000000616", 
      "Screate", 
      "[\"SAI_NEIGHBOR_ENTRY_ATTR_DST_MAC_ADDRESS\",\"2C:60:0C:C1:41:21\"]", 
      "SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{\"ip\":\"10.0.2.10\",\"rif\":\"oid:0x600000000060b\",\"switch_id\":\"oid:0x21000000000000\"}", 
      "Screate", 
      "[\"SAI_NEXT_HOP_ATTR_TYPE\",\"SAI_NEXT_HOP_TYPE_IP\",\"SAI_NEXT_HOP_ATTR_IP\",\"10.0.2.118\",\"SAI_NEXT_HOP_ATTR_ROUTER_INTERFACE_ID\",\"oid:0x60000000005ee\"]", 
      "SAI_OBJECT_TYPE_NEXT_HOP:oid:0x4000000000615", 
      "Screate", 
      "[\"SAI_NEIGHBOR_ENTRY_ATTR_DST_MAC_ADDRESS\",\"98:03:9B:77:95:E6\"]", 
      "SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{\"ip\":\"10.0.2.118\",\"rif\":\"oid:0x60000000005ee\",\"switch_id\":\"oid:0x21000000000000\"}", 
      "Screate", 
      "[\"SAI_NEXT_HOP_ATTR_TYPE\",\"SAI_NEXT_HOP_TYPE_IP\",\"SAI_NEXT_HOP_ATTR_IP\",\"10.0.2.2\",\"SAI_NEXT_HOP_ATTR_ROUTER_INTERFACE_ID\",\"oid:0x60000000005e9\"]", 
      "SAI_OBJECT_TYPE_NEXT_HOP:oid:0x4000000000614", 
      "Screate", 
      "[\"SAI_NEIGHBOR_ENTRY_ATTR_DST_MAC_ADDRESS\",\"2C:60:0C:C1:42:45\"]", 
      "SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{\"ip\":\"10.0.2.2\",\"rif\":\"oid:0x60000000005e9\",\"switch_id\":\"oid:0x21000000000000\"}", 
      "Sbulkset", 
      "[\"{\\\"dest\\\":\\\"10.0.2.0/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x60000000005e9\"]", 
      "SAI_OBJECT_TYPE_ROUTE_ENTRY:1", 
      "Screate", 
      "[\"SAI_NEXT_HOP_ATTR_TYPE\",\"SAI_NEXT_HOP_TYPE_IP\",\"SAI_NEXT_HOP_ATTR_IP\",\"10.0.2.6\",\"SAI_NEXT_HOP_ATTR_ROUTER_INTERFACE_ID\",\"oid:0x60000000005fa\"]", 
      "SAI_OBJECT_TYPE_NEXT_HOP:oid:0x4000000000613", 
      "Screate", 
      "[\"SAI_NEIGHBOR_ENTRY_ATTR_DST_MAC_ADDRESS\",\"2C:60:0C:8B:5A:F7\"]", 
      "SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{\"ip\":\"10.0.2.6\",\"rif\":\"oid:0x60000000005fa\",\"switch_id\":\"oid:0x21000000000000\"}", 
      "Sbulkset", 
      "[\"{\\\"dest\\\":\\\"10.0.2.8/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x600000000060b\",\"{\\\"dest\\\":\\\"10.0.2.0/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x60000000005e9\",\"{\\\"dest\\\":\\\"10.0.2.4/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x60000000005fa\",\"{\\\"dest\\\":\\\"10.0.2.116/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x60000000005ee\"]", 
      "SAI_OBJECT_TYPE_ROUTE_ENTRY:4"
    ]
  }, 

Oddly, the routes to 10.0.2.8/30, 10.0.2.4/30, etc from the initial queue blob are all (a) still there in the queue and (b) exist in the ASIC DB:

...
  "ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY:{\"dest\":\"10.0.2.0/30\",\"switch_id\":\"oid:0x21000000000000\",\"vr\":\"oid:0x3000000000022\"}": {
    "expireat": 1595469823.8407772, 
    "ttl": -0.001, 
    "type": "hash", 
    "value": {
      "SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID": "oid:0x60000000005e9"
    }
  }, 
  "ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY:{\"dest\":\"10.0.2.4/30\",\"switch_id\":\"oid:0x21000000000000\",\"vr\":\"oid:0x3000000000022\"}": {
    "expireat": 1595469823.859452, 
    "ttl": -0.001, 
    "type": "hash", 
    "value": {
      "SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID": "oid:0x60000000005fa"
    }
  }, 
...
Jul 23 02:03:40.814103 sw100 DEBUG swss#orchagent: :- get: generic get key: SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000, fields: 1
Jul 23 02:03:40.814103 sw100 DEBUG swss#orchagent: :> canSkipRecording: enter
Jul 23 02:03:40.814129 sw100 DEBUG swss#orchagent: :< canSkipRecording: exit
Jul 23 02:03:40.814129 sw100 DEBUG swss#orchagent: :> set: enter
Jul 23 02:03:40.814141 sw100 DEBUG swss#orchagent: :< set: exit
Jul 23 02:03:40.814141 sw100 DEBUG swss#orchagent: :> waitForGetResponse: enter
Jul 23 02:03:40.814159 sw100 DEBUG swss#orchagent: :> wait: enter
Jul 23 02:03:40.814159 sw100 INFO swss#orchagent: :- wait: wait for getresponse response
Jul 23 02:03:40.814175 sw100 DEBUG swss#orchagent: :> select: enter
Jul 23 02:04:40.864367 sw100 DEBUG swss#orchagent: :< select: exit
Jul 23 02:04:40.864367 sw100 DEBUG swss#orchagent: :> getSelectResultAsString: enter
Jul 23 02:04:40.864367 sw100 DEBUG swss#orchagent: :< getSelectResultAsString: exit

That OID seems to exist:

  "ASIC_STATE:SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000": {
    "expireat": 1595470146.3998601, 
    "ttl": -0.001, 
    "type": "hash", 
    "value": {
      "NULL": "NULL", 
      "SAI_SWITCH_ATTR_ECMP_DEFAULT_HASH_SEED": "10", 
      "SAI_SWITCH_ATTR_FDB_AGING_TIME": "600", 
      "SAI_SWITCH_ATTR_FDB_EVENT_NOTIFY": "0x5622d9986350", 
      "SAI_SWITCH_ATTR_INIT_SWITCH": "true", 
      "SAI_SWITCH_ATTR_LAG_DEFAULT_HASH_SEED": "10", 
      "SAI_SWITCH_ATTR_PORT_STATE_CHANGE_NOTIFY": "0x5622d9986360", 
      "SAI_SWITCH_ATTR_SRC_MAC_ADDRESS": "44:4C:A8:CB:7E:2C", 
      "SAI_SWITCH_ATTR_SWITCH_SHUTDOWN_REQUEST_NOTIFY": "0x5622d9986370"
    }
  }, 

Here's a larger snippet from the orchagent logs:

log.txt

scottlaird commented 4 years ago

Oh for crying out loud.

Fixed.

The root filesystem was full, largely with 2 Sonic images. Once I removed the older of the two and rebooted everything seems to have popped up fine. I see neighbor entries, next hops, and a route to 10.0.0.101.

Apparently I need a bigger boot flash device for my 7060CX.

scottlaird commented 4 years ago

I rebooted the switch, and routes stopped propagating again. I'm still seeing "wait for getresponse response", and a bunch of queued routes. So it's not disk space. Reopening.

bluecmd commented 3 years ago

I might be hitting this issue, or something similar - the symptom is that while Linux has the kernel routes installed, there are no SAI objects. I am using BGP for what it is worth.

(vrf:mgmt)bluecmd@adele:~$ ip -6 route show vrf VrfMainframe | grep 2001:678:f40
2001:678:f40::/48 via fe80::209:fff:fe09:d401 dev Ethernet93 proto bgp metric 20 pref medium
(vrf:mgmt)bluecmd@adele:~$ show ipv6 route vrf VrfMainframe | grep 2001:678:f40
B>* 2001:678:f40::/48 [20/0] via fe80::209:fff:fe09:d401, tenGigE1/24/2, weight 1, 03:35:25
(vrf:mgmt)bluecmd@adele:~$ redis-cli -n 0 'keys' '*' | grep ROUTE_TABLE | grep f40
ROUTE_TABLE:VrfMainframe:2001:678:f40::/48
ROUTE_TABLE:VrfKamel:2001:678:f40::/48
(vrf:mgmt)bluecmd@adele:~$ redis-cli -n 1 'keys' '*ROUTE_ENTRY*' | grep 2001:678:f40
(vrf:mgmt)bluecmd@adele:~$ bcmcmd -v 'l3 ip6route show' | grep 2001:
(vrf:mgmt)bluecmd@adele:~$ 

If I manually insert the L3 route using l3 ip6route add VRF=3 IP=2001:678:f40:: masklen=48 mac=00:09:0f:09:d4:01 intf=100007 things start working as expected.

EDIT: Enabling INFO logging (DEBUG didn't do much) on orchagent reveals these error messages:

Aug 12 19:28:05.577291 adele INFO swss#orchagent: :- addRoute: Failed to get next hop fe80::209:fff:fe09:d401@Ethernet92 for 2001:678:f40::/48, resolving neighbor
Aug 12 19:28:05.577291 adele INFO swss#orchagent: :- addRoute: Failed to get next hop fe80::250:56ff:fe97:f36d@Ethernet92 for 2a0d:1a45:beef::/48, resolving neighbor
Aug 12 19:28:05.577291 adele INFO swss#orchagent: :- addRoute: Failed to get next hop fe80::74c0:f8ff:fe4b:c85d@Ethernet92 for 2a10:11c0::/32, resolving neighbor
Aug 12 19:28:05.577291 adele INFO swss#orchagent: :- addRoute: Failed to get next hop fe80::209:fff:fe09:d401@Ethernet93 for 2001:678:f40::/48, resolving neighbor
Aug 12 19:28:05.577291 adele INFO swss#orchagent: :- addRoute: Failed to get next hop fe80::250:56ff:fe97:f36d@Ethernet93 for 2a0d:1a45:beef::/48, resolving neighbor
Aug 12 19:28:05.577291 adele INFO swss#orchagent: :- addRoute: Failed to get next hop fe80::209:fff:fe09:bc07@Vlan511 for 2a10:11c0::/32, resolving neighbor

EDIT 2: I filed https://github.com/Azure/sonic-swss/issues/1872