Open scottlaird opened 4 years ago
This doesn't appear to be #4907; show interface status
still works fine.
For the sonic dump, just want to add my observations:
_NEIGH_TABLE:Ethernet0:10.0.2.2
Jul 14 20:50:02.264545 sw100 ERR swss#orchagent: :- wait: SELECT operation result: TIMEOUT on getresponse
Interesting. Zebra crashed during a previous boot, but it seems to be running now:
$ ps aux | grep zebra
300 1800 0.0 0.1 783444 11464 pts/0 Sl Jul14 0:21 /usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M fpm -M snmp
In any case, once a route makes it into the kernel, does Zebra running or not running actually impact orchagent?
WRT the Neighbor entry for 10.0.2.2, I see it in DB 0:
"NEIGH_TABLE:Ethernet0:10.0.2.2": {
"expireat": 1594848333.394223,
"ttl": -0.001,
"type": "hash",
"value": {
"family": "IPv4",
"neigh": "2c:60:0c:c1:42:45"
}
},
but it's not in DB 1, except in ASIC_STATE_KEY_VALUE_OP_QUEUE (as above):
"Screate",
"[\"SAI_NEXT_HOP_ATTR_TYPE\",\"SAI_NEXT_HOP_TYPE_IP\",\"SAI_NEXT_HOP_ATTR_IP\",\"10.0.2.2\",\"SAI_NEXT_HOP_ATTR_ROUTER_INTERFACE_ID\",\"oid:0x60000000005ed\"]",
"SAI_OBJECT_TYPE_NEXT_HOP:oid:0x4000000000613",
"Screate",
"[\"SAI_NEIGHBOR_ENTRY_ATTR_DST_MAC_ADDRESS\",\"2C:60:0C:C1:42:45\"]",
"SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{\"ip\":\"10.0.2.2\",\"rif\":\"oid:0x60000000005ed\",\"switch_id\":\"oid:0x21000000000000\"}",
Looking in the swss container's /var/log/swss/swss.rec
log, I see multiple references to 10.0.2.2's NEIGH_TABLE entry:
2020-07-15.18:53:14.944344|NEIGH_TABLE:eth0:10.1.8.254|SET|neigh:ac:1f:6b:c2:88:a0|family:IPv4
2020-07-15.19:15:16.374922|NEIGH_TABLE:Ethernet4:10.0.2.6|SET|neigh:2c:60:0c:8b:5a:f7|family:IPv4
2020-07-15.19:26:17.113755|NEIGH_TABLE:Ethernet0:10.0.2.2|SET|neigh:2c:60:0c:c1:42:45|family:IPv4
2020-07-15.19:26:17.113841|NEIGH_TABLE:eth0:10.1.8.254|SET|neigh:ac:1f:6b:c2:88:a0|family:IPv4
2020-07-15.19:59:19.272168|NEIGH_TABLE:eth0:10.1.8.254|SET|neigh:ac:1f:6b:c2:88:a0|family:IPv4
2020-07-15.19:59:19.272254|NEIGH_TABLE:Ethernet4:10.0.2.6|SET|neigh:2c:60:0c:8b:5a:f7|family:IPv4
2020-07-15.20:10:20.066900|NEIGH_TABLE:Ethernet0:10.0.2.2|SET|neigh:2c:60:0c:c1:42:45|family:IPv4
2020-07-15.20:21:20.743086|NEIGH_TABLE:Ethernet4:10.0.2.6|SET|neigh:2c:60:0c:8b:5a:f7|family:IPv4
2020-07-15.20:32:21.463003|NEIGH_TABLE:eth0:10.1.8.254|SET|neigh:ac:1f:6b:c2:88:a0|family:IPv4
2020-07-15.20:43:22.260162|NEIGH_TABLE:Ethernet4:10.0.2.6|SET|neigh:2c:60:0c:8b:5a:f7|family:IPv4
2020-07-15.20:54:23.145367|NEIGH_TABLE:Ethernet0:10.0.2.2|SET|neigh:2c:60:0c:c1:42:45|family:IPv4
2020-07-15.20:54:23.145457|NEIGH_TABLE:Ethernet4:10.0.2.6|SET|neigh:2c:60:0c:8b:5a:f7|family:IPv4
2020-07-15.21:05:23.849869|NEIGH_TABLE:eth0:10.1.8.254|SET|neigh:ac:1f:6b:c2:88:a0|family:IPv4
I assume that it's occurring because it's not actually making it into the ASIC DB?
There are a lot of orchagent entries in /var/log/syslog
. There's a batch of these every 60s:
Jul 15 21:33:12.932150 sw100 ERR swss#orchagent: :- wait: SELECT operation result: TIMEOUT on getresponse
Jul 15 21:33:12.932150 sw100 ERR swss#orchagent: :- wait: failed to get response for getresponse
Jul 15 21:33:12.932593 sw100 ERR swss#orchagent: :- getResAvailableCounters: Failed to get switch attribute 54 , rv:-1
I installed strace onto the switch via .debs. Here's what orchagent is currently doing:
epoll_wait(217, [], 1, 60000) = 0
sendto(10, "<11>Jul 15 21:32:12 orchagent: :"..., 87, MSG_NOSIGNAL, NULL, 0) = 87
sendto(10, "<11>Jul 15 21:32:12 orchagent: :"..., 78, MSG_NOSIGNAL, NULL, 0) = 78
close(217) = 0
sendto(10, "<11>Jul 15 21:32:12 orchagent: :"..., 100, MSG_NOSIGNAL, NULL, 0) = 100
write(16, "*9\r\n$7\r\nEVALSHA\r\n$40\r\nd171e04fd7"..., 268) = 268
read(16, "$-1\r\n", 16384) = 5
epoll_create1(0) = 217
epoll_ctl(217, EPOLL_CTL_ADD, 17, {EPOLLIN, {u32=17, u64=17}}) = 0
epoll_wait(217, [], 1, 0) = 0
epoll_wait(217, [], 1, 60000) = 0
sendto(10, "<11>Jul 15 21:33:12 orchagent: :"..., 87, MSG_NOSIGNAL, NULL, 0) = 87
sendto(10, "<11>Jul 15 21:33:12 orchagent: :"..., 78, MSG_NOSIGNAL, NULL, 0) = 78
close(217) = 0
sendto(10, "<11>Jul 15 21:33:12 orchagent: :"..., 100, MSG_NOSIGNAL, NULL, 0) = 100
write(16, "*9\r\n$7\r\nEVALSHA\r\n$40\r\nd171e04fd7"..., 268) = 268
read(16, "$-1\r\n", 16384) = 5
epoll_create1(0) = 217
epoll_ctl(217, EPOLL_CTL_ADD, 17, {EPOLLIN, {u32=17, u64=17}}) = 0
epoll_wait(217, [], 1, 0) = 0
FD 10 is obviously syslog. FD 16 is probably /var/run/redis/redis.sock, but I don't see a filename in /proc/net/unix. So I wouldn't say that it's crashed, exactly, but it doesn't appear to be making any progress on anything. I ran redis-cli -d 1 monitor | grep EVALSHA
and it looks like this is the query that orchagent sent:
1594849574.335416 [1 unix:/var/run/redis/redis.sock] "EVALSHA" "d171e04fd79e95ca2287f3b067c46ae76a82208b" "2" "ASIC_STATE_KEY_VALUE_OP_QUEUE" "ASIC_STATE_CHANNEL" "SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000" "[\"SAI_SWITCH_ATTR_AVAILABLE_NEXT_HOP_GROUP_ENTRY\",\"1623573616\"]" "Sget" "G"
Note that this happened after the call to syslog.
@scottlaird : zebra is responsible for push routes via netlink msgs to appl_db. If zebra is not running, then the routes won't make to appl_db. Please make sure when you test. zebra is healthy.
I see the route entry 10.0.0.101 in APPL_DB and also as per the latest comment, ASIC_DB has the NEIGHBOR entry, @scottlaird Can you check why the neighbor is not programmed to ASIC from Syncd.
"ROUTE_TABLE:10.0.0.101": { "expireat": 1594759892.050941, "ttl": -0.001, "type": "hash", "value": { "ifname": "Ethernet0", "nexthop": "10.0.2.2" } },
"SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{\"ip\":\"10.0.2.2\",\"rif\":\"oid:0x60000000005ed\",\"switch_id\":\"oid:0x21000000000000\"}",
@scottlaird : zebra is responsible for push routes via netlink msgs to appl_db. If zebra is not running, then the routes won't make to appl_db. Please make sure when you test. zebra is healthy.
I assume this happens in fpmsyncd
?
Ingestion into appl_db looks like it's working. I manually added a new route to 10.99.99.98 that mirrored the missing 10.0.0.101 route, and it appeared in appl_db more or less immediately:
# ip route add 10.99.99.98 via 10.0.2.2
# redis-dump -d 0 -y
...
"_ROUTE_TABLE:10.99.99.98": {
"expireat": 1595464061.8237002,
"ttl": -0.001,
"type": "hash",
"value": {
"ifname": "Ethernet0",
"nexthop": "10.0.2.2"
}
}
}
Except, this is a _ROUTE_TABLE
entry, not a ROUTE_TABLE
entry, with a leading underscore. Neither my C++ nor my Redis is great, but I think that's coming from getStateHashPrefix(), and should be removed in producer_state_table_apply_view.lua. I spent way to much time trying to debug this, only to discover that it eventually did the right thing on its own, after more than 5 minutes:
# redis-dump -d 0 -y
...
"ROUTE_TABLE:10.99.99.98": {
"expireat": 1595466464.552942,
"ttl": -0.001,
"type": "hash",
"value": {
"ifname": "Ethernet0",
"nexthop": "10.0.2.2"
}
},
...
However, no SAI_OBJECT_TYPE_ROUTE_ENTRY for 10.99.99.98 exists in the ASIC DB. There is an entry under ASIC_STATE_KEY_VALUE_OP_QUEUE, though.
I see the route entry 10.0.0.101 in APPL_DB and also as per the latest comment, ASIC_DB has the NEIGHBOR entry, @scottlaird Can you check why the neighbor is not programmed to ASIC from Syncd.
"ROUTE_TABLE:10.0.0.101": { "expireat": 1594759892.050941, "ttl": -0.001, "type": "hash", "value": { "ifname": "Ethernet0", "nexthop": "10.0.2.2" } },
"SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{"ip":"10.0.2.2","rif":"oid:0x60000000005ed","switch_id":"oid:0x21000000000000"}",
I don't see a usable neighbor entry for 10.0.2.2 in the ASIC DB (1), only in the APPL DB (0):
$ redis-dump -d 0 -y | egrep -i 'neigh.*10.0.2.2'
"NEIGH_TABLE:Ethernet0:10.0.2.2": {
$ redis-dump -d 1 -y | egrep -i 'neigh.*10.0.2.2'
"SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{\"ip\":\"10.0.2.2\",\"rif\":\"oid:0x60000000005ed\",\"switch_id\":\"oid:0x21000000000000\"}",
The entry from the ASIC DB is from ASIC_STATE_KEY_VALUE_OP_QUEUE.
From looking at my other switch, I think I should be seeing entries more like this:
"ASIC_STATE:SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{\"ip\":\"10.0.1.2\",\"rif\":\"oid:0x60000000009df\",\"switch_id\":\"oid:0x21000000000000\"}": {
"type": "hash",
"value": {
"SAI_NEIGHBOR_ENTRY_ATTR_DST_MAC_ADDRESS": "54:AB:3A:BC:84:A1"
}
},
I have 0 of those on this switch:
$ redis-dump -d 1 -y | grep ASIC_STATE:SAI_OBJECT_TYPE_NEIGHBOR_ENTRY
$
I decided to try rebooting (again) to see if that helps at all. I raised the swssloglevel via swssloglevel -l DEBUG -c orchagent
.
After rebooting the switch, ASIC_STATE_KEY_VALUE_OP_QUEUE is smaller (but still non-empty):
"ASIC_STATE_KEY_VALUE_OP_QUEUE": {
"expireat": 1595469340.9739242,
"ttl": -0.001,
"type": "list",
"value": [
"Sbulkset",
"[\"{\\\"dest\\\":\\\"10.0.2.8/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x600000000060b\",\"{\\\"dest\\\":\\\"10.0.2.0/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x60000000005e9\",\"{\\\"dest\\\":\\\"10.0.2.4/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x60000000005fa\",\"{\\\"dest\\\":\\\"10.0.2.116/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x60000000005ee\"]",
"SAI_OBJECT_TYPE_ROUTE_ENTRY:4"
]
},
I then fired up ospfd
and ospf6d
, and the queue grew a bit:
"ASIC_STATE_KEY_VALUE_OP_QUEUE": {
"expireat": 1595469447.839086,
"ttl": -0.001,
"type": "list",
"value": [
"Sbulkset",
"[\"{\\\"dest\\\":\\\"0.0.0.0/0\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"0.0.0.0/0\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_PACKET_ACTION=SAI_PACKET_ACTION_FORWARD\"]",
"SAI_OBJECT_TYPE_ROUTE_ENTRY:2",
"Sbulkcreate",
"[\"{\\\"dest\\\":\\\"50.125.243.136/29\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\"]",
"SAI_OBJECT_TYPE_ROUTE_ENTRY:1",
"Sbulkcreate",
"[\"{\\\"dest\\\":\\\"10.22.204.184/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.2.0.0/16\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.18.157.213/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.1.97.0/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.1.52.0/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.14.208.90/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.1.32.0/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.4/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.1.11.0/24\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.252/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.26.253.74/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.8/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.0.105/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.0.104/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.182.52.28/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.1.0.0/24\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.0.7/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.1.22.0/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.0.106/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.0.103/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000616\",\"{\\\"dest\\\":\\\"10.0.0.101/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000614\",\"{\\\"dest\\\":\\\"10.0.1.0/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.56/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.0.107/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.16/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.248/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.1.8.0/24\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x5000000000617\",\"{\\\"dest\\\":\\\"10.0.0.4/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.0.102/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000613\",\"{\\\"dest\\\":\\\"10.0.0.6/32\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.12/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.24/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\",\"{\\\"dest\\\":\\\"10.0.1.20/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000615\"]",
"SAI_OBJECT_TYPE_ROUTE_ENTRY:33",
"Sbulkcreate",
"[\"oid:0x2d000000000618\",\"SAI_NEXT_HOP_GROUP_MEMBER_ATTR_NEXT_HOP_GROUP_ID=oid:0x5000000000617|SAI_NEXT_HOP_GROUP_MEMBER_ATTR_NEXT_HOP_ID=oid:0x4000000000614\",\"oid:0x2d000000000619\",\"SAI_NEXT_HOP_GROUP_MEMBER_ATTR_NEXT_HOP_GROUP_ID=oid:0x5000000000617|SAI_NEXT_HOP_GROUP_MEMBER_ATTR_NEXT_HOP_ID=oid:0x4000000000613\",\"oid:0x2d00000000061a\",\"SAI_NEXT_HOP_GROUP_MEMBER_ATTR_NEXT_HOP_GROUP_ID=oid:0x5000000000617|SAI_NEXT_HOP_GROUP_MEMBER_ATTR_NEXT_HOP_ID=oid:0x4000000000616\"]",
"SAI_OBJECT_TYPE_NEXT_HOP_GROUP_MEMBER:3",
"Screate",
"[\"SAI_NEXT_HOP_GROUP_ATTR_TYPE\",\"SAI_NEXT_HOP_GROUP_TYPE_DYNAMIC_UNORDERED_ECMP\"]",
"SAI_OBJECT_TYPE_NEXT_HOP_GROUP:oid:0x5000000000617",
"Sbulkset",
"[\"{\\\"dest\\\":\\\"10.0.2.8/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x600000000060b\",\"{\\\"dest\\\":\\\"10.0.2.4/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x60000000005fa\",\"{\\\"dest\\\":\\\"10.0.2.116/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x60000000005ee\"]",
"SAI_OBJECT_TYPE_ROUTE_ENTRY:3",
"Screate",
"[\"SAI_NEXT_HOP_ATTR_TYPE\",\"SAI_NEXT_HOP_TYPE_IP\",\"SAI_NEXT_HOP_ATTR_IP\",\"10.0.2.10\",\"SAI_NEXT_HOP_ATTR_ROUTER_INTERFACE_ID\",\"oid:0x600000000060b\"]",
"SAI_OBJECT_TYPE_NEXT_HOP:oid:0x4000000000616",
"Screate",
"[\"SAI_NEIGHBOR_ENTRY_ATTR_DST_MAC_ADDRESS\",\"2C:60:0C:C1:41:21\"]",
"SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{\"ip\":\"10.0.2.10\",\"rif\":\"oid:0x600000000060b\",\"switch_id\":\"oid:0x21000000000000\"}",
"Screate",
"[\"SAI_NEXT_HOP_ATTR_TYPE\",\"SAI_NEXT_HOP_TYPE_IP\",\"SAI_NEXT_HOP_ATTR_IP\",\"10.0.2.118\",\"SAI_NEXT_HOP_ATTR_ROUTER_INTERFACE_ID\",\"oid:0x60000000005ee\"]",
"SAI_OBJECT_TYPE_NEXT_HOP:oid:0x4000000000615",
"Screate",
"[\"SAI_NEIGHBOR_ENTRY_ATTR_DST_MAC_ADDRESS\",\"98:03:9B:77:95:E6\"]",
"SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{\"ip\":\"10.0.2.118\",\"rif\":\"oid:0x60000000005ee\",\"switch_id\":\"oid:0x21000000000000\"}",
"Screate",
"[\"SAI_NEXT_HOP_ATTR_TYPE\",\"SAI_NEXT_HOP_TYPE_IP\",\"SAI_NEXT_HOP_ATTR_IP\",\"10.0.2.2\",\"SAI_NEXT_HOP_ATTR_ROUTER_INTERFACE_ID\",\"oid:0x60000000005e9\"]",
"SAI_OBJECT_TYPE_NEXT_HOP:oid:0x4000000000614",
"Screate",
"[\"SAI_NEIGHBOR_ENTRY_ATTR_DST_MAC_ADDRESS\",\"2C:60:0C:C1:42:45\"]",
"SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{\"ip\":\"10.0.2.2\",\"rif\":\"oid:0x60000000005e9\",\"switch_id\":\"oid:0x21000000000000\"}",
"Sbulkset",
"[\"{\\\"dest\\\":\\\"10.0.2.0/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x60000000005e9\"]",
"SAI_OBJECT_TYPE_ROUTE_ENTRY:1",
"Screate",
"[\"SAI_NEXT_HOP_ATTR_TYPE\",\"SAI_NEXT_HOP_TYPE_IP\",\"SAI_NEXT_HOP_ATTR_IP\",\"10.0.2.6\",\"SAI_NEXT_HOP_ATTR_ROUTER_INTERFACE_ID\",\"oid:0x60000000005fa\"]",
"SAI_OBJECT_TYPE_NEXT_HOP:oid:0x4000000000613",
"Screate",
"[\"SAI_NEIGHBOR_ENTRY_ATTR_DST_MAC_ADDRESS\",\"2C:60:0C:8B:5A:F7\"]",
"SAI_OBJECT_TYPE_NEIGHBOR_ENTRY:{\"ip\":\"10.0.2.6\",\"rif\":\"oid:0x60000000005fa\",\"switch_id\":\"oid:0x21000000000000\"}",
"Sbulkset",
"[\"{\\\"dest\\\":\\\"10.0.2.8/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x600000000060b\",\"{\\\"dest\\\":\\\"10.0.2.0/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x60000000005e9\",\"{\\\"dest\\\":\\\"10.0.2.4/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x60000000005fa\",\"{\\\"dest\\\":\\\"10.0.2.116/30\\\",\\\"switch_id\\\":\\\"oid:0x21000000000000\\\",\\\"vr\\\":\\\"oid:0x3000000000022\\\"}\",\"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x60000000005ee\"]",
"SAI_OBJECT_TYPE_ROUTE_ENTRY:4"
]
},
Oddly, the routes to 10.0.2.8/30, 10.0.2.4/30, etc from the initial queue blob are all (a) still there in the queue and (b) exist in the ASIC DB:
...
"ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY:{\"dest\":\"10.0.2.0/30\",\"switch_id\":\"oid:0x21000000000000\",\"vr\":\"oid:0x3000000000022\"}": {
"expireat": 1595469823.8407772,
"ttl": -0.001,
"type": "hash",
"value": {
"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID": "oid:0x60000000005e9"
}
},
"ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY:{\"dest\":\"10.0.2.4/30\",\"switch_id\":\"oid:0x21000000000000\",\"vr\":\"oid:0x3000000000022\"}": {
"expireat": 1595469823.859452,
"ttl": -0.001,
"type": "hash",
"value": {
"SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID": "oid:0x60000000005fa"
}
},
...
Jul 23 02:03:40.814103 sw100 DEBUG swss#orchagent: :- get: generic get key: SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000, fields: 1
Jul 23 02:03:40.814103 sw100 DEBUG swss#orchagent: :> canSkipRecording: enter
Jul 23 02:03:40.814129 sw100 DEBUG swss#orchagent: :< canSkipRecording: exit
Jul 23 02:03:40.814129 sw100 DEBUG swss#orchagent: :> set: enter
Jul 23 02:03:40.814141 sw100 DEBUG swss#orchagent: :< set: exit
Jul 23 02:03:40.814141 sw100 DEBUG swss#orchagent: :> waitForGetResponse: enter
Jul 23 02:03:40.814159 sw100 DEBUG swss#orchagent: :> wait: enter
Jul 23 02:03:40.814159 sw100 INFO swss#orchagent: :- wait: wait for getresponse response
Jul 23 02:03:40.814175 sw100 DEBUG swss#orchagent: :> select: enter
Jul 23 02:04:40.864367 sw100 DEBUG swss#orchagent: :< select: exit
Jul 23 02:04:40.864367 sw100 DEBUG swss#orchagent: :> getSelectResultAsString: enter
Jul 23 02:04:40.864367 sw100 DEBUG swss#orchagent: :< getSelectResultAsString: exit
That OID seems to exist:
"ASIC_STATE:SAI_OBJECT_TYPE_SWITCH:oid:0x21000000000000": {
"expireat": 1595470146.3998601,
"ttl": -0.001,
"type": "hash",
"value": {
"NULL": "NULL",
"SAI_SWITCH_ATTR_ECMP_DEFAULT_HASH_SEED": "10",
"SAI_SWITCH_ATTR_FDB_AGING_TIME": "600",
"SAI_SWITCH_ATTR_FDB_EVENT_NOTIFY": "0x5622d9986350",
"SAI_SWITCH_ATTR_INIT_SWITCH": "true",
"SAI_SWITCH_ATTR_LAG_DEFAULT_HASH_SEED": "10",
"SAI_SWITCH_ATTR_PORT_STATE_CHANGE_NOTIFY": "0x5622d9986360",
"SAI_SWITCH_ATTR_SRC_MAC_ADDRESS": "44:4C:A8:CB:7E:2C",
"SAI_SWITCH_ATTR_SWITCH_SHUTDOWN_REQUEST_NOTIFY": "0x5622d9986370"
}
},
Here's a larger snippet from the orchagent logs:
Oh for crying out loud.
Fixed.
The root filesystem was full, largely with 2 Sonic images. Once I removed the older of the two and rebooted everything seems to have popped up fine. I see neighbor entries, next hops, and a route to 10.0.0.101.
Apparently I need a bigger boot flash device for my 7060CX.
I rebooted the switch, and routes stopped propagating again. I'm still seeing "wait for getresponse response", and a bunch of queued routes. So it's not disk space. Reopening.
I might be hitting this issue, or something similar - the symptom is that while Linux has the kernel routes installed, there are no SAI objects. I am using BGP for what it is worth.
(vrf:mgmt)bluecmd@adele:~$ ip -6 route show vrf VrfMainframe | grep 2001:678:f40
2001:678:f40::/48 via fe80::209:fff:fe09:d401 dev Ethernet93 proto bgp metric 20 pref medium
(vrf:mgmt)bluecmd@adele:~$ show ipv6 route vrf VrfMainframe | grep 2001:678:f40
B>* 2001:678:f40::/48 [20/0] via fe80::209:fff:fe09:d401, tenGigE1/24/2, weight 1, 03:35:25
(vrf:mgmt)bluecmd@adele:~$ redis-cli -n 0 'keys' '*' | grep ROUTE_TABLE | grep f40
ROUTE_TABLE:VrfMainframe:2001:678:f40::/48
ROUTE_TABLE:VrfKamel:2001:678:f40::/48
(vrf:mgmt)bluecmd@adele:~$ redis-cli -n 1 'keys' '*ROUTE_ENTRY*' | grep 2001:678:f40
(vrf:mgmt)bluecmd@adele:~$ bcmcmd -v 'l3 ip6route show' | grep 2001:
(vrf:mgmt)bluecmd@adele:~$
If I manually insert the L3 route using l3 ip6route add VRF=3 IP=2001:678:f40:: masklen=48 mac=00:09:0f:09:d4:01 intf=100007
things start working as expected.
EDIT: Enabling INFO logging (DEBUG didn't do much) on orchagent reveals these error messages:
Aug 12 19:28:05.577291 adele INFO swss#orchagent: :- addRoute: Failed to get next hop fe80::209:fff:fe09:d401@Ethernet92 for 2001:678:f40::/48, resolving neighbor
Aug 12 19:28:05.577291 adele INFO swss#orchagent: :- addRoute: Failed to get next hop fe80::250:56ff:fe97:f36d@Ethernet92 for 2a0d:1a45:beef::/48, resolving neighbor
Aug 12 19:28:05.577291 adele INFO swss#orchagent: :- addRoute: Failed to get next hop fe80::74c0:f8ff:fe4b:c85d@Ethernet92 for 2a10:11c0::/32, resolving neighbor
Aug 12 19:28:05.577291 adele INFO swss#orchagent: :- addRoute: Failed to get next hop fe80::209:fff:fe09:d401@Ethernet93 for 2001:678:f40::/48, resolving neighbor
Aug 12 19:28:05.577291 adele INFO swss#orchagent: :- addRoute: Failed to get next hop fe80::250:56ff:fe97:f36d@Ethernet93 for 2a0d:1a45:beef::/48, resolving neighbor
Aug 12 19:28:05.577291 adele INFO swss#orchagent: :- addRoute: Failed to get next hop fe80::209:fff:fe09:bc07@Vlan511 for 2a10:11c0::/32, resolving neighbor
EDIT 2: I filed https://github.com/Azure/sonic-swss/issues/1872
Description
I'm having problems with my Sonic switches failing to route traffic as expected.
Network layout (simplified):
A <=> B (Sonic) <=> C
All routes appear to be correct across A, B, and C. I'm able to ping A and C from B, and B from A and C. However, I cannot ping A from C or vice-versa. In the past, rebooting B has frequently fixed the problem, but new routes haven't consistently worked until rebooted.
I believe I'm hitting a bug in swss, where routes aren't being propagated from the kernel into the database correctly.
As an example, I'm currently having problems with a route to 10.0.0.101. The kernel knows about it:
However, I don't see a
ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY
for this in the DB:The only entry in DB 1 that matches 10.0.0.101 is a block inside of an
ASIC_STATE_KEY_VALUE_OP_QUEUE
entry:A quick glance at the orchagent code didn't immediately make it clear exactly what was going on here, but it looks like it's put some work to be done into a queue, and then never actually processes the queue and adds routes correctly.
Steps to reproduce the issue:
Describe the results you received:
I got a giant
ASIC_STATE_KEY_VALUE_OP_QUEUE
blob.Describe the results you expected:
I expected to see ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY records.
Additional information you deem important (e.g. issue happens only occasionally):
I have two switches, an Arista 7060CX and an Edge-core AS7326. I've seen intermittent behavior similar to this back at least to build 310, but didn't look at redis until today. It happens very consistently with the 7060CX on build 346.
Output of
show version
:Attach debug file
sudo generate_dump
:sonic_dump_sw100_20200714_205053.tar.gz