Open loshihyu opened 6 years ago
As per the current design, orchagent crashes when there is a "table full" error. This is the expected behavior. However, in this case, it looks like the CRM available count is returning an incorrect value (4 instead of 0). The available count is returned by the SAI vendor based on their table size. If this happens consistently, we would need to take it up with Broadcom.
Is there any plan to stop orchagent from crashing on this "table full" error? We will work with Broadcom if the incorrect CRM available count becomes an issue for us. Thanks for your prompt follow-up, Sunny!
Not planned for any immediate release!
OK, Sunny, could you help add this as a critical issue to be fixed soon, and let us know when the fix will be available? SONiC no longer works after orchagent crashes, which has a critical impact on users. Thanks!
Please enable ALPM; otherwise you are going to hit routing table issues in the near future.
We are testing SONiC by adding 8k+ IPv4 routes on a Broadcom BCM56960 switch. CRM showed 8192 ipv4_routes available, but we could add only 8188. When adding the 8189th route, we hit the following syslog errors (see "* syslog:" below): SAI_STATUS_TABLE_FULL is reported, and syncd calls exit_and_notify() to shut down orchagent running in the swss container. We have to run "config reload" or reboot the system to recover.
I saw that a similar issue was opened before: [syncd][topology t0] exit_and_notify after processing the event of SAI_STATUS_TABLE_FULL #654 https://github.com/Azure/sonic-mgmt/issues/654
Is there any way to avoid hitting this condition, e.g. having the SONiC code in RouterOrch::addRoute check the available routes before actually adding a route? It looks like SAI_STATUS_TABLE_FULL and the orchagent shutdown would apply to all resources listed in CRM whenever more than the allowed amount is used, e.g. ipv6_route, ipv4_neighbor, etc.; see below. Is there any plan to enhance this and avoid shutting down orchagent in the SAI_STATUS_TABLE_FULL case?
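To make the request concrete, here is a rough Python sketch of the kind of guard I have in mind. It is only a model of the idea, not SONiC or SAI code: `FakeRouteTable`, `add_route`, `AddResult`, `TableFullError` and the `reserve` parameter are all hypothetical names invented for this illustration. The fake table deliberately over-reports free entries (8192 reported vs. 8188 usable, matching the numbers above), so the sketch also treats a "table full" failure as a rejected request instead of a fatal error.

```python
from dataclasses import dataclass, field
from enum import Enum


class AddResult(Enum):
    ADDED = "added"
    REJECTED_NO_SPACE = "rejected_no_space"      # pre-check refused the route
    REJECTED_TABLE_FULL = "rejected_table_full"  # hardware refused the route


class TableFullError(Exception):
    """Hypothetical stand-in for a SAI_STATUS_TABLE_FULL return code."""


@dataclass
class FakeRouteTable:
    """Hypothetical model of an ASIC route table; not a SONiC or SAI API."""
    real_capacity: int       # entries the hardware can really hold
    reported_capacity: int   # entries the availability counter claims
    routes: set = field(default_factory=set)

    def available(self) -> int:
        # Models a counter that may over-report the free entries.
        return max(self.reported_capacity - len(self.routes), 0)

    def program(self, prefix: str) -> None:
        # Models the hardware write, which fails once the real table is full.
        if prefix not in self.routes and len(self.routes) >= self.real_capacity:
            raise TableFullError(prefix)
        self.routes.add(prefix)


def add_route(table: FakeRouteTable, prefix: str, reserve: int = 0) -> AddResult:
    """Add a route only if the counter shows more than `reserve` free entries.

    A "table full" failure from the hardware is reported to the caller
    instead of terminating the process.
    """
    if table.available() <= reserve:
        return AddResult.REJECTED_NO_SPACE
    try:
        table.program(prefix)
    except TableFullError:
        return AddResult.REJECTED_TABLE_FULL
    return AddResult.ADDED
```

With `reserve=0` the pre-check alone would not have helped in our case, because the counter still showed 4 free entries when the table was already full; that is why the sketch also handles the failure itself. A non-zero `reserve` is one possible way to absorb such a counter mismatch, at the cost of leaving a few entries unused.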
Thanks!
Wilson