sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

SAI_STATUS_TABLE_FULL and swss:orchagent shutdown #2125

Open loshihyu opened 6 years ago

loshihyu commented 6 years ago

We are testing SONiC by adding 8k+ IPv4 routes on a Broadcom BCM56960 switch. CRM showed 8192 ipv4_routes available, and we could add up to 8188. When adding the 8189th route, we hit the following syslog errors (see "* syslog:" below) with SAI_STATUS_TABLE_FULL, and syncd calls exit_and_notify() to shut down orchagent running in the swss container. We have to run "config reload" or reboot the system to recover.

image

I saw that a similar issue was opened before: [syncd][topology t0] exit_and_notify after processing the event of SAI_STATUS_TABLE_FULL #654 https://github.com/Azure/sonic-mgmt/issues/654

Is there any way to avoid hitting this condition, e.g. having the SONiC code in RouterOrch::addRoute check the available routes before actually adding a route? It looks like SAI_STATUS_TABLE_FULL and the orchagent shutdown would apply to every resource listed in CRM once more than the allowed amount is used, e.g. ipv6_route, ipv4_neighbor, etc. (see below). Is there any plan to enhance this and avoid shutting down orchagent in the SAI_STATUS_TABLE_FULL case?

image
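To make the proposed pre-check concrete, here is a minimal Python sketch of the idea. It is not SONiC or SAI code: `FakeRouteTable`, `add_route_guarded`, `TableFullError`, and the `reported_extra` field are all hypothetical names invented for illustration. The sketch assumes the availability counter may over-report free entries, so it checks the counter first and also treats a table-full error from the hardware layer as a rejected request rather than a fatal condition.

```python
from dataclasses import dataclass, field
from enum import Enum


class AddResult(Enum):
    ADDED = "added"
    REJECTED_NO_SPACE = "rejected_no_space"      # counter said the table is full
    REJECTED_TABLE_FULL = "rejected_table_full"  # hardware said the table is full


class TableFullError(Exception):
    """Hypothetical stand-in for a SAI_STATUS_TABLE_FULL return value."""


@dataclass
class FakeRouteTable:
    """Hypothetical model of a hardware route table (not a SONiC class)."""
    capacity: int
    reported_extra: int = 0  # models a counter that over-reports free entries
    routes: set = field(default_factory=set)

    def available(self) -> int:
        # What a CRM-like counter would report, including any over-count.
        return self.capacity - len(self.routes) + self.reported_extra

    def program(self, prefix: str) -> None:
        # The real limit is the capacity, regardless of what the counter says.
        if len(self.routes) >= self.capacity:
            raise TableFullError(prefix)
        self.routes.add(prefix)


def add_route_guarded(table: FakeRouteTable, prefix: str) -> AddResult:
    """Check the counter first, then still tolerate a table-full error."""
    if table.available() <= 0:
        return AddResult.REJECTED_NO_SPACE
    try:
        table.program(prefix)
    except TableFullError:
        return AddResult.REJECTED_TABLE_FULL
    return AddResult.ADDED


if __name__ == "__main__":
    # Mirror the numbers in this report: 8192 reported, 8188 actually usable.
    table = FakeRouteTable(capacity=8188, reported_extra=4)
    results = [
        add_route_guarded(table, f"10.{i >> 8}.{i & 255}.0/24")
        for i in range(8189)
    ]
    print(results[8187], results[8188])
```

With an accurate counter the pre-check alone would reject the extra route. With an over-reporting counter, as in this report, only the second guard catches it, which is why the sketch keeps both.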

Thanks!

Wilson

prsunny commented 6 years ago

As per the current design, orchagent crashes when there is a "table full" error. This is the expected behavior. However, in this case it looks like the CRM available count is returning an incorrect value (4 instead of 0). The available count is returned by the SAI vendor based on their table size. If this happens consistently, we would need to take it up with Broadcom.
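The "4 instead of 0" figure follows from the numbers in the report. A small Python sketch of that arithmetic, where `crm_overcount` is a hypothetical helper and not part of CRM, and assuming nothing else consumed table entries during the test:

```python
def crm_overcount(initial_available: int, added_before_full: int) -> int:
    """Entries the counter still claimed were free when the table filled up."""
    return initial_available - added_before_full


# Numbers from this report: 8192 reported available, 8188 routes added
# successfully before the 8189th hit SAI_STATUS_TABLE_FULL.
overcount = crm_overcount(8192, 8188)
print(overcount)
```

A non-zero result means the counter and the actual table limit disagree, which is the discrepancy to raise with the SAI vendor.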

loshihyu commented 6 years ago

Is there any plan to stop orchagent from crashing on this "table full" error? We will work with Broadcom if the incorrect CRM available count becomes an issue for us. Thanks for your prompt follow-up, Sunny!

prsunny commented 6 years ago

Not planned for any immediate release!

loshihyu commented 6 years ago

OK, Sunny, could you help add this as a soon-to-fix critical issue and let us know when the fix will be available? SONiC no longer works after orchagent crashes, and this has a critical impact on users. Thanks!

lguohan commented 6 years ago

Please enable ALPM so that you do not hit the routing table issue in the near future.