Open stepanblyschak opened 1 year ago
Discovery must be done right after switch creation to see which objects exist; it cannot be delayed until later.
Hi @kcudnik,
We are currently working on optimizing the fast-reboot flow for switches with a high number of ports. We saw that with 256 ports, the SAI discovery that runs for each port adds up to more than 8 seconds, during which orchagent is idle, waiting for syncd to finish creating ports. Is SAI discovery after port creation required in the fast-reboot init flow? In the fast-reboot flow there is no comparison logic, since the current view is empty (https://github.com/sonic-net/SONiC/blob/4ab89a9fdba3ced17f4e4d7f97892f93045905d1/doc/fast-reboot/Fast-reboot_Flow_Improvements_HLD.md#42-syncd-point-of-view---initapply-view-framework). We tried skipping the SAI discovery that follows port creation in the fast-reboot flow (running the community fast-reboot test multiple times) on Nvidia platforms, and at least in that case it saved about 6.5 seconds of dataplane downtime, which is more than 20% of the allowed disruption length. The system was also stable, and no issues were observed.
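As a rough sanity check on these figures, the arithmetic can be sketched as follows. It reads the 8 seconds as the total across all 256 ports. The 30-second disruption budget is an assumption (the commonly cited fast-reboot target), not a number stated in this thread.

```python
# Figures quoted in the comment above.
ports = 256
discover_total_s = 8.0   # "more than 8 seconds", treated here as a lower bound
saved_s = 6.5            # approximate dataplane downtime saved by skipping discovery

# ASSUMPTION: 30 s allowed dataplane disruption; not stated in this thread.
assumed_budget_s = 30.0

per_port_ms = discover_total_s / ports * 1000
saved_fraction = saved_s / assumed_budget_s

print(f"discovery cost per port: at least {per_port_ms:.2f} ms")
print(f"saving as share of assumed budget: {saved_fraction:.1%}")
```

Under that assumed budget the saving comes out slightly above one fifth, which is consistent with the "more than 20%" claim.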
Description
Steps to reproduce the issue:
Observe that after create_switch() the SAI discovery process runs and takes a noticeable amount of time (1.02 sec in this case):
Describe the results you received:
The SAI discovery process took 1.02 sec, but we have seen different results on different platforms/configurations (up to 4 sec).
Describe the results you expected:
From a fast/warm reboot design standpoint, performing a lot of GET operations in the middle of switch boot delays the replay of configuration. Syncd could blindly replay the configuration as fast as possible and then discover default objects afterwards.
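The difference between the two orderings can be illustrated with a toy model. This is only a sketch: `FakeSwitch`, its default objects, and the per-port queue count are hypothetical stand-ins, not syncd or SAI code. It counts how many GET operations are issued before the last configuration object has been replayed. It does not address whether default objects can still be told apart from replayed ones when discovery is deferred.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSwitch:
    """Hypothetical stand-in for a switch; records an ordered trace of operations."""
    default_objects: tuple = ("cpu_port", "default_vlan", "default_vrouter")
    queues_per_port: int = 8
    log: list = field(default_factory=list)

    def create_port(self, name):
        self.log.append(("create", name))

    def discover_switch(self):
        # One GET per default object (toy cost model).
        for obj in self.default_objects:
            self.log.append(("get", obj))

    def discover_port(self, name):
        # One GET per child queue of the port (toy cost model).
        for q in range(self.queues_per_port):
            self.log.append(("get", f"{name}/queue{q}"))


def replay_eager(ports):
    """Discover right after switch creation and after every port creation."""
    sw = FakeSwitch()
    sw.discover_switch()
    for p in ports:
        sw.create_port(p)
        sw.discover_port(p)
    return sw


def replay_deferred(ports):
    """Replay the whole configuration first, then run all discovery."""
    sw = FakeSwitch()
    for p in ports:
        sw.create_port(p)
    sw.discover_switch()
    for p in ports:
        sw.discover_port(p)
    return sw


def gets_before_replay_done(log):
    """Count GETs issued before the final create, i.e. GETs that delay replay."""
    last_create = max(i for i, (op, _) in enumerate(log) if op == "create")
    return sum(1 for op, _ in log[:last_create] if op == "get")


ports = [f"Ethernet{i}" for i in range(4)]
eager = replay_eager(ports)
deferred = replay_deferred(ports)
print(gets_before_replay_done(eager.log), gets_before_replay_done(deferred.log))
```

In this model both orderings issue the same total number of GETs. Only their position relative to the end of configuration replay changes, which is the effect the proposal relies on.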
Output of show version:
Output of show techsupport:
Additional information you deem important (e.g. issue happens only occasionally):
sonic_dump_r-panther-13_20230210_114940.tar.gz