sonic-net / sonic-sairedis

SAI object interface to Redis database, as used in the SONiC project
Other
56 stars 259 forks source link

Warmboot: It will remain warm removeswitch when first reboot after warm-reboot #1361

Closed copyandrun closed 5 months ago

copyandrun commented 5 months ago

When first warm-reboot, the warm will be set by setRestartWarmOnAllSwitches(true). Then the syncd will be start with warm true. But there is no false set to SAI after warm get reconciled. So the next reboot will call warm SAI removeswitch. I think it should be a false set to SAI after warm-reboot.

kcudnik commented 5 months ago

not sure about this scenario, i believe when first warm reboot is performed, this flag is auromatically cleaned, but i'm not sure +@yxieca

copyandrun commented 5 months ago

I am sure this is warm because i add some log in the warm remove SAI code. When the first warm is down, i reboot to check it with warm remove

yxieca commented 5 months ago

I agree with Kamil. Next shutdown won't have warm reboot flag set. SAI shouldn't persist the warm shutdown or recover flag.

copyandrun commented 5 months ago

I agree with Kamil. Next shutdown won't have warm reboot flag set. SAI shouldn't persist the warm shutdown or recover flag.

So it should just be used when remove not create. But the SAI create switch keep this flag true when warmboot. Should it be explicitly set to false by syncd instead of SAI?

kcudnik commented 5 months ago

What do you mean that sai create switch keep this true? In what sense ?

copyandrun commented 5 months ago

What do you mean that sai create switch keep this true? In what sense ?

In brcm_sai_create_switch, this value will be set _brcm_sai_global_data_set(_BRCM_SAI_WARM_SHUT, &gdata); Then it will be used here. brcm_sai_shutdown_switch(In bool warm_restart_hint) I checked that syncd just set it to true when shutdowntype == warm. Do you think it need to force writing false when cold?

kcudnik commented 5 months ago

from SAI or sonic persprctive we don't look int what vendor is doing inside, for us that should not be concearn

is current implementaion causing some problems for you ? at which point you want to set this to false?

copyandrun commented 5 months ago

from SAI or sonic persprctive we don't look int what vendor is doing inside, for us that should not be concearn

is current implementaion causing some problems for you ? at which point you want to set this to false?

In syncd run shutdown image As you can see the warm set to true, maybe add else if cold set to false in front of the removeswitch? I guess it can work.

kcudnik commented 5 months ago

I don't get the problem the flag should be set to true only if warm restart is requested not in else case which is not present here, can you describe step by step scenario what you want to achieve here ? I have hard time to understand this since so far many many times our scenario works fine in production and we don't have any issues

copyandrun commented 5 months ago

I don't get the problem the flag should be set to true only if warm restart is requested not in else case which is not present here, can you describe step by step scenario what you want to achieve here ? I have hard time to understand this since so far many many times our scenario works fine in production and we don't have any issues

Step

  1. warm-reboot
  2. device get ready
  3. reboot or stop syncd The problem is that in step 3 the SAI take it as warm remove_switch. But it should be cold.

I think step 1 set the warm flag to SAI in the code picture. And then SAI remove swtich and create switch since it is warm recovery. Then the warm is finished, but no one set the flag to false . So now stop syncd or reboot will make the SAI take it as warm remove_switch. I got your point that the SAI should not keep the flag. I agree with you. But now the bcm SAI is not work as we expected. So i think we can just set it to false when syncd ask to cold remove_switch.

kcudnik commented 5 months ago

If you reboot or stop sync you should do that gracefully ether by request shutdown or by OA which will set type of shutdown whether cold or warm or fast and it should be fine

kcudnik commented 5 months ago

Even if after warm reboot warm flag persist on sai then OA request of reboot will force new flag for example cold or boot, so of switch think that is in warm shutdown for next reboot it does not matter since OA will force to override that

copyandrun commented 5 months ago

Even if after warm reboot warm flag persist on sai then OA request of reboot will force new flag for example cold or boot, so of switch think that is in warm shutdown for next reboot it does not matter since OA will force to override that

The situation I encountered was not being forced to write cold. Where is the override process code in OA?(I am sorry that i have not find that.) (Just run the reboot command is what i did. Is it gracefully?)

kcudnik commented 5 months ago

yes, reboot is fine, should be fine, you mean reboot command as linux comman ? that would trigger cold boot and

copyandrun commented 5 months ago

yes, reboot is fine, should be fine, you mean reboot command as linux comman ? that would trigger cold boot and

SONiC reboot command

kcudnik commented 5 months ago

this is ok, do you see some negative issues with your scenario ?

copyandrun commented 5 months ago

this is ok, do you see some negative issues with your scenario ?

The problem is that the last reboot goes with warm remove_switch. This will ignore the cold table remove code i added in the brcm SAI. What i suggest is add the force set cold in syncd. I tested and i think i have solved it in my scenario. Thank you.

kcudnik commented 5 months ago

Do you have pr ?

copyandrun commented 5 months ago

Do you have pr ?

I think it just an issue in my scenario with brcm SAI change not SONiC issue. I will just close this issue. Thank you for your reply.