sonic-net / sonic-platform-daemons

Platform module daemons for SONiC

[chassis][pmon] Fix the PMON traceback issue while chassis is rebooting #558

Closed mlok-nokia closed 1 week ago

mlok-nokia commented 1 week ago

Description

During a chassis reboot, PMON on the LC could be in the middle of running module_db_update(), which may try to access the CHASSIS_STATE_DB on the SUP to get the ASIC info. If the ASIC table info on the SUP has been removed due to the reboot, the dictionary could be empty, and reading a value from it directly could raise a KeyError. This PR adds code to check that the key exists before using its value. Fixes https://github.com/sonic-net/sonic-buildimage/issues/20543

Motivation and Context

This is a timing-related issue in the chassis setup. Per discussion, we should always check that a key exists in the dictionary before reading it. This PR adds code to check that the key CHASSIS_MODULE_INFO_NAME_FIELD is in fvs before checking its value, which prevents the traceback. Fixes https://github.com/sonic-net/sonic-buildimage/issues/20543
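A minimal sketch of the guarded lookup described above. It is illustrative only, not the actual PMON change: the helper get_module_name() is hypothetical, and the string value given to CHASSIS_MODULE_INFO_NAME_FIELD is an assumption. Only the names fvs and CHASSIS_MODULE_INFO_NAME_FIELD come from this PR.

```python
# Assumed value, for illustration only; the real constant lives in the daemon code.
CHASSIS_MODULE_INFO_NAME_FIELD = "name"


def get_module_name(fvs):
    """Hypothetical helper: return the module name from a field-value dict.

    Returns None when the key is absent, for example when the ASIC table on
    the SUP was removed during a reboot and the dict came back empty.
    """
    if CHASSIS_MODULE_INFO_NAME_FIELD in fvs:
        return fvs[CHASSIS_MODULE_INFO_NAME_FIELD]
    return None


# Normal case: the entry is still present on the SUP.
populated = {CHASSIS_MODULE_INFO_NAME_FIELD: "FABRIC-CARD0"}
# Reboot case: the entry was removed, so the dict is empty.
emptied = {}

print(get_module_name(populated))
print(get_module_name(emptied))
```

An unguarded fvs[CHASSIS_MODULE_INFO_NAME_FIELD] on the empty dict would raise KeyError, which is the traceback this PR avoids.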

This is needed by the 202405 branch.

How Has This Been Tested?

This issue is not easy to reproduce. It was triggered by "sudo reboot" on the SUP. For my unit test, I modified the PMON code to force this code path to run and verified the change.
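Because the race is hard to hit on hardware, one possible way to exercise the guard is to fake the table being emptied between listing its keys and reading them. The sketch below is not the author's test and does not use the real database API: FakeAsicTable, asics_for_module() and NAME_FIELD are hypothetical stand-ins.

```python
NAME_FIELD = "name"  # assumed field name, for illustration only


class FakeAsicTable:
    """Hypothetical stand-in for the ASIC table; not the real DB API."""

    def __init__(self, rows):
        self.rows = dict(rows)

    def get_keys(self):
        return list(self.rows)

    def get(self, key):
        # A missing entry yields an empty dict, as in the reboot scenario.
        return dict(self.rows.get(key, {}))


def asics_for_module(table, keys, module_name):
    """Collect the ASIC keys that belong to module_name, skipping empty entries."""
    matches = []
    for key in keys:
        fvs = table.get(key)
        # Guard: only read the field if it is present.
        if NAME_FIELD in fvs and fvs[NAME_FIELD] == module_name:
            matches.append(key)
    return matches


table = FakeAsicTable({
    "asic0": {NAME_FIELD: "FABRIC-CARD0"},
    "asic1": {NAME_FIELD: "FABRIC-CARD0"},
})
keys = table.get_keys()
before = asics_for_module(table, keys, "FABRIC-CARD0")

# Simulate the SUP reboot wiping the entries after the keys were listed.
table.rows.clear()
after = asics_for_module(table, keys, "FABRIC-CARD0")
```

With the guard in place, the second call returns an empty list instead of raising KeyError.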

Which release branch to backport (provide reason below if selected)

Additional Information (Optional)

mlok-nokia commented 1 week ago

@gechiang and @arlakshm This PR addresses the PMON traceback issue on "sudo reboot". Please review it.

judyjoseph commented 1 week ago

@gechiang could you pls add an MSFT ADO as well.

gechiang commented 1 week ago

> @gechiang could you pls add an MSFT ADO as well.

MSFT ADO: 30223842

gechiang commented 1 week ago

@BYGX-wcr to patch this to 202205 chassis branch