sonic-net / sonic-platform-daemons

Platform module daemons for SONiC
Other
23 stars 152 forks source link

[chassis][pmon][chassid] Enhance the chassid module on-line or off-line log messages with physical slot number #530

Closed mlok-nokia closed 2 months ago

mlok-nokia commented 2 months ago

Description

Per MSFT request, we enhance the module on-line and off-line messages with physical slot number

Before enhance: pmon#chassisd: Module LINE-CARD0 is on-line! pmon#chassisd: Module LINE-CARD0 went off-line! pmon#chassisd: Module LINE-CARD0 recovered on-line! pmon#chassisd: Module FABRIC-CARD6 is on-line!

After Enhance: pmon#chassisd: Module LINE-CARD0 (Slot 1) is on-line! pmon#chassisd: Module LINE-CARD0 (Slot 1) went off-line! pmon#chassisd: Module LINE-CARD0 (Slot 1) recovered on-line! pmon#chassisd: Module FABRIC-CARD6 (Slot 7) is on-line!

This change is required by 202205 branch also.

Motivation and Context

Per MSFT's request -- In order to easily identify a module within a chassis, we need to report the physical slot number whenever we raised “module” related syslogs.

How Has This Been Tested?

Tested with Master branch and 202205 branch.

Additional Information (Optional)

mlok-nokia commented 2 months ago

@gechiang @judyjoseph This PR enhances the chassis module on-line and off-line message with physical slot info. Please review it. Thanks

mlok-nokia commented 2 months ago

@gechiang @judyjoseph I just found this PR https://github.com/sonic-net/sonic-platform-daemons/pull/480 have not be merged to 202205 branch while other PRs which are associated with PR https://github.com/sonic-net/sonic-platform-daemons/pull/480 have been merged to 202205. Please merge this PRs first.

gechiang commented 2 months ago

@gechiang @judyjoseph I just found this PR #480 have not be merged to 202205 branch while other PRs which are associated with PR #480 have been merged to 202205. Please merge this PRs first.

@mlok-nokia the PR you quoted is no longer allowed for public 202205 branch for this submodule as it is a feature enhancement and not a bug fix and 202205 release owner stopped accepting changes. We do have internal build that takes it in as a patch already so we are good.

mlok-nokia commented 2 months ago

Can you also add the same change for the 4 syslog related to Midplane connectivity issues at line 455, 457, 459, and 465. Also on line 556 for module down long time please also add the slot info as well. Thanks!

@gechiang I just made the change and tested it. Please review it again.

mlok-nokia commented 2 months ago

@gechiang @judyjoseph I just found this PR #480 have not be merged to 202205 branch while other PRs which are associated with PR #480 have been merged to 202205. Please merge this PRs first.

@mlok-nokia the PR you quoted is no longer allowed for public 202205 branch for this submodule as it is a feature enhancement and not a bug fix and 202205 release owner stopped accepting changes. We do have internal build that takes it in as a patch already so we are good.

Ok. Thanks.

mlok-nokia commented 2 months ago

@gechiang Do we need this PR for 202205 branch?

rlhui commented 2 months ago

can we also have corresponding test for this?

mlok-nokia commented 2 months ago

can we also have corresponding test for this?

There are already testcases. But not for checking logging message.

gechiang commented 2 months ago

@gechiang Do we need this PR for 202205 branch?

yes. but we will do it internally as officially public 202205 is closed... So once we have this merged, we will pick it up in our next internal build. For community who also needs this for testing on 202205 MSFT repo will also have to manually patch it in their own internal 202205 based builds...

arlakshm commented 2 months ago

@kenneth-arista for viz..

arlakshm commented 2 months ago

can we add some unit test?

mlok-nokia commented 2 months ago

can we add some unit test? There are UTs for this section of code. But it is just not able check the log texts. For this module, adding any new line of code requires to add UT for it. Otherwise, it will fail to UT verification during image build.

gechiang commented 2 months ago

All comments seem to have been answered already. Any other comments? @kenneth-arista any comments? Thanks!

gechiang commented 2 months ago

@arlakshm , please help sign off on this PR. @rlhui , please help merge this PR. Thanks!

judyjoseph commented 2 months ago

@mlok-nokia a general comment -- there are a few more places in the chassisd script where we just have the Module {} alone, should we add Slot {} there too for uniform logs -- not critical but @gechiang f.y.i

mlok-nokia commented 2 months ago

@mlok-nokia a general comment -- there are a few more places in the chassisd script where we just have the Module {} alone, should we add Slot {} there too for uniform logs -- not critical but @gechiang f.y.i

It seems like there are only two messages which contains Module {}. I will take a look if it is feasible to add the slot for since one of them complains Module {} with hostname while another is logging for CLI command "config chassis module shutdown "

mssonicbld commented 1 month ago

Cherry-pick PR to 202405: https://github.com/sonic-net/sonic-platform-daemons/pull/534