sonic-net / sonic-platform-daemons

Platform module daemons for SONiC

[chassis][linecard] Fix Module LINECARD<> went off-line message for empty slot issue #462

Closed mlok-nokia closed 6 months ago

mlok-nokia commented 6 months ago

Description

The current implementation in chassisd has no way to tell whether a module (linecard or fabric card) slot is empty during boot-up or whether the module has been reseated or removed. This PR adds get_module_current_status() to fetch the previous status from the DB, and compares it with the new status so that the proper message is logged.

After this change, the behaviour is as follows:
1) "Module LINE-CARD5 went off-line!" is no longer logged for an empty slot upon Supervisor reboot.
2) No LINECARD chassis DB cleanup is triggered for an empty slot after the 30-minute timeout.
3) The following messages are logged for modules that are online upon Supervisor reboot:

Apr  9 19:05:43.028168 ixre-cpm-chassis15 NOTICE pmon#chassisd: Module SUPERVISOR0 is on-line!
Apr  9 19:05:43.233794 ixre-cpm-chassis15 NOTICE pmon#chassisd: Module LINE-CARD0 is on-line!
Apr  9 19:05:43.848048 ixre-cpm-chassis15 NOTICE pmon#chassisd: Module LINE-CARD3 is on-line!
Apr  9 19:05:45.686618 ixre-cpm-chassis15 NOTICE pmon#chassisd: Module FABRIC-CARD3 is on-line!
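
The previous-versus-new status comparison described above can be sketched roughly as follows. This is a minimal illustration, not the actual chassisd code: the `ModuleStatusTracker` class, its `update()` method, the status strings, and the plain dict standing in for the DB table are all assumptions made for this example. Only the name get_module_current_status() and the log message texts come from this PR.

```python
import logging

logger = logging.getLogger("chassisd-sketch")

# Status strings are assumptions for this sketch.
STATUS_EMPTY = "Empty"
STATUS_ONLINE = "Online"
STATUS_OFFLINE = "Offline"


class ModuleStatusTracker:
    """Hypothetical helper: remembers the last status seen for each module."""

    def __init__(self, table=None):
        # A plain dict stands in for the DB table (module name -> status).
        self.table = {} if table is None else table

    def get_module_current_status(self, name):
        # Status stored before this update; a missing entry is treated
        # as an empty slot.
        return self.table.get(name, STATUS_EMPTY)

    def update(self, name, new_status):
        """Store new_status and return the message to log, or None."""
        prev_status = self.get_module_current_status(name)
        self.table[name] = new_status

        message = None
        if new_status == STATUS_ONLINE and prev_status != STATUS_ONLINE:
            message = f"Module {name} is on-line!"
        elif new_status != STATUS_ONLINE and prev_status == STATUS_ONLINE:
            # Only a module that was previously online can "go off-line";
            # a slot that was already empty or offline stays silent.
            message = f"Module {name} went off-line!"

        if message is not None:
            logger.info(message)
        return message
```

With this shape, an empty slot reported after a Supervisor reboot yields no message because its previous status was never online, while a linecard that goes from online to offline still produces the "went off-line!" message.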

Motivation and Context

On a T2-VOQ chassis where some linecard slots are empty, the following messages are seen for those empty slots upon Supervisor reboot:

pmon#chassisd: Module LINE-CARD5 went off-line!
pmon#chassisd: Host name is not available for Module LINE-CARD5. Chassis db clean up not done!
pmon#chassisd: Module LINE-CARD5| is down for long time. Initiating chassis app db clean up

This PR fixes https://github.com/sonic-net/sonic-buildimage/issues/18539

This change is required for the 202205 branch.

[x] 202205 [x] 202305

How Has This Been Tested?

This change has been tested on the 202205 branch:
1) Reboot the Supervisor and check the log messages for the empty and non-empty slots.
2) Reboot a linecard and check that the proper log message appears.
3) Reboot a linecard and keep it down, then check the log messages and confirm that the 30-minute chassis DB cleanup happens for this slot.

Additional Information (Optional)

mlok-nokia commented 6 months ago

@deepak-singhal0408 @judyjoseph Hi Deepak & Judy, this PR addresses the "LINECARD went off-line" message for the empty-slot issue. Please review it.

deepak-singhal0408 commented 6 months ago

Also please add a testcase to cover the transition scenario we are handling here. thanks!

mlok-nokia commented 6 months ago

Also please add a testcase to cover the transition scenario we are handling here. thanks!

Done

mlok-nokia commented 6 months ago

@judyjoseph @deepak-singhal0408 PR has been updated based on the comments. Please review it. Thanks.

deepak-singhal0408 commented 6 months ago

MSFT ADO: 27471113 @rlhui @judyjoseph could you please help review/merge the PR.. Thanks!

gechiang commented 6 months ago

@StormLiangMS, @yxieca, please help review/approve for the 202305/202205 branches, as this is one of the submodules for which we do not have a MSFT repo to cherry-pick from, and this bug fix is needed for the Chassis project on 202205. MSFT ADO: 27471113. Thanks!

mssonicbld commented 6 months ago

Cherry-pick PR to 202311: https://github.com/sonic-net/sonic-platform-daemons/pull/469

mssonicbld commented 6 months ago

Cherry-pick PR to 202205: https://github.com/sonic-net/sonic-platform-daemons/pull/470

mssonicbld commented 5 months ago

Cherry-pick PR to 202305: https://github.com/sonic-net/sonic-platform-daemons/pull/476