sonic-net / sonic-platform-daemons

Platform module daemons for SONiC
Other
25 stars 159 forks source link

Update TRANSCIEVER_INFO table for ports that doesn't exist in Config DB #556

Open noaOrMlnx opened 2 weeks ago

noaOrMlnx commented 2 weeks ago

Description

This change https://github.com/sonic-net/sonic-platform-daemons/blob/0cb36447ebfda390d999df6516beac4c17315a7b/sonic-xcvrd/xcvrd/xcvrd.py#L2199 , caused TRANSCEIVER_INFO table to remain after config reload.

When TRANSCEIVER_INFO table is not deleted from State DB, we have old ports there. if we do split to 8 in SPC4, we must deleted the adjacent port. so we have all old ports in TRANSCIVER_INFO but not in Config DB.

To not cause any crash when trying to search for non-exist tables and ports, we should delete all ports which are not in Config DB from TRANSCEIVER_INFO table.

example to a possible crash - image

Motivation and Context

Motivation is to be able to use cmis host mgmt feature in case we split to 8 and delete some ports from Config DB.

How Has This Been Tested?

with a split to 8 port

Additional Information (Optional)

prgeor commented 2 weeks ago

@noaOrMlnx How did we land up in this situation where TRANSCEIVER_INFO table exists for the ports NOT in CONFIG_DB ?

prgeor commented 2 weeks ago

@noaOrMlnx What crash are we talking about? backtrace please?

noaOrMlnx commented 1 week ago

@noaOrMlnx How did we land up in this situation where TRANSCEIVER_INFO table exists for the ports NOT in CONFIG_DB ?

@prgeor please check following code - https://github.com/sonic-net/sonic-platform-daemons/blob/0cb36447ebfda390d999df6516beac4c17315a7b/sonic-xcvrd/xcvrd/xcvrd.py#L2199

MSFT didn't want 'config reload' to remove TRANSCEIVER_INFO table, as it retriggers host tx ready setting.

noaOrMlnx commented 1 week ago

@noaOrMlnx Can you please explain why on_remove_logical_port is not taking care of deleting the TRANSCEIEVR_INFO table for the relevant ports?

@mihirpat1 on_remove_logical_port function is triggered when a port is removed on run time. If user removes port from config DB and do 'config reload' it will not be removed from TRANSCEIVER_INFO table.

liat-grozovik commented 1 week ago

@mihirpat1 please let me know of any further questions otherwise please approve so i can merge.

mihirpat1 commented 1 week ago

@mihirpat1 please let me know of any further questions otherwise please approve so i can merge.

@noaOrMlnx Can you please help in addressing https://github.com/sonic-net/sonic-platform-daemons/pull/556#discussion_r1833250026

Also, can you please add the traceback of crash to the description of the PR?

noaOrMlnx commented 1 week ago

@mihirpat1 @prgeor all comments were addressed

mihirpat1 commented 1 week ago

@noaOrMlnx Please help in fixing the build failure

noaOrMlnx commented 1 week ago

@noaOrMlnx Please help in fixing the build failure

@mihirpat1 Done

prgeor commented 1 week ago

@noaOrMlnx Can you please explain why on_remove_logical_port is not taking care of deleting the TRANSCEIEVR_INFO table for the relevant ports?

@mihirpat1 on_remove_logical_port function is triggered when a port is removed on run time. If user removes port from config DB and do 'config reload' it will not be removed from TRANSCEIVER_INFO table.

@noaOrMlnx what do you mean by "If user removes port from config DB" ? What is this usecase?

prgeor commented 1 week ago

@noaOrMlnx please share the steps to reproduce this issue