Open · anamehra opened this issue 1 year ago
@abdosi, FYI.
@arlakshm, can you take a look?
Hi @anamehra, this is expected behavior. If database-chassis is not running, then any process trying to write to chassis-db will exit.
Hi @arlakshm, this caused a chassisd core and failed a sonic-mgmt test case. How should we handle this in sonic-mgmt? In one scenario we saw that chassisd on the LC kept restarting while the chassis redis server was down and eventually entered the FATAL state because it kept exiting too soon (see the supervisord note below).
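For context on that FATAL state: supervisord treats an exit that happens before startsecs has elapsed as a failed start, and after startretries consecutive failed starts it marks the program FATAL and stops restarting it. The snippet below only illustrates that mechanism; the values are assumptions, not the ones shipped in the pmon container's supervisord.conf.

; Illustrative supervisord settings (assumed values, not the actual pmon config).
; If chassisd exits within 'startsecs' on 'startretries' consecutive attempts,
; supervisord gives up and marks the program FATAL, which matches the behavior
; described above.
[program:chassisd]
command=/usr/local/bin/chassisd
autorestart=unexpected
startsecs=10
startretries=3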
@arlakshm To handle this use case, can we explore the option of catching the Redis error and retrying the connection with a defined retry count/timeout, along the lines of the sketch below?
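One possible shape for that handling, as a sketch: a bounded-retry wrapper around the chassis-db write that currently crashes. MAX_RETRIES, RETRY_INTERVAL_SECS and set_with_retry are illustrative names, not existing chassisd code.

# Sketch of the proposed retry-on-RedisError handling for chassis-db writes.
import time

MAX_RETRIES = 5           # assumed retry budget
RETRY_INTERVAL_SECS = 10  # assumed wait between attempts

def set_with_retry(table, key, fvs):
    """Write to chassis-db, retrying instead of letting the daemon abort."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            table.set(key, fvs)  # e.g. self.asic_table.set(asic_key, asic_fvs)
            return True
        except RuntimeError as err:
            # swss::RedisError surfaces in Python as RuntimeError
            # (see the traceback in the description below).
            print("chassis-db write failed ({}/{}): {}".format(attempt, MAX_RETRIES, err))
            time.sleep(RETRY_INTERVAL_SECS)
    return False

module_db_update() could then log and skip the update when set_with_retry() returns False, letting chassisd keep running until database-chassis comes back up.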
@anamehra, is this issue still applicable? What is the latest on this?
Description
The chassisd process on the LC crashes when database-chassis goes down on the supervisor as part of sonic-mgmt tests such as restart of docker.service, Sup watchdog reboot, etc. This generates a core:
Jun 14 21:07:19.267491 sfd-lt2-lc0 INFO pmon#supervisord: chassisd Traceback (most recent call last):
Jun 14 21:07:19.267514 sfd-lt2-lc0 INFO pmon#supervisord: chassisd File "/usr/local/bin/chassisd", line 471, in <module>
Jun 14 21:07:19.267514 sfd-lt2-lc0 INFO pmon#supervisord: chassisd main()
Jun 14 21:07:19.267514 sfd-lt2-lc0 INFO pmon#supervisord: chassisd File "/usr/local/bin/chassisd", line 466, in main
Jun 14 21:07:19.267525 sfd-lt2-lc0 INFO pmon#supervisord: chassisd chassisd.run()
Jun 14 21:07:19.267532 sfd-lt2-lc0 INFO pmon#supervisord: chassisd File "/usr/local/bin/chassisd", line 445, in run
Jun 14 21:07:19.267532 sfd-lt2-lc0 INFO pmon#supervisord: chassisd self.module_updater.module_db_update()
Jun 14 21:07:19.267549 sfd-lt2-lc0 INFO pmon#supervisord: chassisd File "/usr/local/bin/chassisd", line 264, in module_db_update
Jun 14 21:07:19.267549 sfd-lt2-lc0 INFO pmon#supervisord: chassisd self.asic_table.set(asic_key, asic_fvs)
Jun 14 21:07:19.267568 sfd-lt2-lc0 INFO pmon#supervisord: chassisd File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 2237, in set
Jun 14 21:07:19.267833 sfd-lt2-lc0 INFO pmon#supervisord: chassisd return _swsscommon.Table_set(self, *args)
Jun 14 21:07:19.267833 sfd-lt2-lc0 INFO pmon#supervisord: chassisd RuntimeError: RedisError: Failed to redisGetReply in RedisPipeline::pop, err=1: errstr=Connection reset by peer
Jun 14 21:07:19.275527 sfd-lt2-lc0 INFO pmon#supervisord: chassisd terminate called after throwing an instance of 'swss::RedisError'
Jun 14 21:07:19.275527 sfd-lt2-lc0 INFO pmon#supervisord: chassisd what(): RedisError: Failed to redisGetReply in RedisPipeline::pop, err=1: errstr=Connection reset by peer
Jun 14 21:07:19.275675 sfd-lt2-lc0 INFO pmon#supervisord: chassisd
Jun 14 21:07:19.632037 sfd-lt2-lc0 INFO pmon#supervisord 2023-06-14 21:07:19,631 INFO exited: chassisd (terminated by SIGABRT (core dumped); not expected)
Jun 14 21:07:20.333821 sfd-lt2-lc0 INFO bgp2#supervisord 2023-06-14 21:07:20,333 INFO waiting for supervisor-proc-exit-listener, rsyslogd, staticd, zebra, bgpd, bgpcfgd to die
Jun 14 21:07:20.633609 sfd-lt2-lc0 INFO pmon#supervisord 2023-06-14 21:07:20,633 INFO spawned: 'chassisd' with pid 447
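A minimal way to reproduce the failing write outside of chassisd, as a sketch: it assumes a line card whose database_global.json points CHASSIS_STATE_DB at the supervisor, and the table, key and field names are illustrative.

from swsscommon import swsscommon

# On a chassis line card the location of CHASSIS_STATE_DB comes from
# database_global.json, so load the global DB config first.
if not swsscommon.SonicDBConfig.isGlobalInit():
    swsscommon.SonicDBConfig.load_sonic_global_db_config()

db = swsscommon.DBConnector("CHASSIS_STATE_DB", 0)
tbl = swsscommon.Table(db, "CHASSIS_ASIC_TABLE")
fvs = swsscommon.FieldValuePairs([("asic_id_in_module", "0")])
# While database-chassis is stopped on the supervisor, this write fails with
# "RuntimeError: RedisError: Failed to redisGetReply ..." as in the log above.
tbl.set("asic0", fvs)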
Steps to reproduce the issue:
1.
2.
3.
Describe the results you received:
Describe the results you expected:
Output of show version:

Output of show techsupport:

Additional information you deem important (e.g. issue happens only occasionally):