sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

KVM Switch: Failed to Detect Redis.Sock Connection Failures #5323

Open tahmed-dev opened 4 years ago

tahmed-dev commented 4 years ago

This is a clone of Azure/sonic-buildimage#5277, opened to track why the KVM switch did not detect the redis.sock connection failures.

Details copied from issue #5277:

Description: The sonic-cfggen utility fails to connect to /var/run/redis/redis.sock when the bgp service is restarted:

sudo systemctl restart bgp

admin@sonic:~$ sudo cat /var/log/syslog | grep -B 30 -A 5 /var/run/redis/redis.sock
Aug 20 10:55:05.963130 sonic NOTICE admin: Stopped bgp service...
Aug 20 10:55:05.967556 sonic INFO systemd[1]: bgp.service: Succeeded.
Aug 20 10:55:05.969013 sonic INFO systemd[1]: Stopped BGP container.
Aug 20 10:55:05.972528 sonic INFO systemd[1]: Starting BGP container...
Aug 20 10:55:05.979468 sonic NOTICE admin: Starting bgp service...
Aug 20 10:55:06.231106 sonic NOTICE admin: Warm boot flag: bgp false.
Aug 20 10:55:06.238290 sonic NOTICE admin: Fast boot flag: bgp .
Aug 20 10:55:06.734560 sonic INFO bgp.sh[28088]: Traceback (most recent call last):
Aug 20 10:55:06.734827 sonic INFO bgp.sh[28088]:   File "/usr/local/bin/sonic-cfggen", line 416, in <module>
Aug 20 10:55:06.735145 sonic INFO bgp.sh[28088]:     main()
Aug 20 10:55:06.735326 sonic INFO bgp.sh[28088]:   File "/usr/local/bin/sonic-cfggen", line 343, in main
Aug 20 10:55:06.735498 sonic INFO bgp.sh[28088]:     configdb.connect()
Aug 20 10:55:06.735671 sonic INFO bgp.sh[28088]:   File "/usr/local/lib/python2.7/dist-packages/swsssdk/configdb.py", line 74, in connect
Aug 20 10:55:06.735854 sonic INFO bgp.sh[28088]:     self.db_connect('CONFIG_DB', wait_for_init, retry_on)
Aug 20 10:55:06.736030 sonic INFO bgp.sh[28088]:   File "/usr/local/lib/python2.7/dist-packages/swsssdk/configdb.py", line 69, in db_connect
Aug 20 10:55:06.736202 sonic INFO bgp.sh[28088]:     SonicV2Connector.connect(self, self.db_name, retry_on)
Aug 20 10:55:06.736377 sonic INFO bgp.sh[28088]:   File "/usr/local/lib/python2.7/dist-packages/swsssdk/dbconnector.py", line 250, in connect
Aug 20 10:55:06.736549 sonic INFO bgp.sh[28088]:     self.dbintf.connect(db_id, retry_on)
Aug 20 10:55:06.736778 sonic INFO bgp.sh[28088]:   File "/usr/local/lib/python2.7/dist-packages/swsssdk/interface.py", line 171, in connect
Aug 20 10:55:06.736941 sonic INFO bgp.sh[28088]:     self._onetime_connect(db_id)
Aug 20 10:55:06.737094 sonic INFO bgp.sh[28088]:   File "/usr/local/lib/python2.7/dist-packages/swsssdk/interface.py", line 183, in _onetime_connect
Aug 20 10:55:06.737249 sonic INFO bgp.sh[28088]:     client.config_set('notify-keyspace-events', self.KEYSPACE_EVENTS)
Aug 20 10:55:06.737408 sonic INFO bgp.sh[28088]:   File "/usr/local/lib/python2.7/dist-packages/redis/client.py", line 1243, in config_set
Aug 20 10:55:06.737562 sonic INFO bgp.sh[28088]:     return self.execute_command('CONFIG SET', name, value)
Aug 20 10:55:06.737714 sonic INFO bgp.sh[28088]:   File "/usr/local/lib/python2.7/dist-packages/redis/client.py", line 898, in execute_command
Aug 20 10:55:06.737872 sonic INFO bgp.sh[28088]:     conn = self.connection or pool.get_connection(command_name, **options)
Aug 20 10:55:06.738027 sonic INFO bgp.sh[28088]:   File "/usr/local/lib/python2.7/dist-packages/redis/connection.py", line 1192, in get_connection
Aug 20 10:55:06.738180 sonic INFO bgp.sh[28088]:     connection.connect()
Aug 20 10:55:06.738334 sonic INFO bgp.sh[28088]:   File "/usr/local/lib/python2.7/dist-packages/redis/connection.py", line 563, in connect
Aug 20 10:55:06.738487 sonic INFO bgp.sh[28088]:     raise ConnectionError(self._error_message(e))
Aug 20 10:55:06.738642 sonic INFO bgp.sh[28088]: redis.exceptions.ConnectionError: Error 13 connecting to unix socket: /var/run/redis/redis.sock. Permission denied.
Aug 20 10:55:06.836439 sonic INFO bgp.sh[28088]: Removing obsolete bgp container with HWSKU montara
Aug 20 10:55:06.902063 sonic INFO bgp.sh[28088]: bgp
Aug 20 10:55:06.905879 sonic INFO bgp.sh[28088]: Creating new bgp container with HWSKU
Aug 20 10:55:07.085741 sonic INFO bgp.sh[28088]: a9f9487d078ec6ae0d3ca4c793bcfa41975d0c50a7decf48331f50a162d0e438
Aug 20 10:55:07.203221 sonic INFO containerd[505]: time="2020-08-20T10:55:07.202160280Z" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/a9f9487d078ec6ae0d3ca4c793bcfa41975d0c50a7decf48331f50a162d0e438/shim.sock" debug=false pid=28176
admin@sonic:~$ 

The issue flow is:

bgp.service ->  /usr/local/bin/bgp.sh ->  /usr/bin/bgp.sh ->  start() ->   HWSKU=${HWSKU:-`$SONIC_CFGGEN -d -v 'DEVICE_METADATA["localhost"]["hwsku"]'`}
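Note that in the log above the container is still created after the traceback, and the "Creating new bgp container with HWSKU" line ends with no value, which suggests the failed lookup was not treated as fatal. A minimal Python sketch of a lookup that fails loudly instead is shown here. The helper name `query_or_fail` is hypothetical and is not part of bgp.sh or sonic-cfggen; it only illustrates checking the exit status and the output of a command before using its result.

```python
import subprocess


def query_or_fail(cmd):
    """Run a command and return its stripped stdout.

    Raises RuntimeError if the command exits non-zero or prints nothing,
    so a failed lookup cannot silently turn into an empty value.
    (Hypothetical helper, for illustration only.)
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            "command failed with exit code %d: %s"
            % (result.returncode, result.stderr.strip())
        )
    value = result.stdout.strip()
    if not value:
        raise RuntimeError("command succeeded but returned an empty value")
    return value


# Example call, using the arguments quoted in the flow above:
#   query_or_fail(["sonic-cfggen", "-d", "-v",
#                  'DEVICE_METADATA["localhost"]["hwsku"]'])
```

With a check like this, a permission error in the lookup would stop the start sequence rather than only leaving a traceback in syslog.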

This is probably because bgp.service sets User=admin, so ExecStartPre runs bgp.sh as the admin user:

[Service]
User=admin
ExecStartPre=/usr/local/bin/bgp.sh start
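One way to confirm the permission theory is to try the connection as the same user the service runs as. The sketch below uses only the Python standard library (not swsssdk or redis-py) and classifies the result by errno; "Error 13" in the traceback corresponds to EACCES. The function name `probe_unix_socket` is hypothetical, and Python 3 is assumed.

```python
import errno
import socket


def probe_unix_socket(path, timeout=1.0):
    """Try to connect to a Unix stream socket and classify the outcome.

    Returns "ok", "permission-denied", "missing", "refused",
    or "error:<errno name>" for anything else.
    (Hypothetical helper, for illustration only.)
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return "ok"
    except OSError as exc:
        if exc.errno == errno.EACCES:  # errno 13, as in the traceback
            return "permission-denied"
        if exc.errno == errno.ENOENT:  # socket file does not exist
            return "missing"
        if exc.errno == errno.ECONNREFUSED:  # file exists, nobody listening
            return "refused"
        return "error:" + errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


# Example call, using the path from the log above:
#   probe_unix_socket("/var/run/redis/redis.sock")
```

Running it once as root and once as admin (for example via `sudo -u admin`) would show whether the two users get different results for the same socket.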

Steps to reproduce the issue:

  1. sudo systemctl restart bgp
  2. sudo cat /var/log/syslog | grep -B 30 -A 5 /var/run/redis/redis.sock

Describe the results you received: The sonic-cfggen utility fails to connect to /var/run/redis/redis.sock, and the errors shown above are written to syslog.

Describe the results you expected: No errors in logs

Additional information you deem important (e.g. issue happens only occasionally):

Environment:

SONiC Software Version: SONiC.HEAD.740-dirty-20200813.174118
Distribution: Debian 10.5
Kernel: 4.19.0-9-2-amd64
Build commit: 9c22d19b

Platform: x86_64-arista_7170_64c
HwSKU: Arista-7170-64C
ASIC: barefoot

The issue was introduced by https://github.com/Azure/sonic-buildimage/pull/4941/files?diff=unified&w=1#diff-2901ad8ea8e7ba16059aa09588944b30L300

The following PR may cause the same issue in SONiC 201911: https://github.com/Azure/sonic-buildimage/pull/5200

anshuv-mfst commented 4 years ago

@tahmed-dev - could you please confirm the fix and close the issue?

tahmed-dev commented 4 years ago

@anshuv-mfst the fix for the original issue #5277 is in master. This issue tracks why the problem was not caught by the KVM-based PR checker.
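As a starting point for that investigation, here is a rough Python sketch of a syslog scan that a checker could run after restarting services. It is not the existing PR checker or any SONiC test tool; the function name `find_redis_socket_errors` is hypothetical, and the regular expression is an assumption based only on the message format in the log quoted above.

```python
import re

# Matches lines such as:
#   redis.exceptions.ConnectionError: Error 13 connecting to unix socket:
#   /var/run/redis/redis.sock. Permission denied.
# The format is assumed from the syslog excerpt in this issue.
REDIS_SOCKET_ERROR = re.compile(
    r"redis\.exceptions\.ConnectionError: Error (?P<errno>\d+) "
    r"connecting to unix socket: (?P<path>\S+)\. (?P<reason>.+?)\.?$"
)


def find_redis_socket_errors(lines):
    """Return (errno, socket_path, reason) for each matching log line.

    (Hypothetical helper, for illustration only.)
    """
    hits = []
    for line in lines:
        match = REDIS_SOCKET_ERROR.search(line.rstrip("\n"))
        if match:
            hits.append(
                (int(match.group("errno")),
                 match.group("path"),
                 match.group("reason"))
            )
    return hits


# Example call:
#   with open("/var/log/syslog") as log:
#       errors = find_redis_socket_errors(log)
```

A check like this would fail the run whenever the returned list is non-empty. Whether the KVM image hits the same permission error at all is the open question for this issue.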