sonic-net / sonic-snmpagent

A net-snmpd agentx subagent for SONiC

Improve MIBUpdater to reconnect DBConnector when reinitializing data #290

Closed liuh-80 closed 1 year ago

liuh-80 commented 1 year ago

Improve MIBUpdater to reconnect DBConnector when reinitializing data.

Work item tracking

Microsoft ADO (number only): 24705208

- What I did: Fix an issue where, after Redis restarts, some MIBUpdaters' DB connections stay broken and keep reporting errors to syslog.

The following message is repeated in syslog:

  File "/usr/local/lib/python3.7/dist-packages/sonic_ax_impl/mibs/ietf/rfc2737.py", line 674, in _update_per_namespace_data
    msg = pubsub.get_message()
  File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 1626, in get_message
    return _swsscommon.PubSub_get_message(self, timeout)
RuntimeError: RedisError: Failed to select, err=3: errstr=Server closed the connection

- How I did it: Reconnect the DBConnector in each MIBUpdater's reinit_data method.
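Why reconnecting in reinit_data helps can be shown with a small self-contained simulation. This is a hypothetical sketch, not the sonic-snmpagent code: FakeRedisServer, FakeConnection, Updater, and the reinit_every and reconnect_on_reinit parameters are invented stand-ins for Redis, swsscommon's DBConnector, and MIBUpdater. It assumes that a connection opened before a server restart stays broken forever, so the only way out is to open a new one during the periodic re-init.

```python
class FakeRedisServer:
    """Hypothetical stand-in for Redis: restart() invalidates existing connections."""

    def __init__(self):
        self.generation = 0

    def restart(self):
        self.generation += 1


class FakeConnection:
    """Hypothetical stand-in for a DBConnector bound to one server 'generation'."""

    def __init__(self, server):
        self.server = server
        self.generation = server.generation

    def get(self, key):
        # Once the server has restarted, this connection can never be used again.
        if self.generation != self.server.generation:
            raise RuntimeError("RedisError: Server closed the connection")
        return f"value-of-{key}"


class Updater:
    """Hypothetical MIBUpdater-like loop: reinit_data() every few cycles, update_data() each cycle."""

    def __init__(self, server, reinit_every=3, reconnect_on_reinit=True):
        self.server = server
        self.reinit_every = reinit_every
        self.reconnect_on_reinit = reconnect_on_reinit
        self.conn = None
        self.cycle = 0
        self.errors = 0

    def reinit_data(self):
        # The fix: open a fresh connection on every reinit instead of reusing a stale one.
        if self.conn is None or self.reconnect_on_reinit:
            self.conn = FakeConnection(self.server)

    def update_data(self):
        return self.conn.get("some-key")

    def run_once(self):
        # Reinitialize on the first cycle and then every `reinit_every` cycles.
        if self.cycle % self.reinit_every == 0:
            self.reinit_data()
        self.cycle += 1
        try:
            self.update_data()
            return True
        except RuntimeError:
            # The real agent would log a traceback to syslog at this point.
            self.errors += 1
            return False


if __name__ == "__main__":
    for reconnect in (False, True):
        server = FakeRedisServer()
        updater = Updater(server, reinit_every=3, reconnect_on_reinit=reconnect)
        results = [updater.run_once()]
        server.restart()  # simulate killing and restarting redis-server
        results += [updater.run_once() for _ in range(4)]
        print(f"reconnect_on_reinit={reconnect}: {results}")
```

With reconnect_on_reinit=False the updater fails on every cycle after the restart ([True, False, False, False, False]), which mirrors the endless syslog errors; with reconnection it fails only until the next reinit and then recovers ([True, False, False, True, True]).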

- How to verify it: All unit tests pass.

Manually tested with the following steps:

  1. In the database container, kill the Redis server.
  2. Later, start Redis again in the database container: service redis-server start
  3. Check syslog and confirm that the following log exists, but that no new entries appear a few minutes later, once the MIBs have re-initialized:

sudo cat /var/log/syslog | grep rfc2737.py | grep PubSub_get_message
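Step 3 amounts to checking that no matching entry is newer than the moment the MIBs re-initialized. The sketch below reproduces the grep filter in Python and adds a timestamp cutoff. The function names and the cutoff value are illustrative assumptions, and the code assumes the timestamp format seen in the log entry (month, day, and time with microseconds, but no year), so it does not handle logs that span a year boundary.

```python
from datetime import datetime


def parse_syslog_time(stamp):
    """Parse a timestamp such as 'Sep 22 03:30:17.223944' (the year is not logged)."""
    month, day, clock = stamp.split()[:3]
    # Pin a placeholder leap year so Feb 29 parses and no year-less parsing is needed.
    return datetime.strptime(f"2000 {month} {day} {clock}", "%Y %b %d %H:%M:%S.%f")


def pubsub_errors_after(lines, cutoff_stamp):
    """Mimic `grep rfc2737.py | grep PubSub_get_message`, keeping entries newer than the cutoff."""
    cutoff = parse_syslog_time(cutoff_stamp)
    newer = []
    for line in lines:
        if "rfc2737.py" not in line or "PubSub_get_message" not in line:
            continue
        try:
            logged_at = parse_syslog_time(line)
        except ValueError:
            continue  # skip lines that do not start with this timestamp format
        if logged_at > cutoff:
            newer.append(line)
    return newer
```

Run as root over the lines of /var/log/syslog, with a cutoff a few minutes after Redis was restarted, an empty result would match the expected outcome of step 3.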

Sep 22 03:30:17.223944 vlab-01 ERR snmp#snmp-subagent [ax_interface] ERROR: MIBUpdater.start() caught an unexpected exception during update_data()#012Traceback (most recent call last):#012 File "/usr/local/lib/python3.9/dist-packages/ax_interface/mib.py", line 43, in start#012 self.update_data()#012 File "/usr/local/lib/python3.9/dist-packages/sonic_ax_impl/mibs/ietf/rfc2737.py", line 326, in update_data#012 updater.update_data(i, self.statedb[i])#012 File "/usr/local/lib/python3.9/dist-packages/sonic_ax_impl/mibs/ietf/rfc2737.py", line 666, in update_data#012 self._update_per_namespace_data(self.pub_sub_dict[db_index])#012 File "/usr/local/lib/python3.9/dist-packages/sonic_ax_impl/mibs/ietf/rfc2737.py", line 675, in _update_per_namespace_data#012 msg = pubsub.get_message()#012 File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 1996, in get_message#012 return _swsscommon.PubSub_get_message(self, timeout, interrupt_on_signal)#012RuntimeError: RedisError: Failed to select, err=3: errstr=Server closed the connection
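The #012 sequences in this entry appear to be rsyslog's default escaping of control characters: each one is written as # followed by three octal digits, so a newline (octal 012) becomes #012 and the multi-line Python traceback collapses into a single line. The hypothetical helper below turns such an entry back into a readable traceback. It is a sketch only, and it would also rewrite a literal # that happens to be followed by three octal digits.

```python
import re

# '#' followed by exactly three octal digits, e.g. '#012' for '\n'.
_OCTAL_ESCAPE = re.compile(r"#([0-7]{3})")


def decode_syslog_escapes(message):
    """Replace '#ooo' octal escapes with the characters they encode."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), message)


if __name__ == "__main__":
    entry = 'Traceback (most recent call last):#012 File "mib.py", line 43, in start#012 self.update_data()'
    print(decode_syslog_escapes(entry))
```

Tokens such as snmp#snmp-subagent are left untouched because the # is not followed by octal digits.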

- Description for the changelog: Improve MIBUpdater to reconnect DBConnector when reinitializing data.

liuh-80 commented 1 year ago

The pipeline build is broken, and I have already created a fix PR. This PR is pending until #293 is merged.

liuh-80 commented 1 year ago

/azp run

azure-pipelines[bot] commented 1 year ago
Azure Pipelines successfully started running 1 pipeline(s).
qiluo-msft commented 1 year ago
        msg = pubsub.get_message()

The root issue is that pubsub throws an exception and cannot recover automatically. Could you reproduce the bug (please add the bug report to the PR description) and then verify your fix with the repro steps?


Refers to: src/sonic_ax_impl/mibs/ietf/rfc2737.py:676 in bbf45ab (commit bbf45ab0763ae0342e9b39e50198396964e3d12e).

liuh-80 commented 1 year ago

PR description updated with the repro and verification steps.