Open vivekrnv opened 2 years ago
@qiluo-msft, @SuvarnaMeenakshi Please check
Mitigated for now. A long-term fix may require a new feature: make ALL the blocking calls default to blocking=False.
@qiluo-msft, @SuvarnaMeenakshi kind reminder to review
The proposed solution seems to be heading in a good direction. It will not be extremely easy, because the existing code makes some assumptions about Redis data availability. Would you like to raise a PR for this solution?
We fixed one of the blocking calls, but not all of them. https://github.com/Azure/sonic-snmpagent/pull/255
Description
When blocking=True is used and the data is not available in Redis, the corresponding data-fetching coroutines consume so much time that the coroutine which maintains the TCP connection to the AgentX socket is starved. As a result, the connection is terminated, which eventually causes SNMP queries to fail.
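The starvation described above can be reproduced in isolation with a minimal asyncio sketch. Nothing here is taken from the sonic-snmpagent code: `blocking_fetch`, `updater`, and `keepalive` are hypothetical stand-ins, and `time.sleep` merely imitates a Redis read that waits for missing data. The point is only that a synchronous wait inside a coroutine stalls every other coroutine on the same event loop.

```python
import asyncio
import time


def blocking_fetch(wait_s):
    """Hypothetical stand-in for a Redis read with blocking=True on missing data."""
    time.sleep(wait_s)  # synchronous wait: the whole event loop thread is stuck here
    return {}


async def updater(wait_s, cycles):
    """Imitates a data-fetching coroutine that calls a blocking read each cycle."""
    for _ in range(cycles):
        blocking_fetch(wait_s)
        await asyncio.sleep(0)  # control is yielded only after the blocking call returns


async def keepalive(interval_s, beats, gaps):
    """Imitates the coroutine that must service the socket at short intervals."""
    last = time.monotonic()
    for _ in range(beats):
        await asyncio.sleep(interval_s)
        now = time.monotonic()
        gaps.append(now - last)  # actual time between two wake-ups
        last = now


async def run_demo(wait_s=0.2, interval_s=0.01):
    gaps = []
    await asyncio.gather(updater(wait_s, 3), keepalive(interval_s, 3, gaps))
    return gaps


gaps = asyncio.run(run_demo())
print("requested interval: 0.01 s, worst observed gap: %.2f s" % max(gaps))
```

In this sketch the keepalive coroutine asks to wake up every 10 ms, yet its worst gap is at least as long as one blocking fetch, which mirrors how the AgentX connection handling can be delayed.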
This SNMP query failure is also reported here: https://github.com/Azure/sonic-buildimage/issues/9996
Triage:
It clearly took 4 minutes for connection_routine to finish the TCP handshake, so the same delay is expected whenever the Transport coroutine has to handle and respond to incoming data. https://github.com/Azure/sonic-snmpagent/blob/master/src/ax_interface/socket_io.py#L149
I verified this behavior by removing the Updater instances that throw the following exceptions,
after which the SNMP queries started to work.
Solution:
This PR https://github.com/Azure/sonic-snmpagent/pull/246 fixes the issue temporarily, but as a long-term solution, all blocking=True arguments in the subagent repo should be avoided.
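A rough sketch of what avoiding blocking=True could look like, under stated assumptions: `FakeStore`, its `get_all` method, the `PORT|Ethernet0` key, and the `oper_status` field are all hypothetical and only imitate a non-blocking Redis read that returns an empty result when the data is absent. The updater keeps its previous cache and retries on the next cycle instead of waiting, so other coroutines continue to run in between.

```python
import asyncio


class FakeStore:
    """Hypothetical in-memory stand-in for a Redis table (not the real DB API)."""

    def __init__(self):
        self.data = {}

    def get_all(self, key):
        # Non-blocking read: returns an empty dict immediately if the key is absent.
        return dict(self.data.get(key, {}))


async def updater(store, key, cache, cycles, interval_s=0.01):
    """Polls once per cycle; missing data is skipped, not waited for."""
    for _ in range(cycles):
        row = store.get_all(key)
        if row:
            cache[key] = row  # refresh the cache only when data is present
        await asyncio.sleep(interval_s)  # hand control back to the event loop


async def writer(store, key, delay_s):
    """Simulates the data showing up in the store some time after startup."""
    await asyncio.sleep(delay_s)
    store.data[key] = {"oper_status": "up"}


async def run_demo():
    store, cache = FakeStore(), {}
    key = "PORT|Ethernet0"  # illustrative key only
    await asyncio.gather(
        updater(store, key, cache, cycles=10),
        writer(store, key, delay_s=0.03),
    )
    return cache


cache = asyncio.run(run_demo())
print(cache)
```

The trade-off is that callers must tolerate an empty or stale cache until the data appears, which is the kind of assumption about Redis data availability that the existing code would need to relax.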
sonic_dump_qa-eth-vt05-1-2410_20220318_131013 (1).tar.gz