sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[monit] Errors in the log related to "telemetry", "dialout_client" and "snmp_subagent" services #5529

Closed volodymyrsamotiy closed 5 months ago

volodymyrsamotiy commented 4 years ago

Description: Many Ansible and pytest tests fail due to "monit" errors found by loganalyzer. The errors relate to the "telemetry", "dialout_client" and "snmp_subagent" services.

Steps to reproduce the issue:

  1. No specific steps are needed. Just install image SONiC.201911.207-1da60a68 and the errors appear in the syslog.

Describe the results you received: Monit errors in the log related to the "telemetry", "dialout_client" and "snmp_subagent" services.

Oct  2 15:30:57.936681 sonic ERR monit[789]: 'telemetry|telemetry' status failed (1) -- '/usr/sbin/telemetry' is not running.
Oct  2 15:30:57.937998 sonic ERR monit[789]: 'telemetry|dialout_client' status failed (1) -- '/usr/sbin/dialout_client_cli' is not running.
Oct  2 15:30:58.158931 sonic ERR monit[789]: 'snmp|snmp_subagent' status failed (1) -- 'python3 -m sonic_ax_impl' is not running.
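
To see at a glance which checks are failing, the lines above can be parsed with a short script. This is a minimal sketch, not part of loganalyzer: the regular expression is inferred only from the three log lines quoted here, and the function name `failed_checks` is hypothetical. Reading the two `|`-separated parts of the check name as container and process is an assumption.

```python
import re

# Pattern inferred from the three syslog lines quoted in this issue;
# other monit message formats are not covered.
MONIT_NOT_RUNNING = re.compile(
    r"ERR monit\[\d+\]: '(?P<container>[^|']+)\|(?P<process>[^']+)' "
    r"status failed \((?P<code>\d+)\) -- '(?P<command>[^']+)' is not running\."
)


def failed_checks(syslog_lines):
    """Return (container, process, command) for each monit 'is not running' error.

    Interpreting the check name as 'container|process' is an assumption.
    """
    found = []
    for line in syslog_lines:
        match = MONIT_NOT_RUNNING.search(line)
        if match:
            found.append((match["container"], match["process"], match["command"]))
    return found


SAMPLE = [
    "Oct  2 15:30:57.936681 sonic ERR monit[789]: 'telemetry|telemetry' status failed (1) -- '/usr/sbin/telemetry' is not running.",
    "Oct  2 15:30:57.937998 sonic ERR monit[789]: 'telemetry|dialout_client' status failed (1) -- '/usr/sbin/dialout_client_cli' is not running.",
    "Oct  2 15:30:58.158931 sonic ERR monit[789]: 'snmp|snmp_subagent' status failed (1) -- 'python3 -m sonic_ax_impl' is not running.",
]

if __name__ == "__main__":
    for container, process, command in failed_checks(SAMPLE):
        print(f"{container}: {process} ({command}) is not running")
```

Feeding it the lines of a syslog file instead of `SAMPLE` would list every affected process, which may help when comparing images.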

Describe the results you expected: No "monit" errors should be observed ("loganalyzer" fails tests because of these errors).

Additional information you deem important (e.g. issue happens only occasionally):

**Output of `show version`:**

```

SONiC Software Version: SONiC.201911.207-1da60a68
Distribution: Debian 9.13
Kernel: 4.9.0-11-2-amd64
Build commit: 1da60a68
Build date: Thu Oct 1 17:26:52 UTC 2020
Built by: johnar@jenkins-worker-8

Platform: x86_64-mlnx_msn3800-r0
HwSKU: ACS-MSN3800
ASIC: mellanox
Serial Number: MT1937X00527
Uptime: 16:27:47 up 1:03, 2 users, load average: 1.79, 1.04, 1.00

Docker images:
REPOSITORY                    TAG                   IMAGE ID       SIZE
docker-syncd-mlnx             201911.207-1da60a68   af7f48d1abde   397MB
docker-syncd-mlnx             latest                af7f48d1abde   397MB
docker-router-advertiser      201911.207-1da60a68   b3cb8d8052c7   290MB
docker-router-advertiser      latest                b3cb8d8052c7   290MB
docker-sonic-mgmt-framework   201911.207-1da60a68   1822d324cbeb   431MB
docker-sonic-mgmt-framework   latest                1822d324cbeb   431MB
docker-platform-monitor       201911.207-1da60a68   7dc8fb4b5732   665MB
docker-platform-monitor       latest                7dc8fb4b5732   665MB
docker-fpm-frr                201911.207-1da60a68   aaf7c84baf6a   335MB
docker-fpm-frr                latest                aaf7c84baf6a   335MB
docker-sflow                  201911.207-1da60a68   7febf5b9d3fd   315MB
docker-sflow                  latest                7febf5b9d3fd   315MB
docker-lldp-sv2               201911.207-1da60a68   8bd7df393dee   312MB
docker-lldp-sv2               latest                8bd7df393dee   312MB
docker-dhcp-relay             201911.207-1da60a68   619af6271449   300MB
docker-dhcp-relay             latest                619af6271449   300MB
docker-database               201911.207-1da60a68   c04143a9991d   290MB
docker-database               latest                c04143a9991d   290MB
docker-teamd                  201911.207-1da60a68   0f86f2c470f1   315MB
docker-teamd                  latest                0f86f2c470f1   315MB
docker-snmp-sv2               201911.207-1da60a68   b60d27c146c6   348MB
docker-snmp-sv2               latest                b60d27c146c6   348MB
docker-orchagent              201911.207-1da60a68   2f30f52a18af   333MB
docker-orchagent              latest                2f30f52a18af   333MB
docker-nat                    201911.207-1da60a68   f18bd5e02ef1   317MB
docker-nat                    latest                f18bd5e02ef1   317MB
docker-sonic-telemetry        201911.207-1da60a68   3f1d8b042c9d   354MB
docker-sonic-telemetry        latest                3f1d8b042c9d   354MB

```

**Attach debug file `sudo generate_dump`:**

```
(paste your output here)
```
liat-grozovik commented 4 years ago

Note: all tests that are wrapped by the log analyzer (so that errors like these are caught) are failing, which makes it hard to validate any new image from 201911. I don't think monit is the problem; rather, we have too many services that are not working properly!

@volodymyrsamotiy please upload a techsupport dump as well. Have it cover one day (1d) so that it shows the problem.

abdosi commented 4 years ago

For telemetry, we need to deploy the telemetry certificate in /etc/sonic/telemetry. The following step should do it: run `testbed-cli.sh deploy-mg` from the sonic-mgmt repo.
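
Before rerunning the tests, it may be worth confirming that something was actually deployed to that directory. The sketch below only lists the files under /etc/sonic/telemetry. It does not validate the certificate, it makes no assumption about what the files are called, and the helper name `telemetry_cert_files` is hypothetical.

```python
from pathlib import Path


def telemetry_cert_files(cert_dir="/etc/sonic/telemetry"):
    """List regular files in the certificate directory.

    Returns an empty list if the directory is missing or empty. The expected
    file names are not stated in this thread, so none are checked here.
    """
    path = Path(cert_dir)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.iterdir() if p.is_file())


if __name__ == "__main__":
    files = telemetry_cert_files()
    if files:
        print("found:", ", ".join(files))
    else:
        print("no files under /etc/sonic/telemetry")
```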