The issue is caused by a missing telemetry container.
The situation may take place on a system start when docker container is not created yet.
The root cause is monit async service health state check.
Steps to reproduce the issue:
root@r-boxer-sw01:/home/admin# systemctl stop telemetry
root@r-boxer-sw01:/home/admin# docker rm -f telemetry
root@r-boxer-sw01:/home/admin# monit restart container_memory_telemetry
root@r-boxer-sw01:/home/admin# monit status container_memory_telemetry
Monit 5.20.0 uptime: 2h 18m
Program 'container_memory_telemetry'
status Status ok
monitoring status Monitored
monitoring mode active
on reboot start
last exit value 1
last output -
data collected Mon, 14 Feb 2022 14:44:18
root@r-boxer-sw01:/home/admin# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
3f3091a296bf 736fd83ca6a1 "/usr/local/bin/supe…" 3 hours ago Up 2 hours what-just-happened
9260683a3e6c docker-snmp:latest "/usr/local/bin/supe…" 3 hours ago Up 2 hours snmp
50bc3a6a8351 docker-sonic-mgmt-framework:latest "/usr/local/bin/supe…" 3 hours ago Up 2 hours mgmt-framework
61dda613653e 7c4f66877495 "/usr/bin/docker_ini…" 3 hours ago Up 2 hours dhcp_relay
02f30d3ab380 docker-router-advertiser:latest "/usr/bin/docker-ini…" 3 hours ago Up 2 hours radv
0331e1717cf6 docker-lldp:latest "/usr/bin/docker-lld…" 3 hours ago Up 2 hours lldp
adb75e6f35b9 docker-platform-monitor:latest "/usr/bin/docker_ini…" 3 hours ago Up 2 hours pmon
3b29843ee089 docker-syncd-mlnx:latest "/usr/local/bin/supe…" 3 hours ago Up 2 hours syncd
361bceba54e7 docker-teamd:latest "/usr/local/bin/supe…" 3 hours ago Up 2 hours teamd
2bd90fc2b1a4 docker-orchagent:latest "/usr/bin/docker-ini…" 3 hours ago Up 2 hours swss
7ced7c469af0 docker-fpm-frr:latest "/usr/bin/docker_ini…" 3 hours ago Up 2 hours bgp
b43c5781c33b docker-database:latest "/usr/local/bin/dock…" 3 hours ago Up 3 hours database
root@r-boxer-sw01:/home/admin# docker stats --no-stream --format {{.MemUsage}} telemetry
Error response from daemon: No such container: telemetry
root@r-boxer-sw01:/home/admin# echo $?
1
Describe the results you received:
root@r-boxer-sw01:/home/admin# tail -F /var/log/syslog | grep memory
Feb 14 14:12:40.904682 r-boxer-sw01 ERR memory_checker: [memory_checker] Failed to execute the command 'docker stats --no-stream --format \{\{.MemUsage\}\} telemetry'. Return code: '1'
Describe the results you expected:
No error messages are expected when docker doesn't exist
(paste your output here or download and attach the file here )
Additional information you deem important (e.g. issue happens only occasionally):
Monit summary:
root@r-boxer-sw01:/home/admin# monit summary
Monit 5.20.0 uptime: 2h 15m
Service Name Status Type
r-boxer-sw01 Running System
rsyslog Running Process
root-overlay Accessible Filesystem
var-log Accessible Filesystem
routeCheck Status ok Program
diskCheck Status ok Program
container_checker Status ok Program
vnetRouteCheck Status ok Program
container_memory_telemetry Status ok Program
Monit configuration:
root@r-boxer-sw01:/home/admin# cat /etc/monit/conf.d/monit_telemetry
###############################################################################
## Monit configuration for telemetry container
###############################################################################
check program container_memory_telemetry with path "/usr/bin/memory_checker telemetry 419430400"
if status == 3 for 10 times within 20 cycles then exec "/usr/bin/restart_service telemetry"
Monit memory checker:
root@r-boxer-sw01:/home/admin# cat /usr/bin/memory_checker
def get_command_result(command):
"""Executes the command and return the resulting output.
Args:
command: A string contains the command to be executed.
Returns:
A string which contains the output of command.
"""
command_stdout = ""
try:
proc_instance = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
shell=True, universal_newlines=True)
command_stdout, command_stderr = proc_instance.communicate()
if proc_instance.returncode != 0:
syslog.syslog(syslog.LOG_ERR, "[memory_checker] Failed to execute the command '{}'. Return code: '{}'"
.format(command, proc_instance.returncode))
sys.exit(1)
except (OSError, ValueError) as err:
syslog.syslog(syslog.LOG_ERR, "[memory_checker] Failed to execute the command '{}'. Error: '{}'"
.format(command, err))
sys.exit(2)
return command_stdout.strip()
Description
The issue is caused by a missing
telemetry
container. The situation may take place on a system start whendocker
container is not created yet. The root cause ismonit
async service health state check.Steps to reproduce the issue:
Describe the results you received:
Describe the results you expected:
No error messages are expected when docker doesn't exist
Output of
show version
:Output of
show techsupport
:Additional information you deem important (e.g. issue happens only occasionally):
Monit summary:
Monit configuration:
Monit memory checker: