sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

monit errors in the logs "ERR memory_checker: [memory_checker] Failed to execute the command" #10088

Closed nazariig closed 2 years ago

nazariig commented 2 years ago

Description

The error is logged when the telemetry container is missing. This can happen at system start, before the docker container has been created. The root cause is that monit checks service health asynchronously, so memory_checker may run against a container that does not exist yet.

Steps to reproduce the issue:

root@r-boxer-sw01:/home/admin# systemctl stop telemetry
root@r-boxer-sw01:/home/admin# docker rm -f telemetry

root@r-boxer-sw01:/home/admin# monit restart container_memory_telemetry
root@r-boxer-sw01:/home/admin# monit status container_memory_telemetry
Monit 5.20.0 uptime: 2h 18m

Program 'container_memory_telemetry'
  status                       Status ok
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  last exit value              1
  last output                  -
  data collected               Mon, 14 Feb 2022 14:44:18

root@r-boxer-sw01:/home/admin# docker ps
CONTAINER ID   IMAGE                                COMMAND                  CREATED       STATUS       PORTS     NAMES
3f3091a296bf   736fd83ca6a1                         "/usr/local/bin/supe…"   3 hours ago   Up 2 hours             what-just-happened
9260683a3e6c   docker-snmp:latest                   "/usr/local/bin/supe…"   3 hours ago   Up 2 hours             snmp
50bc3a6a8351   docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe…"   3 hours ago   Up 2 hours             mgmt-framework
61dda613653e   7c4f66877495                         "/usr/bin/docker_ini…"   3 hours ago   Up 2 hours             dhcp_relay
02f30d3ab380   docker-router-advertiser:latest      "/usr/bin/docker-ini…"   3 hours ago   Up 2 hours             radv
0331e1717cf6   docker-lldp:latest                   "/usr/bin/docker-lld…"   3 hours ago   Up 2 hours             lldp
adb75e6f35b9   docker-platform-monitor:latest       "/usr/bin/docker_ini…"   3 hours ago   Up 2 hours             pmon
3b29843ee089   docker-syncd-mlnx:latest             "/usr/local/bin/supe…"   3 hours ago   Up 2 hours             syncd
361bceba54e7   docker-teamd:latest                  "/usr/local/bin/supe…"   3 hours ago   Up 2 hours             teamd
2bd90fc2b1a4   docker-orchagent:latest              "/usr/bin/docker-ini…"   3 hours ago   Up 2 hours             swss
7ced7c469af0   docker-fpm-frr:latest                "/usr/bin/docker_ini…"   3 hours ago   Up 2 hours             bgp
b43c5781c33b   docker-database:latest               "/usr/local/bin/dock…"   3 hours ago   Up 3 hours             database

root@r-boxer-sw01:/home/admin# docker stats --no-stream --format {{.MemUsage}} telemetry
Error response from daemon: No such container: telemetry
root@r-boxer-sw01:/home/admin# echo $?
1

Describe the results you received:

root@r-boxer-sw01:/home/admin# tail -F /var/log/syslog | grep memory
Feb 14 14:12:40.904682 r-boxer-sw01 ERR memory_checker: [memory_checker] Failed to execute the command 'docker stats --no-stream --format \{\{.MemUsage\}\} telemetry'. Return code: '1'

Describe the results you expected:

No error messages are expected when the docker container doesn't exist.

Output of show version:

root@r-boxer-sw01:/home/admin# show version

SONiC Software Version: SONiC.202111.10-f08866b66_Internal
Distribution: Debian 11.2
Kernel: 5.10.0-8-2-amd64
Build commit: f08866b66
Build date: Mon Feb  7 08:15:17 UTC 2022
Built by: sw-r2d2-bot@r-build-sonic-ci02-241

Platform: x86_64-mlnx_msn2010-r0
HwSKU: ACS-MSN2010
ASIC: mellanox
ASIC Count: 1
Serial Number: MT1749X10061
Model Number: MSN2010-CB2F
Hardware Revision: A1
Uptime: 15:52:04 up 12 min,  1 user,  load average: 1.02, 0.84, 0.62

Docker images:
REPOSITORY                                         TAG                            IMAGE ID       SIZE
docker-teamd                                       202111.10-f08866b66_Internal   a6461b4fc1b1   438MB
docker-teamd                                       latest                         a6461b4fc1b1   438MB
docker-sflow                                       202111.10-f08866b66_Internal   502f7aeb5296   439MB
docker-sflow                                       latest                         502f7aeb5296   439MB
docker-orchagent                                   202111.10-f08866b66_Internal   22c1035163ec   457MB
docker-orchagent                                   latest                         22c1035163ec   457MB
docker-nat                                         202111.10-f08866b66_Internal   bfaaeef59e80   441MB
docker-nat                                         latest                         bfaaeef59e80   441MB
docker-macsec                                      202111.10-f08866b66_Internal   19ce40820a13   441MB
docker-macsec                                      latest                         19ce40820a13   441MB
docker-fpm-frr                                     202111.10-f08866b66_Internal   f76fdb1c1625   457MB
docker-fpm-frr                                     latest                         f76fdb1c1625   457MB
docker-syncd-mlnx                                  202111.10-f08866b66_Internal   9187d53ee421   1.01GB
docker-syncd-mlnx                                  latest                         9187d53ee421   1.01GB
docker-platform-monitor                            202111.10-f08866b66_Internal   b9bce6dd4fad   809MB
docker-platform-monitor                            latest                         b9bce6dd4fad   809MB
docker-snmp                                        202111.10-f08866b66_Internal   8a604e09da49   465MB
docker-snmp                                        latest                         8a604e09da49   465MB
docker-dhcp-relay                                  latest                         7c4f66877495   436MB
docker-sonic-mgmt-framework                        202111.10-f08866b66_Internal   8e4823d8d271   578MB
docker-sonic-mgmt-framework                        latest                         8e4823d8d271   578MB
docker-sonic-telemetry                             202111.10-f08866b66_Internal   df6917e0f648   511MB
docker-sonic-telemetry                             latest                         df6917e0f648   511MB
docker-router-advertiser                           202111.10-f08866b66_Internal   ab565ec647e8   423MB
docker-router-advertiser                           latest                         ab565ec647e8   423MB
docker-mux                                         202111.10-f08866b66_Internal   4132719ce52e   475MB
docker-mux                                         latest                         4132719ce52e   475MB
docker-lldp                                        202111.10-f08866b66_Internal   5b758f8ab7b9   463MB
docker-lldp                                        latest                         5b758f8ab7b9   463MB
docker-database                                    202111.10-f08866b66_Internal   3d69079016cc   423MB
docker-database                                    latest                         3d69079016cc   423MB

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

Monit summary:

root@r-boxer-sw01:/home/admin# monit summary
Monit 5.20.0 uptime: 2h 15m

 Service Name                     Status                      Type          
 r-boxer-sw01                     Running                     System        
 rsyslog                          Running                     Process       
 root-overlay                     Accessible                  Filesystem    
 var-log                          Accessible                  Filesystem    
 routeCheck                       Status ok                   Program       
 diskCheck                        Status ok                   Program       
 container_checker                Status ok                   Program       
 vnetRouteCheck                   Status ok                   Program       
 container_memory_telemetry       Status ok                   Program       

Monit configuration:

root@r-boxer-sw01:/home/admin# cat /etc/monit/conf.d/monit_telemetry
###############################################################################
## Monit configuration for telemetry container
###############################################################################
check program container_memory_telemetry with path "/usr/bin/memory_checker telemetry 419430400"
    if status == 3 for 10 times within 20 cycles then exec "/usr/bin/restart_service telemetry"
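
The rule above only reacts to exit status 3, which would explain why the summary still reports "Status ok" while the last exit value is 1. Below is a simplified model of the "10 times within 20 cycles" condition, not monit's actual implementation. The function name `restart_triggered` and its parameters are hypothetical, and the assumption that status 3 means "memory threshold exceeded" comes from the config rather than from the script excerpt in this issue.

```python
from collections import deque


def restart_triggered(exit_codes, match=3, times=10, cycles=20):
    """Simplified model: True once `match` occurred `times` times
    within any window of the last `cycles` checks."""
    window = deque(maxlen=cycles)  # keeps only the most recent `cycles` results
    for code in exit_codes:
        window.append(code == match)
        if sum(window) >= times:
            return True
    return False


# Exit status 1 (command failed) never matches the rule, so no restart is requested.
print(restart_triggered([1] * 100))
# Ten consecutive checks ending with status 3 satisfy the rule.
print(restart_triggered([3] * 10))
```

Under this model the missing container never causes a restart; the only visible effect is the ERR line written to syslog on every cycle.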

Monit memory checker:

root@r-boxer-sw01:/home/admin# cat /usr/bin/memory_checker
def get_command_result(command):
    """Executes the command and return the resulting output.

    Args:
        command: A string contains the command to be executed.

    Returns:
        A string which contains the output of command.
    """
    command_stdout = ""

    try:
        proc_instance = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                         shell=True, universal_newlines=True)
        command_stdout, command_stderr = proc_instance.communicate()
        if proc_instance.returncode != 0:
            syslog.syslog(syslog.LOG_ERR, "[memory_checker] Failed to execute the command '{}'. Return code: '{}'"
                          .format(command, proc_instance.returncode))
            sys.exit(1)
    except (OSError, ValueError) as err:
        syslog.syslog(syslog.LOG_ERR, "[memory_checker] Failed to execute the command '{}'. Error: '{}'"
                      .format(command, err))
        sys.exit(2)

    return command_stdout.strip()
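
One possible way to avoid the error is to check that the container exists before running `docker stats`. The sketch below is an assumption about how such a guard could look, not the fix that was merged. `container_exists` and its `run` parameter are hypothetical names; `docker container inspect` is a standard Docker CLI command that exits non-zero when the container is not found.

```python
import subprocess


def container_exists(name, run=subprocess.run):
    """Return True if `docker container inspect` finds a container called `name`.

    `run` is injectable (hypothetical parameter) so the check can be
    exercised on a machine without Docker.
    """
    try:
        result = run(
            ["docker", "container", "inspect", name],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The docker binary is missing or could not be started.
        return False
    return result.returncode == 0
```

In memory_checker, a guard such as `if not container_exists("telemetry"): sys.exit(0)` placed before the call to `get_command_result` would skip the memory check silently. Exiting with 0 in that case is an assumption; whether a missing container should be reported elsewhere (for example by container_checker) is a separate design decision.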

qiluo-msft commented 2 years ago

What is the severity of this issue? Does it have any impact other than the syslog ERR message?