sonic-net / sonic-mgmt

Configuration management examples for SONiC

test_snmp_memory_load is flaky on some of the platforms #8935

Open SuvarnaMeenakshi opened 1 year ago

SuvarnaMeenakshi commented 1 year ago

Description

Steps to reproduce the issue:

  1. run snmp/test_snmp_memory.py::test_snmp_memory_load

Describe the results you received: The test fails with `Failed: sysTotalFreeMemory differs by more than 4` or `Failed: sysTotalFreeMemory differs by more than 8` on platforms with ~4G memory.
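The failure messages suggest that the test compares two memory readings against a percentage threshold (4 or 8). The actual check in test_snmp_memory.py is not shown in this issue, so the following is only a minimal sketch of such a comparison, assuming the difference is taken relative to the reference reading. The names `percent_diff` and `within_threshold` are hypothetical.

```python
def percent_diff(reference_kb: float, observed_kb: float) -> float:
    """Return the absolute difference as a percentage of the reference value."""
    if reference_kb <= 0:
        raise ValueError("reference_kb must be positive")
    return abs(reference_kb - observed_kb) / reference_kb * 100.0


def within_threshold(reference_kb: float, observed_kb: float, threshold_pct: float) -> bool:
    """True when the two readings differ by no more than threshold_pct percent."""
    # Assumption: the threshold is a percentage, as the "more than 4 / 8" messages imply.
    return percent_diff(reference_kb, observed_kb) <= threshold_pct
```

With a check of this shape, a small absolute gap between the host reading and the SNMP reading becomes a large percentage when little memory is free, which may be one reason the failure shows up on ~4G platforms.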

Issues:

  1. `nohup python /tmp/memory.py > /dev/null 2>&1 &` is executed on the DUT before the test begins. When this command is run on a device with less than 4G of total memory, the device often reboots.

  2. On DUTs with 8G or more of total memory, `nohup python /tmp/memory.py > /dev/null 2>&1 &` finishes very quickly.

Observation: The FreeMemory value returned by OID 1.3.6.1.4.1.2021.4.11.0 is computed by snmpd. The FreeMemory reported from inside the SNMP docker does not appear to match the host's FreeMemory exactly, even though /proc/meminfo is not containerized. I captured the free and available memory from the host, the snmp docker, and SNMP using the script below:

#!/bin/bash
# Sample free/available memory from the host, the snmp container, and SNMP 100 times.
# docker exec runs without -it: a TTY would append '\r' to every captured value.
x=100
while (( x > 0 )); do   # arithmetic comparison; [[ $x > 0 ]] compares strings
  host_memfree=$(awk '/^MemFree:/ { print $2 }' /proc/meminfo)
  docker_memfree=$(docker exec snmp cat /proc/meminfo | awk '/^MemFree:/ { print $2 }')
  # 1.3.6.1.4.1.2021.4.11.0: FreeMemory as computed by snmpd
  snmp_memfree=$(docker exec snmp snmpget -v 2c -c msft 127.0.0.1 1.3.6.1.4.1.2021.4.11.0 | awk '{ print $4 }')
  host_avail=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)
  docker_avail=$(docker exec snmp cat /proc/meminfo | awk '/^MemAvailable:/ { print $2 }')
  # 1.3.6.1.4.1.6027.3.10.1.2.9.1.5.1: memory usage in percent
  snmp_usage=$(docker exec snmp snmpget -v 2c -c msft 127.0.0.1 1.3.6.1.4.1.6027.3.10.1.2.9.1.5.1 | awk '{ print $4 }')
  echo "$host_avail $docker_avail $host_memfree $docker_memfree $snmp_memfree"
  echo "$snmp_usage"
  x=$((x-1))
done
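For anyone who wants to do the same extraction from Python (for example inside a test helper), here is a minimal sketch of the two parsing steps the script performs with grep and awk. The function names are hypothetical, and the snmpget parser assumes the single-line `... = INTEGER: <value>` output form that the script's `awk '{ print $4 }'` relies on.

```python
def parse_meminfo(text: str) -> dict:
    """Map each /proc/meminfo field name to its integer value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def parse_snmpget_value(line: str) -> int:
    """Extract the integer from a line such as '<oid> = INTEGER: 123456'."""
    # Fourth whitespace-separated field, like awk '{ print $4 }'.
    # split() also discards a trailing '\r' left behind by a TTY.
    return int(line.strip().split()[3])
```

`parse_meminfo` can be fed the contents of /proc/meminfo from either the host or `docker exec snmp cat /proc/meminfo`, so the `MemFree` and `MemAvailable` values can be compared side by side.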

Available memory = ((100-snmp_usage)*total_memory)/100

I observe that the available memory computed from the memory usage OID 1.3.6.1.4.1.6027.3.10.1.2.9.1.5.1 is closer to the host's available memory. The memory load is also generated on the host.
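As a worked example of the formula above, the conversion from the usage percentage to available memory can be sketched as follows. The function name `available_from_usage` and the sample numbers are illustrative only, not measurements from a DUT.

```python
def available_from_usage(usage_pct: float, total_kb: int) -> float:
    """Available memory = ((100 - snmp_usage) * total_memory) / 100."""
    if not 0 <= usage_pct <= 100:
        raise ValueError("usage_pct must be between 0 and 100")
    return (100 - usage_pct) * total_kb / 100


# Illustrative numbers: 62% usage on a device with 4000000 kB total memory
# leaves (100 - 62) * 4000000 / 100 = 1520000 kB available.
example_kb = available_from_usage(62, 4000000)
```

Because the usage OID returns a whole-number percentage, a value derived this way has a granularity of about 1% of total memory, which is worth keeping in mind when comparing it against MemAvailable from /proc/meminfo.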

Question:

  1. OID 1.3.6.1.4.1.2021.4.11.0 may be reporting the FreeMemory of the SNMP docker rather than the host. If so, is loading memory on the host and comparing against the host's FreeMemory an incorrect way to test?

Describe the results you expected:

Additional information you deem important:

**Output of `show version`:**

```
(paste your output here)
```

**Attach debug file `sudo generate_dump`:**

```
(paste your output here)
```

bingwang-ms commented 12 months ago

The issue should be addressed by PR https://github.com/sonic-net/sonic-mgmt/pull/9074