saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0

[BUG] Salt Proxy 3003.2 memory leak issue #60923

Open TheBirdsNest opened 3 years ago

TheBirdsNest commented 3 years ago

**Description**
Related to closed bug #58813. Despite running the latest version of Salt, a memory leak still persists in the Salt Proxy component. Because I need to run many salt proxy processes (6500), the impact of this leak is multiplied and quite severe.

**Setup**

multiprocessing: False
log_level: quiet

DNS
retry_dns: 60
retry_dns_count: 5

Internal Timers
acceptance_wait_time: 10
random_reauth_delay: 60
auth_timeout: 60
random_startup_delay: 30

Grains Configuration
grains_cache: False
grains_deep_merge: True
grains_cache_expiration: 5400
grains_refresh_every: 60
grains_blacklist:
enable_gpu_grains: False

Mine Configuration
mine_enabled: False
mine_return_job: False
mine_interval: 60


**Steps to Reproduce the behavior**
start 1..n proxy minions

**Expected behavior**
The proxy does not consume memory without bound; its usage settles at a stable level. I believe something like 500 MB is considered the norm.

**Screenshots**

![Salt-1](https://user-images.githubusercontent.com/31070227/134339887-94830593-8268-4191-8551-52fa9cab7a43.png)
![image](https://user-images.githubusercontent.com/31070227/134641637-e8a763f1-3ce3-47e6-9a8b-7c47fcb3c320.png)
![Salt-2](https://user-images.githubusercontent.com/31070227/134339913-d42fcee1-442a-40a0-a235-16c1e4502c88.png)

**Versions Report**

Salt Version:
    Salt: 3003.2

Dependency Versions:
    cffi: 1.14.5
    cherrypy: Not Installed
    dateutil: 2.8.1
    docker-py: 4.4.4
    gitdb: Not Installed
    gitpython: Not Installed
    Jinja2: 2.11.3
    libgit2: Not Installed
    M2Crypto: 0.35.2
    Mako: Not Installed
    msgpack: 0.6.2
    msgpack-pure: Not Installed
    mysql-python: Not Installed
    pycparser: 2.20
    pycrypto: Not Installed
    pycryptodome: 3.10.1
    pygit2: Not Installed
    Python: 3.6.8 (default, Nov 16 2020, 16:55:22)
    python-gnupg: Not Installed
    PyYAML: 5.4.1
    PyZMQ: 17.0.0
    smmap: Not Installed
    timelib: Not Installed
    Tornado: 4.5.3
    ZMQ: 4.1.4

System Versions:
    dist: centos 7 Core
    locale: UTF-8
    machine: x86_64
    release: 3.10.0-1160.36.2.el7.x86_64
    system: Linux
    version: CentOS Linux 7 Core

TheBirdsNest commented 3 years ago

PMap shows increasing memory usage for a single proxy process:

[lbird@gcp3476prdamn01 ~]$ date
Wed Sep 22 15:40:48 UTC 2021
[lbird@gcp3476prdamn01 ~]$ sudo pmap 15913 | tail -n 1
 total          1171640K
[lbird@gcp3476prdamn01 ~]$ date
Wed Sep 22 15:41:33 UTC 2021
[lbird@gcp3476prdamn01 ~]$ sudo pmap 15913 | tail -n 1
 total          1177864K
[lbird@gcp3476prdamn01 ~]$ date
Wed Sep 22 15:43:24 UTC 2021
[lbird@gcp3476prdamn01 ~]$ sudo pmap 15913 | tail -n 1
 total          1200652K
[lbird@gcp3476prdamn01 ~]$ date
Wed Sep 22 15:45:03 UTC 2021
[lbird@gcp3476prdamn01 ~]$ sudo pmap 15913 | tail -n 1
 total          1221932K
[lbird@gcp3476prdamn01 ~]$ 
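To put a number on the growth in the samples above, here is a minimal sketch that computes the average rate between the first and last reading. The timestamps and totals are copied from the session; the function name is only illustrative, and the result is approximate because each `date` call ran slightly before its `pmap` call. Note also that, as far as I know, the `total` line of plain `pmap` reports mapped address space rather than resident memory.

```python
from datetime import datetime

# (timestamp, pmap total in KiB), copied from the terminal session above.
SAMPLES = [
    ("15:40:48", 1171640),
    ("15:41:33", 1177864),
    ("15:43:24", 1200652),
    ("15:45:03", 1221932),
]


def growth_rate_kib_per_s(samples):
    """Average growth between the first and last sample, in KiB per second."""
    fmt = "%H:%M:%S"
    (t0, m0), (t1, m1) = samples[0], samples[-1]
    elapsed = (datetime.strptime(t1, fmt) - datetime.strptime(t0, fmt)).total_seconds()
    return (m1 - m0) / elapsed


rate = growth_rate_kib_per_s(SAMPLES)
print(f"{rate:.1f} KiB/s, about {rate * 60 / 1024:.1f} MiB/min")
# prints: 197.2 KiB/s, about 11.6 MiB/min
```

That is 50292 KiB gained in 255 seconds for this one process.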

Looking at the `pmap` output, I'm not sure which part is causing the leak:

image

TheBirdsNest commented 3 years ago

Note that the above is from a case where a proxy is constantly restarting because it cannot connect. On each restart it consumes more memory.

Similarly, when I run a command against the proxy, memory consumption jumps but it's never released:

[lbird@gcp3476prdamn01 ~]$ sudo pmap 20960 | tail -n 1
 total          1523008K
[lbird@gcp3476prdamn01 ~]$ sudo salt 'c40988df-1a86-4ff0-bf8d-e018cc6c55bd' net.facts
c40988df-1a86-4ff0-bf8d-e018cc6c55bd:
    ----------
    [REDACTED]

-------------------------------------------
Summary
-------------------------------------------
# of minions targeted: 1
# of minions returned: 1
# of minions that did not return: 0
# of minions with errors: 0
-------------------------------------------
[lbird@gcp3476prdamn01 ~]$ sudo pmap 20960 | tail -n 1
 total          1545536K
[lbird@gcp3476prdamn01 ~]$ sudo pmap 20960 | tail -n 1
 total          1545536K
[lbird@gcp3476prdamn01 ~]$ sudo pmap 20960 | tail -n 1
 total          1545536K
[lbird@gcp3476prdamn01 ~]$ sudo pmap 20960 | tail -n 1
 total          1545536K
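For anyone wanting to repeat the before/after measurement without reading the numbers by eye, the following is a rough sketch that parses the `total` line of `pmap` output and reports the difference. The helper names `parse_pmap_total` and `pmap_total` are hypothetical, and the example values are the two totals from the session above; running `pmap` against another user's process may need root, as it did here.

```python
import re
import subprocess


def parse_pmap_total(output: str) -> int:
    """Return the KiB figure from the ' total   1523008K' line of pmap output."""
    match = re.search(r"^\s*total\s+(\d+)K\s*$", output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'total' line found in pmap output")
    return int(match.group(1))


def pmap_total(pid: int) -> int:
    """Run pmap for a PID and parse its total (not exercised in this example)."""
    result = subprocess.run(
        ["pmap", str(pid)], capture_output=True, text=True, check=True
    )
    return parse_pmap_total(result.stdout)


# Totals taken before and after the net.facts call in the session above.
before = parse_pmap_total(" total          1523008K\n")
after = parse_pmap_total(" total          1545536K\n")
print(f"grew by {after - before} KiB ({(after - before) / 1024:.0f} MiB)")
# prints: grew by 22528 KiB (22 MiB)
```

Calling `pmap_total(pid)` before and after a command, and again a few minutes later, would show whether the jump is ever given back.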

image

TheBirdsNest commented 2 years ago

@twangboy have you had an opportunity to review this one yet? :)