saltstack / salt

[BUG] salt-syndic memory leak #66386

Open terminalmage opened 5 months ago

terminalmage commented 5 months ago

Description

Over time, the syndic process grows in memory until all memory is exhausted and the syndic gets OOM-killed.

This syndic processes a lot of events, forwarding anywhere from 150 to 600 at a time (discovered by adding debug logging to salt.minion.SyndicManager._forward_events()).

I have used a few profilers to discover where memory is being allocated, and they all point to msgpack. Here's the relevant part of the summary from running the syndic using scalene:

                           Memory usage: ▁▂▂▃▃▃▃▃▃▄▄▄▅▅▅▆▆▆▆▇▇██████ (max: 15.046 GB, growth rate:  28%)

... snip...

           /usr/local/lib/python3.9/dist-packages/msgpack/__init__.py: % of time =   0.36% (8.694s) out of 41m:41.056s.
       ╷       ╷       ╷       ╷        ╷       ╷               ╷       ╷
       │Time   │–––––– │–––––– │Memory  │–––––– │–––––––––––    │Copy   │
  Line │Python │native │system │Python  │peak   │timeline/%     │(MB/s) │/usr/local/lib/python3.9/dist-packages/msgpack/__init__…
╺━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━━┿━━━━━━━┿━━━━━━━━━━━━━━━┿━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸

... snip ...

    30 │       │       │       │        │       │               │       │def packb(o, **kwargs):
    31 │       │       │       │        │       │               │       │    """
    32 │       │       │       │        │       │               │       │    Pack object `o` and return packed bytes
    33 │       │       │       │        │       │               │       │
    34 │       │       │       │        │       │               │       │    See :class:`Packer` for options.
    35 │       │       │       │        │       │               │       │    """
    36 │       │       │       │ 100%   │11.85G │▁▁▂▂▂▃▃▃▃  37% │     9 │    return Packer(**kwargs).pack(o)
    37 │       │       │       │        │       │               │       │

... snip ...

╶──────┼───────┼───────┼───────┼────────┼───────┼───────────────┼───────┼─────────────────────────────────────────────────────────╴
       │       │       │       │        │       │               │       │function summary for /usr/local/lib/python3.9/dist-pack…
    30 │       │       │       │ 100%   │11.85G │█████████  37% │     9 │AsyncReqMessageClient.packb
       ╵       ╵       ╵       ╵        ╵       ╵               ╵       ╵
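
For anyone who wants to cross-check the profiler result, here is a minimal sketch using the standard-library tracemalloc module. It does not reproduce the syndic leak. It only shows how to tell memory that is allocated and freed apart from memory that is retained across repeated pack calls. The helper traced_growth and the sample events list are hypothetical names made up for illustration, and the json fallback exists only so the sketch runs where msgpack is not installed.

```python
import json
import tracemalloc

try:
    import msgpack  # third-party; the library the profiler output points at

    pack = msgpack.packb
except ImportError:
    # Fallback so the sketch still runs without msgpack installed.
    def pack(obj):
        return json.dumps(obj).encode()


def traced_growth(fn, rounds=200):
    """Return the net traced bytes still allocated after calling fn `rounds` times."""
    tracemalloc.start()
    try:
        fn()  # warm-up call so one-time allocations are not counted
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(rounds):
            fn()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


# Hypothetical batch of events, sized within the 150 to 600 range mentioned above.
events = [
    {"tag": f"salt/job/{i}/ret", "data": {"id": f"minion-{i}", "retcode": 0}}
    for i in range(300)
]

# Packing and discarding the result should retain very little.
discard_growth = traced_growth(lambda: pack(events))

# Keeping every packed payload simulates buffers that are never released.
retained = []
retain_growth = traced_growth(lambda: retained.append(pack(events)))

print(f"discarded: {discard_growth} bytes, retained: {retain_growth} bytes")
```

If the "discarded" figure stays flat while process memory still climbs, the growth is more likely in whatever holds on to the packed payloads than in the packing call itself. Note that tracemalloc only sees allocations made through Python's allocators.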

Setup

There is nothing special about the syndic configuration. The syndic_master is configured with the IP of the master-of-masters, and order_masters is set to True. The master running on the syndic is using the default number of worker threads (5).

The box is running under KVM virtualization, with Salt installed via pip.

Versions Report

          Salt: 3006.5

Python Version:
        Python: 3.9.2 (default, Feb 28 2021, 17:03:44)

Dependency Versions:
          cffi: 1.16.0
      cherrypy: unknown
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.3
       libgit2: Not Installed
  looseversion: 1.3.0
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.0
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 24.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.20.0
        pygit2: Not Installed
  python-gnupg: Not Installed
        PyYAML: 5.3.1
         PyZMQ: 25.1.2
        relenv: Not Installed
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: debian 11 bullseye
        locale: utf-8
       machine: x86_64
       release: 5.10.0-28-amd64
        system: Linux
       version: Debian GNU/Linux 11 bullseye

dwoz commented 1 month ago

@terminalmage There were a few fixes around open sockets in the last few releases. Can you test this against 3006.8 or 3006.9?

3006.9 should be released tomorrow.

romanbakaleyko commented 1 month ago

We are on 3006.8 (upgraded from 3004) and experience the same problem: the salt-syndic process leaks memory, and while that is happening it is unresponsive from the MoM. In the syndic's logs, each time I try to run something from the MoM, I see this warning:

2024-08-06 11:23:24,491 [salt.minion :2383][WARNING ][5125] The minion failed to return the job information for job 20240806153202858354. This is often due to the master being shut down or overloaded. If the master is running, consider increasing the worker_threads value.

It happens randomly: it could be one syndic per week, or it could be 8.
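
Since the leak shows up at unpredictable times, one option is to sample the syndic's resident memory periodically and flag steady growth before the OOM killer steps in. The following is a rough sketch, assuming a Linux host where /proc/<pid>/status is available. The function names, the sample status text, and the growth threshold are all hypothetical.

```python
import re


def parse_vm_rss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of a /proc/<pid>/status file."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current resident set size of `pid` in kB (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_vm_rss_kb(fh.read())


def is_growing(samples_kb, min_growth_kb):
    """Report whether the last sample exceeds the first by at least min_growth_kb."""
    return len(samples_kb) >= 2 and samples_kb[-1] - samples_kb[0] >= min_growth_kb


# Made-up status text for illustration; real input comes from read_rss_kb(pid).
sample_status = "Name:\tsalt-syndic\nVmPeak:\t 2048000 kB\nVmRSS:\t 1536000 kB\nThreads:\t4\n"
print(parse_vm_rss_kb(sample_status))
```

Calling read_rss_kb() from a cron job or a small loop and feeding the collected values to is_growing() would show which syndics are affected and roughly when the growth starts.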

dwoz commented 1 month ago

I wonder if upgrading msgpack to 1.1.0 would fix this or not: https://github.com/msgpack/msgpack-python/issues/283
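
To see which msgpack version the Python environment running the syndic actually has, something like the sketch below could be run with that same interpreter. It uses the standard-library importlib.metadata module. The helpers version_tuple and installed_version are hypothetical names, and the comparison is deliberately simple: it ignores pre-release suffixes.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn a dotted version such as '1.0.0' into a tuple of ints for comparison."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        # Only the leading digits are kept, so '0rc1' is treated as 0.
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


TARGET = "1.1.0"  # the version suggested in the comment above
current = installed_version("msgpack")
if current is None:
    print("msgpack is not installed in this environment")
elif version_tuple(current) < version_tuple(TARGET):
    print(f"msgpack {current} is older than {TARGET}")
else:
    print(f"msgpack {current} is at least {TARGET}")
```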