terminalmage opened 5 months ago
@terminalmage There were a few fixes around open sockets in the last few releases. Can you test this against 3006.8 or 3006.9? 3006.9 should be released tomorrow.
We are on 3006.8 (upgraded from 3004) and experience the same problem: the salt-syndic process leaks memory, and while it does, it is unresponsive from the MoM. Each time I try to kick something off from the MoM, I see this warning in the syndic's logs:

2024-08-06 11:23:24,491 [salt.minion :2383][WARNING ][5125] The minion failed to return the job information for job 20240806153202858354. This is often due to the master being shut down or overloaded. If the master is running, consider increasing the worker_threads value.

It happens randomly; it could be one syndic per week, or it could be 8.
I wonder if upgrading msgpack to 1.1.0 would fix this or not: https://github.com/msgpack/msgpack-python/issues/283
Description
Over time, the syndic process grows in memory until all memory is exhausted and the syndic gets OOM-killed.
This syndic processes a lot of events, forwarding anywhere from 150 to 600 at a time (discovered by adding some debug logging to salt.minion.SyndicManager._forward_events()). I have used a few profilers to discover where memory is being allocated, and they all point to msgpack. Here's the relevant part of the summary from running the syndic under scalene:
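As an independent cross-check on where allocations originate, Python's built-in tracemalloc module can group live allocations by source line. This is a minimal standalone sketch of the pattern, not wired into the syndic; the list comprehension stands in for the real event-forwarding workload:

```python
import tracemalloc

# Keep up to 25 stack frames per allocation so deep call sites are visible.
tracemalloc.start(25)

# Placeholder workload; in the real syndic this would be event forwarding.
data = [b"x" * 1024 for _ in range(1000)]

# Snapshot live allocations and print the top source lines by total size.
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:5]:
    # Each line shows file:lineno plus total size and allocation count.
    print(stat)
```

Running something like this inside a long-lived process (or comparing two snapshots with snapshot.compare_to()) can confirm whether the growth really sits in msgpack's allocation paths.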
Setup
There is nothing special about the syndic configuration. The syndic_master setting is configured with the IP of the master-of-masters, and order_masters is set to True. The master running on the syndic is using the default number of worker threads (5). The box is running under KVM virtualization, with Salt installed via pip.
Versions Report