duckfez opened this issue 2 years ago (status: Open)
@duckfez thanks for the report! Definitely looks interesting. One thing I'm curious about - have you tried this same test without using gitfs? (maybe add way more files to `/srv/bigjunk` or something)
I know that gitfs can have some... "surprising" side-effects. I wonder if this behavior is exacerbated by gitfs, or actually caused by it :thinking:
@waynew yeah, I tried it both with gitfs+roots and just roots. And I think (not being an expert) that the race is there either way. The logs in my second test attempt above are with roots only (gitfs disabled).
My suspicion is that gitfs does exacerbate it, just because it takes gitfs longer / more CPU cycles to produce the file_list cache. But with roots alone, the underlying problem is still there. Like you said, I just had to add a BUNCH of files to `/srv/bigjunk` in order to get roots looking at enough files to really see the problem. Also, roots can take (more) advantage of the kernel filesystem cache than gitfs, maybe?
I've thought about this (without looking at the code), and what I have been considering is taking the write lock just before checking whether the cache needs to be refreshed, then releasing it once the refresh is done. What I think is happening right now is that all of the master workers thunder off in parallel to ask "is the cache too old? If so, I should lock it and rebuild." Only one gets the lock first, but by that point they have all committed to seeing the rebuild through ... so each one in sequence locks the cache and rebuilds it.
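A minimal sketch of that check-then-lock race and the proposed lock-then-check change (illustrative Python only, not Salt's actual fileserver code; the paths, TTL value, and function names are assumptions):

```python
import os
import time
import threading

LIST_CACHE = "/var/cache/salt/master/file_lists/roots/base.p"  # illustrative path
LIST_CACHE_TTL = 20            # seconds; stands in for the list-cache TTL option
cache_lock = threading.Lock()  # stands in for the lock the workers contend on

def cache_is_stale():
    """Return True if the file list cache is missing or older than the TTL."""
    try:
        return (time.time() - os.path.getmtime(LIST_CACHE)) > LIST_CACHE_TTL
    except OSError:
        return True

def rebuild_cache():
    """Placeholder for the expensive walk of the gitfs/roots backends."""
    time.sleep(5)

# Behaviour as described above: every worker decides the cache is stale
# *before* taking the lock, so each one rebuilds in turn once the lock frees up.
def worker_check_then_lock():
    if cache_is_stale():      # all workers pass this check at roughly the same time
        with cache_lock:      # they queue here...
            rebuild_cache()   # ...and each performs its own full rebuild

# Proposed change: take the lock first, then re-check staleness, so only the
# first worker through the lock actually rebuilds.
def worker_lock_then_check():
    with cache_lock:
        if cache_is_stale():  # re-checked while holding the lock
            rebuild_cache()
```

The trade-off is visible in the second variant: the rebuild happens only once per staleness window, but every other worker still blocks on the lock until that single rebuild finishes.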
@waynew I did some experimenting here, and have made things both "better" and "worse" by adjusting the locking code. I found two "issues":
I have a "fix" for both of these, but the end result is that several workers all block waiting for the rebuild being done by the one that holds the lock. Master CPU utilization is better, and minion throughput is a little better, but the master still becomes unresponsive because all of the workers get tied up.
I have ideas for a different implementation, but it's a much bigger change and I would want to consult with you all more closely before just running off on some tangent.
@duckfez I am actually running into the same issue on my end with 3006.1 as well. Was there any solution to prevent the constant cache rebuilds?
Hi @Caine142, I don't know of one. I changed jobs earlier this year and in the new job we are not using saltstack at this time, so it's sort of fallen off my radar. One thing we did that helped a lot was using the `--batch` option to salt. Effectively, we would batch at a number close to the number of cores on the salt master, so that you never overloaded it. The CPU usage would still go to 100%, but it would not cause a lot of job queueing. This might be a possible workaround for you?
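As a concrete illustration (the batch size here is just an example, not a tested value), on a master with four usable cores that might look like `salt --batch-size 4 '*' state.apply teststate`, so only a handful of minions are requesting files from the master at any one time.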
I had plans to write a SEP to substantially change the file list cache functionality in a couple of ways:
Both of these are large enough changes that they needed to go through a SEP, but I've not actually written it.
Description
Executing a state on one or more minions causes the file list cache on the master to be rebuilt an indeterminate number of times, and the repeated rebuilds drive high CPU utilization on the master. The time for a state run to complete is a function of the number of minions running the state, the number of files in the fileserver, and the number of times the master regenerates the file_list cache.
Setup
The environment is in AWS EC2: a t3.large for the master and t3.small instances for the 6 minions. There are no pillars, so pillar rendering is not a factor in the performance measurements.
I have tested two different configurations, one with gitfs+roots backends and one with just the roots backend.
In order to make the file_list rebuild process take a somewhat "long" time (seconds), I have a test git repo filled with about 55K files, which are just multiple copies of the salt source tree. The same 55K files are also copied into `/srv/bigjunk` on the master; a sketch of one way to generate such a tree is shown below.
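A minimal sketch of one way to populate such a directory (the paths and copy count are illustrative assumptions, not the exact setup used for this report):

```python
#!/usr/bin/env python3
"""Fill /srv/bigjunk with several copies of a large source tree so the
roots backend has tens of thousands of files to enumerate."""
import shutil
from pathlib import Path

SOURCE_TREE = Path("/usr/src/salt")  # any large checkout will do
TARGET = Path("/srv/bigjunk")
COPIES = 10                          # increase until the file count is large enough

TARGET.mkdir(parents=True, exist_ok=True)
for i in range(COPIES):
    shutil.copytree(SOURCE_TREE, TARGET / f"copy{i}")
```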
master config w/ gitfs:
master config w/ roots only:
My dummy minimal test state, teststate.sls is:
Steps to Reproduce the behavior
I'm tailing the master logs with this grep to capture specific messages relevant to this:
Start with a list of minions:
Run a state against just one, and measure wall time of it:
Logs from this:
Now run the same state against all 6 minions. The logs:
Notice how we rebuilt the file_list cache 5 times (out of 6), with 4 of the 5 kicking off basically concurrently. Whereas running the state on 1 minion finished in 18 seconds, doing it on 6 minions took 47.
Now if I run 2 back-to-back in quick succession I get a slightly different result. The cache from the prior run is still valid, and none of the worker threads attempt to rebuild it, so the state runs on all 6 minions in about 6 seconds.
The logs:
Expected behavior
The master should regenerate the file_list cache for a given backend only once per `fileserver_list_cache_time`.
Screenshots
N/A
Versions Report
salt --versions-report
Master is running the master branch from git as of commit 306aa6dd29, with some local changes to add additional debug logging to aid my investigation. Minions are running 3004 from the RPM repo.

```
Salt Version:
          Salt: 3003rc1+1373.g306aa6dd29

Dependency Versions:
          cffi: Not Installed
      cherrypy: Not Installed
      dateutil: Not Installed
     docker-py: Not Installed
         gitdb: 4.0.9
     gitpython: 3.1.20
        Jinja2: 3.0.3
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.3
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: Not Installed
      pycrypto: 2.6.1
  pycryptodome: 3.12.0
        pygit2: Not Installed
        Python: 3.6.8 (default, Sep 9 2021, 07:49:02)
  python-gnupg: Not Installed
        PyYAML: 6.0
         PyZMQ: 22.3.0
         smmap: 5.0.0
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: rhel 8.5 Ootpa
        locale: UTF-8
       machine: x86_64
       release: 4.18.0-305.el8.x86_64
        system: Linux
       version: Red Hat Enterprise Linux 8.5 Ootpa
```

Additional context
N/A