Closed denschub closed 1 year ago
Might be related to #58791, but that one is so old, I felt like it's better to file a new one.
More likely related to #62706.
I have not been able to replicate this. What settings are set up in the master?
Here is a list of items in the maintenance process, which might help with knowing which settings to share.
This is done on loop_interval.
So, there are a couple of things. I'm using `ext_pillar`, where one is a local git repo, and another one is actually a `cmd_json`. I also have three `gitfs_remotes`. It's hard to disable any of them because that effectively breaks everything -- but if you want me to disable something to check if the memory issue still happens, I can make that happen.
Thank you. I'll see what I can do with these. The only one that should matter for the maintenance process is the git_pillar; all the others are handled through other processes.
Speaking of which, how large is the local repo for your git_pillar?
For clarity, how many minions are connected to the master? A breakdown by acceptance state would be best.
Hmm, one thing I am noticing: you are not using ssh for any of your git remotes, so you might be able to change the library without much issue.
I noticed you set up pubkey and privkey even though you're not using ssh with gitfs or git_pillar. You can remove those settings, as they are only for ssh-based git and are ignored otherwise.
Since you are not currently using the more advanced git authentication methods, can you try switching the git library you use? You might need to install the other library for this and set the config to use it. If you're using pygit2, switch to GitPython, or vice versa.
If the problem remains, it is most likely something in git_pillar. If it doesn't, it is the library in use. Either way, please update us.
Thanks for your help so far. Much appreciated!
> how large is the local repo for your git_pillar?
A clone of the entire repo is ~550KiB. 18 files, 530'ish lines.
> how many minions are connected to the master? A breakdown by acceptance state would be best.
18 minions, all accepted. No denied/unaccepted/rejected.
> I noticed you set up pubkey and privkey even though you're not using ssh with gitfs or git_pillar.
That is true. I used to use ssh for gitfs, but had an issue with that a while ago, and switched to just pointing it to the local directory. However, that issue is no longer valid as far as I know. I have just pointed the gitfs back to ssh://, and will report if that resolves the issue.
If it doesn't, I'll switch from pygit2 to GitPython and report back the results of that!
Hmm. It definitely shouldn't be using that much memory. To test, I set up a local git server to serve a small pillar and set the git_pillar update interval to 2 seconds, and I don't seem to be seeing any kind of increase in memory. :/ That worries me more, as it might mean it is something else. Are you using any Jinja in those pillars? Doing any file importing?
No Jinja, no file importing. It's all just pretty boring YAML. :/ Looking at my memory graph just now, it's clear that switching back to `ssh` didn't work:
I flipped back my git_pillar to `file://` and switched to `gitpython`. Will report back in 12 hours or so!
Okay, I'm flabbergasted. I switched to `gitpython` yesterday, and memory usage has been stable. To verify, I switched back to `pygit2` at 16:00 UTC, and sure enough, it's immediately back at eating memory:
So the issue is either in `libgit2`, `pygit2`, or in the Salt code calling it. :/ There currently is an update to `libgit2 1.6.3` in Arch Linux's testing repository, which I'll update to as soon as I can. At this moment, however, `libgit2 1.5.1` and `pygit2 1.11.1` are the versions this reproduces on, and the newest I have available.
Interesting. My own testing system is running Salt 3006rc2 with libgit2 1.5.0 and pygit2 1.11.1, and I can't replicate it.
Looking at the changelog for libgit2, there were a couple of memory leak fixes in 1.6.1.
But that makes me wonder how I am not seeing it, unless the memory leak was introduced in 1.5.1.
My initial bug report was filed with libgit2 1.5.0, so it's not a 1.5.1 regression, I'm afraid. Maybe 3006 fixed it by accident? That's unlikely, but heh. Given that I can work around this just fine for myself by switching to `gitpython`, I'm happy to wait until the 1.6.1 libgit2 update is available for me to test. I'd love to be able to provide more useful information, but I'm sadly not enough into Python to know an approach to get useful memory traces...
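For anyone wanting to collect numbers on a leak like this, here is a minimal sketch (not something used in this thread) that samples the resident memory of a process on Linux by reading `/proc/<pid>/status`. The helper names are hypothetical, and the PID of the maintenance process is assumed to have been looked up separately, for example with `ps`. Python's built-in `tracemalloc` only tracks allocations made through Python's own allocator, so it would probably not show a leak inside a C library such as libgit2; watching the process RSS from outside avoids that limitation.

```python
import re
import time
from pathlib import Path


def parse_vm_rss_kib(status_text):
    """Return the VmRSS value (in KiB) from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of a process (Linux only)."""
    return parse_vm_rss_kib(Path(f"/proc/{pid}/status").read_text())


def sample_rss(pid, samples=5, interval=60.0):
    """Collect several RSS readings, `interval` seconds apart."""
    readings = []
    for _ in range(samples):
        readings.append(read_rss_kib(pid))
        time.sleep(interval)
    return readings
```

Calling `sample_rss(pid)` with the PID of the suspect process returns a list of readings; a steadily increasing series over many update intervals points at a leak rather than normal warm-up.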
:/ Hmm, I don't know then. I doubt 3006 changed anything that would fix it. The git_pillar code hasn't changed in almost 2 years, and the only change to the utils.gitfs code, which git_pillar uses, was about version information. Everything else is much older than 2 years. There has to be another variable in play that we are overlooking.
The fact it is happening in the maintenance thread means it is localized to the git fetch.
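One way to check whether the fetch alone is enough to grow memory is to run it in a tight loop outside of Salt. The following is only a sketch under assumptions: the function names are hypothetical, the repository path is a placeholder, a remote named `origin` is assumed to exist, and `pygit2` is assumed to be installed. It exercises pygit2/libgit2 directly and does not reproduce Salt's own code path, so a flat result here would not rule out the Salt side.

```python
import resource


def peak_rss_kib():
    """Peak resident set size of this process (KiB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def measure_growth(action, iterations=100):
    """Call action() repeatedly and return (peak RSS before, peak RSS after)."""
    before = peak_rss_kib()
    for _ in range(iterations):
        action()
    return before, peak_rss_kib()


def fetch_growth(repo_path, iterations=500):
    """Repeatedly fetch the 'origin' remote of an existing clone via pygit2."""
    import pygit2  # imported lazily; assumed to be installed

    repo = pygit2.Repository(repo_path)
    remote = repo.remotes["origin"]  # assumes a remote named "origin"
    return measure_growth(remote.fetch, iterations)
```

For example, `fetch_growth("/path/to/clone")` (placeholder path) returns the peak RSS before and after the loop. If the second number keeps climbing as `iterations` is raised, the leak is likely below Salt, in pygit2 or libgit2.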
@denschub I believe #64072 should fix your leak. If you find that you still see the leak please reopen the ticket. Thank you for your patience.
Description
This appears to be a regression in 3005.1, as I didn't see this before. The "maintenance" process grows relatively quickly, until it eventually gets OOM'ed.
Setup
(Please provide relevant configs and/or SLS files (be sure to remove sensitive info). There is no general set-up of Salt.)
Please be as specific as possible and give set-up details.
Steps to Reproduce the behavior
Nothing special. It's just a master running. Nothing of note in the logs either.
Expected behavior
n/a
Screenshots
(green is memory in use, yellow is disk caches, rest is.. rest.)
Versions Report
salt --versions-report
(Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)
```yaml
Salt Version:
  Salt: 3005.1

Dependency Versions:
  cffi: 1.15.1
  cherrypy: Not Installed
  dateutil: Not Installed
  docker-py: Not Installed
  gitdb: 4.0.10
  gitpython: 3.1.29
  Jinja2: 3.1.2
  libgit2: 1.5.0
  M2Crypto: 0.38.0
  Mako: Not Installed
  msgpack: 1.0.4
  msgpack-pure: Not Installed
  mysql-python: Not Installed
  pycparser: 2.21
  pycrypto: Not Installed
  pycryptodome: 3.12.0
  pygit2: 1.11.1
  Python: 3.10.9 (main, Dec 19 2022, 17:35:49) [GCC 12.2.0]
  python-gnupg: Not Installed
  PyYAML: 6.0
  PyZMQ: 24.0.1
  smmap: 5.0.0
  timelib: Not Installed
  Tornado: 4.5.3
  ZMQ: 4.3.4

System Versions:
  dist: arch
  locale: utf-8
  machine: x86_64
  release: 6.1.12-1-lts
  system: Linux
  version: Arch Linux
```
Additional context
This appears to be a regression in 3005.1, but I can't 100% verify this right now :/