Closed by rgeoghegan 1 year ago
@Ch3LL Hi! I was on the salt community call last week, and I promised to file the bug I was trying to describe.
What I could also do is submit a patch that wraps the msgpack read in a `try: ... except:` and treats any relevant exception (msgpack errors, file-not-found, etc.) the same as "the file is missing", which should cause the cache to be rebuilt and saved properly.
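The proposal above could be sketched roughly as follows. This is a minimal illustration, not the actual patch or Salt's cache code: the names `read_cache` and `load_or_rebuild` are hypothetical, and `json` stands in for msgpack so the sketch runs with the standard library alone. With msgpack, one would presumably pass its unpack function and its exception classes through the `loads` and `decode_errors` parameters.

```python
import json
import os


def read_cache(path, loads=json.loads, decode_errors=(ValueError,)):
    """Return the decoded cache at `path`, or None if it is missing or unreadable.

    `loads` and `decode_errors` are hypothetical knobs for this sketch; json is
    only a stand-in for the real serializer.
    """
    try:
        with open(path, "rb") as fh:
            return loads(fh.read())
    except FileNotFoundError:
        # No cache on disk yet: signal the caller to rebuild.
        return None
    except decode_errors:
        # Corrupted cache: treat it exactly like a missing file.
        return None


def load_or_rebuild(path, rebuild):
    """Return cached data, rebuilding and re-saving it when the cache is unusable."""
    data = read_cache(path)
    if data is None:
        data = rebuild()
        tmp = path + ".tmp"
        with open(tmp, "w") as fh:
            json.dump(data, fh)
        # Replace the old (possibly corrupted) file in a single step.
        os.replace(tmp, path)
    return data
```

With this shape, a corrupted file never surfaces as an exception to the caller; it is simply overwritten by the freshly rebuilt data on the next read.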
Looks like I'm able to replicate this. If you submit a PR, I will be more than willing to review and test it. I haven't gone into the code yet, but I will when you submit the PR, and I'll make sure it's the correct fix.
FYI it took a while because I was out on vacation, but I just put up the PR.
@rgeoghegan Do we know the reason this file is getting corrupted in the first place?
@dwoz Nothing is specifically corrupting the file, but I was playing with clearing the pillar cache file by just deleting it, and noticed a race condition in the code (along with this bug), and saw that if the file is corrupted, there is no way to recover other than manually deleting the disk cache file.
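The race itself is not spelled out in this thread. One common shape, assumed here purely for illustration, is a check-then-delete sequence in which another process removes the file between the existence check and the delete. A sketch that tolerates this (the function name `clear_cache_file` is hypothetical, not part of Salt):

```python
import os


def clear_cache_file(path):
    """Delete `path`; return True if it was removed, False if it was already gone."""
    try:
        # Attempt the delete directly instead of checking os.path.exists() first,
        # so there is no window between the check and the removal.
        os.remove(path)
    except FileNotFoundError:
        return False
    return True
```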
Description
If I use the `pillar_cache_backend: "disk"` config option, and the on-disk msgpack file for a minion gets corrupted, the pillar is now blank, and any attempt to run `pillar.clear_pillar_cache` crashes, even after restarting the salt-master.
Setup
I am using salt 3004.1 from the yum repo:
I set up a system with a master and a minion with one pillar file:
my_pillar.sls
Steps to Reproduce the behavior
I start with my pillar working as expected:
I add stuff to the pillar cache file so that it is no longer a valid msgpack file:
Now my pillar is reported as empty:
And the master log has an exception:
Running clear_pillar_cache does not work:
And all this behaviour persists even if the salt-master is restarted.
If I delete the cache file, everything returns to normal:
Expected behavior
IMHO, an unreadable cache file should be treated as a missing cache, and just cause the pillar to be rebuilt.
Versions Report
salt --versions-report
(Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)

```yaml
[root@saltmaster /]# salt --versions-report
Salt Version:
  Salt: 3004.2

Dependency Versions:
  cffi: Not Installed
  cherrypy: Not Installed
  dateutil: Not Installed
  docker-py: Not Installed
  gitdb: Not Installed
  gitpython: Not Installed
  Jinja2: 2.11.1
  libgit2: Not Installed
  M2Crypto: 0.35.2
  Mako: Not Installed
  msgpack: 0.6.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
  pycparser: Not Installed
  pycrypto: Not Installed
  pycryptodome: Not Installed
  pygit2: Not Installed
  Python: 3.6.8 (default, Nov 16 2020, 16:55:22)
  python-gnupg: Not Installed
  PyYAML: 3.13
  PyZMQ: 17.0.0
  smmap: Not Installed
  timelib: Not Installed
  Tornado: 4.5.3
  ZMQ: 4.1.4

System Versions:
  dist: centos 7 Core
  locale: UTF-8
  machine: x86_64
  release: 5.10.104-linuxkit
  system: Linux
  version: CentOS Linux 7 Core
```