Open abarclay opened 2 years ago
Description
We have a HashiCorp Vault Enterprise cluster fronted by HAProxy. Write requests (including token renewals) are always serviced by the active node in the cluster. Read requests can be serviced by any node in the cluster. Occasionally (~20% of the time), Salt gets a token and then attempts to use it BEFORE the token has been replicated to the rest of the cluster. When that happens, Vault returns the following error:

2022-07-31 00:00:20,518 [salt.loaded.int.utils.vault:346 ][ERROR ][1128250] Error from vault: {"errors":["error performing token check: no lease entry found for token that ought to have one, possible eventual consistency issue"]}
Setup
Salt is configured as follows:

vault:
  url: https://zvault.zinternal.com
  verify: /etc/ssl/certs/ca-certificates.crt
  auth:
    method: approle
    uses: 10000
    ttl: 86400
    role_id: REDACT
    secret_id: REDACT
  policies:
Steps to Reproduce the behavior
Here is the test:

#!/bin/bash
while true
do
    echo "$(date): $(salt-call pillar.get saltvault | wc -l)"
    sleep 1
done
Expected behavior
The loop should always return the same results: the same pillars with the same values.
Actual output Fri 05 Aug 2022 05:32:42 PM UTC: 941 Fri 05 Aug 2022 05:32:49 PM UTC: 941 Fri 05 Aug 2022 05:32:55 PM UTC: 941 Fri 05 Aug 2022 05:33:06 PM UTC: 941 Fri 05 Aug 2022 05:33:13 PM UTC: 941 Fri 05 Aug 2022 05:33:20 PM UTC: 941 Fri 05 Aug 2022 05:33:27 PM UTC: 2 Fri 05 Aug 2022 05:33:39 PM UTC: 941 Fri 05 Aug 2022 05:33:46 PM UTC: 941 Fri 05 Aug 2022 05:33:52 PM UTC: 941 Fri 05 Aug 2022 05:34:02 PM UTC: 941 Fri 05 Aug 2022 05:34:09 PM UTC: 941
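To put a number on how often the loop returns a short result, the output can be tallied with a small script. This is only a sketch: the `summarize` helper is hypothetical, and it assumes every line has the `<date>: <count>` shape shown here and that the most common count is the correct one.

```python
from collections import Counter


def summarize(lines):
    """Return (failures, total), where a failure is any run whose line
    count differs from the most common line count."""
    counts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Each line looks like "<date>: <count>"; the count follows the last colon.
        counts.append(int(line.rsplit(":", 1)[1]))
    if not counts:
        return 0, 0
    expected, _ = Counter(counts).most_common(1)[0]
    failures = sum(1 for c in counts if c != expected)
    return failures, len(counts)


# Three lines copied from the output in this report.
sample = """\
Fri 05 Aug 2022 05:33:20 PM UTC: 941
Fri 05 Aug 2022 05:33:27 PM UTC: 2
Fri 05 Aug 2022 05:33:39 PM UTC: 941
"""
failures, total = summarize(sample.splitlines())
print(f"{failures} of {total} runs returned an unexpected line count")
```

Feeding it a longer capture of the loop output gives a rough failure rate to compare before and after a fix.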
Versions Report

Salt Version:
  Salt: 3004.1

Dependency Versions:
  cffi: Not Installed
  cherrypy: unknown
  dateutil: 2.7.3
  docker-py: Not Installed
  gitdb: 2.0.6
  gitpython: 3.0.7
  Jinja2: 2.10.1
  libgit2: 0.28.3
  M2Crypto: Not Installed
  Mako: Not Installed
  msgpack: 0.6.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
  pycparser: Not Installed
  pycrypto: Not Installed
  pycryptodome: 3.6.1
  pygit2: 1.0.3
  Python: 3.8.10 (default, Mar 15 2022, 12:22:08)
  python-gnupg: 0.4.5
  PyYAML: 5.3.1
  PyZMQ: 18.1.1
  smmap: 2.0.5
  timelib: Not Installed
  Tornado: 4.5.3
  ZMQ: 4.3.2

System Versions:
  dist: ubuntu 20.04 focal
  locale: iso8859-1
  machine: x86_64
  release: 5.4.0-109-generic
  system: Linux
  version: Ubuntu 20.04 focal
Additional context
I have added some code to mitigate the issue: /usr/lib/python3/dist-packages/salt/utils/vault.py
345,356d344
< elif not response.ok and response.json().get("errors", None) == ["error performing token check: no lease entry found for token that ought to have one, possible eventual consistency issue"]:
<     log.error("sleeping 1 second before retry")
<     time.sleep(1)
<     response = make_request(
<         method,
<         resource,
<         token=None,
<         vault_url=vault_url,
<         get_token_url=get_token_url,
<         retry=True,
<         **args
<     )
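The workaround above sleeps for one second and re-issues the request once. A more general form of the same idea is a bounded retry with backoff that triggers only on this specific error. The following is a standalone sketch, not Salt's code: `request_with_retry`, `is_eventual_consistency_error`, and the `send` callable are hypothetical names, and the attempt count and delays are arbitrary assumptions.

```python
import time

# Error string copied from the Vault response quoted in this issue.
EVENTUAL_CONSISTENCY_ERROR = (
    "error performing token check: no lease entry found for token "
    "that ought to have one, possible eventual consistency issue"
)


def is_eventual_consistency_error(ok, body):
    """Return True if a failed response carries the replication-lag error."""
    if ok or not isinstance(body, dict):
        return False
    return EVENTUAL_CONSISTENCY_ERROR in (body.get("errors") or [])


def request_with_retry(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() until it stops failing with the replication-lag error.

    send is a hypothetical zero-argument callable returning (ok, json_body).
    It stands in for the real request function and is not Salt's API.
    """
    for attempt in range(1, max_attempts + 1):
        ok, body = send()
        if not is_eventual_consistency_error(ok, body):
            return ok, body
        if attempt < max_attempts:
            # Exponential backoff: 0.5s, 1s, 2s, ...
            sleep(base_delay * 2 ** (attempt - 1))
    # Every attempt hit the same error; hand the last response to the caller.
    return ok, body


# Usage: two lagging responses followed by a success (no real sleeping).
responses = iter([
    (False, {"errors": [EVENTUAL_CONSISTENCY_ERROR]}),
    (False, {"errors": [EVENTUAL_CONSISTENCY_ERROR]}),
    (True, {"data": {"k": "v"}}),
])
delays = []
ok, body = request_with_retry(lambda: next(responses), sleep=delays.append)
```

Bounding the attempts keeps a genuinely invalid token from looping forever, and matching on the exact error string avoids retrying unrelated failures such as permission errors.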