Closed kristian-lesko closed 2 years ago
See https://www.vaultproject.io/docs/commands/index.html#vault_max_retries
We added this because people complained that it didn't retry, now we are getting complaints that it does. We may disable it again in the future by default, but that value can be used to control behavior.
Hello @jefferai,
thanks for the quick reply. However, I am afraid that this issue is not caused by the VAULT_MAX_RETRIES
value, as with any value (0, 10 or anything else) these 5-second freezes still continue to occur like before.
What happens if you set it to -1
? We've actually changed the logic for the upcoming 0.10.2 (different library) but I think the previous library may not have worked the way we thought.
When I set VAULT_MAX_RETRIES
to -1, I receive the following error upon read:
failed to read environment: strconv.ParseUint: parsing "-1": invalid syntax
Fwiw I have also experienced this. We are testing vault and run a bootstrap she'll script that sets up the backends and a few roles and secrets. Probably 10 calls to vault. Service is otherwise almost I utilized but I still see a noticeable hang when running script sometimes from my Mac.
In 0.10.2, when it's out, try using VAULT_MAX_RETRIES=0. It's a different retry library.
@jefferai thanks for your help; unfortunately, the same issue keeps happening in version 0.10.2
; out of 1000 calls, about 990 complete within 0.1 seconds, but the remaining ~10 calls still take exactly 5 seconds more (i.e. about 5.0 - 5.1 seconds).
Any resolution on this? We're running 1.0.0 and appear to be hitting this issue using the Golang client (not the CLI tool).
If you mean our Golang API libraries, it has the same retry behavior. You may want to try setting it off and simply handling whatever the error is. (You may also want to look in your server logs for the error.)
facing the sam problems with vault 1.2.2 using vault-helm in AliCloud, only happens with vault CLI, get quick response with REST API calls
[~]$ time vault status
Key Value
--- -----
Recovery Seal Type shamir
Initialized true
Sealed false
Total Recovery Shares 5
Threshold 3
Version 1.2.2
Cluster Name vault-cluster-04418e87
Cluster ID a40880a0-f5c0-bf85-387c-89131efd27f2
HA Enabled true
HA Cluster https://172.20.0.189:8201
HA Mode standby
Active Node Address https://172.20.0.189:8200
real 0m5.399s
user 0m0.068s
sys 0m0.008s
@sgmiller Do you still see this issue? :-)
I'm closing this request as this issue does creep up occasionally and in almost all cases it's driven by environmental / VM factors not something specific to Vault binary.
For many of the exact same versions (including 1.9.x & 1.10.x) that were observed to be slow (only via CLI) there were OS / host configuration and when tested with an alternative hypervisor everything worked fine.
Hello everyone,
I've encountered the following issue with using the Vault CLI tool:
Describe the bug When running any
vault
CLI operation (likeread
,write
,secrets enable
etc.) on a 3-node Vault cluster with Consul backend, sometimes the operation freezes for (more or less exactly) 5 seconds before completing.To Reproduce Steps to reproduce the behavior:
vault write
command many times repeatedlyExpected behavior Operations running without the delay.
Environment:
vault status
): 0.10.1vault version
): Vault v0.10.1 ('756fdc4587350daf1c65b93647b2cc31a6f119cd')Vault server configuration file(s):
Additional context This happens randomly but often enough that the typical usage of Vault gets annoying for the user. When tested in a loop (without any delay), the issue happens about 5-6 times per 100 operations; however, for separate operations with a few seconds of delay, this can happen much more often (sometimes as often as every third operation).
We have a DNS alias pointing to the A-records of each of the 3 Vault clusters; the problem seems to happen regardless of whether we use this DNS alias or a direct URL of any single node.
The issue does not happen when using the API directly (e.g., by
curl
).The interesting thing is that when the freeze happens, it takes (more or less exactly) 5 seconds; i.e., when running 1000 operations in a loop, a single operation either takes 0.1xx seconds, or 5.1xx seconds, with nothing in between.
The operation always finishes correctly after the 5-second delay.