vmaletic opened 3 months ago
The same behaviour is observable in version 1.15.6.
We hit the problem too after moving from 1.12 to 1.16. Did you find the cause?
Unfortunately, no. We are sticking with version 1.15.4. We tested all subsequent versions (1.15.5 and later, including 1.16.x) and observed the same behavior.
Thank you for testing this on 1.16 as well. I'll bring it up to our engineers. :)
Weird stuff: we rotated the transit key, and that solved the issue. We don't understand what the difference could be, since both the old and the new keys work. Only the old one causes high CPU usage.
Yeah, this is really strange behavior for sure, but that's pretty good news and something we will test and report back on.
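Because both the old and the new key versions still decrypt, but only the old one is reported to drive CPU up, it may help to check which key version existing ciphertexts were written with. Transit ciphertexts carry that version in a `vault:v<N>:` prefix. The following is a minimal Python sketch, assuming the ciphertexts are available as plain strings; `key_version` and `count_by_version` are hypothetical helper names, not part of any Vault client library.

```python
import re

# Transit ciphertexts start with "vault:v<N>:", where N is the key version
# used for encryption. This pattern only checks that prefix shape.
_CIPHERTEXT_RE = re.compile(r"^vault:v(\d+):")


def key_version(ciphertext: str) -> int:
    """Return the transit key version embedded in a ciphertext string."""
    match = _CIPHERTEXT_RE.match(ciphertext)
    if match is None:
        raise ValueError(f"not a transit ciphertext: {ciphertext[:16]!r}")
    return int(match.group(1))


def count_by_version(ciphertexts):
    """Tally how many ciphertexts were produced by each key version."""
    counts = {}
    for ct in ciphertexts:
        version = key_version(ct)
        counts[version] = counts.get(version, 0) + 1
    return counts


# Hypothetical sample values; the payloads after the prefix are placeholders.
samples = ["vault:v1:AAAA", "vault:v1:BBBB", "vault:v2:CCCC"]
print(count_by_version(samples))  # {1: 2, 2: 1}
```

A large share of `v1` entries after a rotation would indicate that much of the traffic still targets the old key version.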
Yesterday, we performed a transit key rotation on all our transit secrets engines. We then upgraded to the latest version of Vault (1.17.0) and started our standard load testing. Unfortunately, we again encountered the significant performance degradation we had previously reported. Specifically:
Interestingly, reverting to Vault 1.15.4 resolved the issue entirely. With this version, performance is optimal, with CPU load peaking at 50-60% at 1000 RPS.
We are keen to understand why this performance regression has persisted from version 1.15.5 through 1.17.0. Any insights would be greatly appreciated.
Could you rotate your key again to see whether it fixes the problem? That's how we solved it.
No other info? We're about to rotate our key to solve the problem, but that's a pretty odd fix with no clear explanation of the root cause.
Out of interest, did you rotate your transit keys while running the latest version of Vault, or did you complete the rotation on a specific version of Vault and then upgrade to the latest version?
Please share more detail on what worked for you, so that we can test whether we can reproduce the same success you reported. Thank you in advance!
We upgraded first. Then we realised there was an issue and decided to rotate the keys (still on the newest version). That solved the problem.
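The workaround described in this thread, rotating the transit key while already on the upgraded version, corresponds to a single call to the Transit HTTP API (`POST /v1/<mount>/keys/<name>/rotate`). The following is a minimal standard-library Python sketch, assuming a Transit engine mounted at the default `transit/` path. The function name `build_rotate_request`, the address, the token and the key names in the usage are hypothetical. The function only builds the request so it can be inspected before anything is sent.

```python
import urllib.request


def build_rotate_request(vault_addr: str, token: str, key_name: str,
                         mount: str = "transit") -> urllib.request.Request:
    """Build (but do not send) a transit key-rotation request.

    Rotation adds a new key version, which becomes the version used for
    new encryptions; older versions remain available for decryption.
    """
    url = f"{vault_addr.rstrip('/')}/v1/{mount}/keys/{key_name}/rotate"
    return urllib.request.Request(
        url,
        data=b"",  # a plain rotation sends no parameters
        headers={"X-Vault-Token": token},
        method="POST",
    )


# Hypothetical address, token and key name, for illustration only.
req = build_rotate_request("https://vault.example.com:8200/", "hypothetical-token", "orders")
print(req.get_method(), req.full_url)
```

To perform the rotation, pass the request to `urllib.request.urlopen(req)`. The CLI equivalent is `vault write -f transit/keys/<name>/rotate`.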
**Describe the bug**
After upgrading from Vault version 1.15.4 to 1.15.5, there is high CPU usage on Vault servers when transit operations are called, even with a relatively small number of requests per second (RPS), causing CPU core usage to reach 100%.
**To Reproduce**
Steps to reproduce the behavior:
**Expected behavior**
After upgrading from Vault version 1.15.4 to 1.15.5, CPU usage during transit operations should remain within acceptable limits. Specifically, CPU core usage should not spike to 100% at low RPS.
**Environment:**
* `vault status`: Vault v1.15.5
* `vault version`: Vault v1.15.5 (0d8b67ef63815f20421c11fe9152d435af3403e6), built 2024-01-26T14:53:40Z

Vault server configuration file(s):
**Additional context**
Vault telemetry for version 1.15.5, with a maximum of 300 RPS to the transit backends over a 5-minute test window:

CPU usage:
![image](https://github.com/hashicorp/vault/assets/1201797/40ddffb7-cdec-471e-af36-4454a3565342)

Transit usage:
![image](https://github.com/hashicorp/vault/assets/1201797/19bf0605-006c-4352-8419-65f4bdd07e15)
Vault telemetry for version 1.15.4, with a maximum of 2000 RPS to the transit backends over a 45-minute test window:

CPU usage:
![image](https://github.com/hashicorp/vault/assets/1201797/b5846f0f-b372-430f-addb-f3b8980bb501)

Transit usage:
![image](https://github.com/hashicorp/vault/assets/1201797/0c2bed1f-6062-4863-9080-46d470c5a0a5)