hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/

High CPU Usage after upgrade to 1.16.3 with enabled audit device #27514

Open MrFoxTrot opened 3 weeks ago

MrFoxTrot commented 3 weeks ago

Describe the bug Performance degradation after upgrading to 1.16.3

To Reproduce Steps to reproduce the behavior:

  1. Enable socket audit device on a newer Vault version (1.15.5+) (vault audit enable file)
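The report does not give the exact commands, so the following is only a sketch of how the two audit devices mentioned in this issue (file and UDP socket) might be enabled. The log path, socket address, mount path `socket_udp`, and the `run` dry-run helper are all assumptions for illustration, not values from the reporter's environment. Executing for real also assumes `VAULT_ADDR` and a valid token are already set.

```shell
#!/bin/sh
# Hypothetical values; replace with the ones used in your environment.
AUDIT_LOG_PATH="${AUDIT_LOG_PATH:-/tmp/vault_audit.log}"
SOCKET_ADDR="${SOCKET_ADDR:-127.0.0.1:9090}"

# Hypothetical helper: print each command, and execute it only when RUN=1.
run() {
    printf '+ %s\n' "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# File audit device.
run vault audit enable file file_path="$AUDIT_LOG_PATH"

# Socket audit device over UDP, mounted at an assumed path.
run vault audit enable -path=socket_udp socket address="$SOCKET_ADDR" socket_type=udp

# Confirm which devices are enabled.
run vault audit list -detailed
```

Without `RUN=1` the script only prints the commands, so they can be reviewed before anything is sent to the server.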

Expected behavior Same performance as on the previous version

Environment:

Vault server configuration file(s):

disable_mlock = true

log_format = "standard"
log_level = "debug"
log_requests_level = "off"

pid_file = "/opt/vault/pid"

default_lease_ttl = "175200h"
max_lease_ttl = "175200h"

cluster_name="pre-test"

storage "raft" {
    path    = "/opt/vault/data"
    node_id = "node02"
}

ui = true
listener "tcp" {
    address = "0.0.0.0:8200"
    cluster_address = "0.0.0.0:8201"
    http_idle_timeout = "1h"
    http_read_header_timeout = "10s"
    http_read_timeout = "30s"
    max_request_size = 33554432
    max_request_duration = "90s"
    telemetry {
        unauthenticated_metrics_access = "true"
    }
}

api_addr = "http://node02.example.com:8200"
cluster_addr = "http://node02.example.com:8201"

telemetry {
    dogstatsd_addr = "localhost:8125"
    usage_gauge_period = "60s"
    disable_hostname = true
}

Additional context After upgrading from version 1.15.4 to 1.16.3 with audit devices enabled, CPU usage increased significantly. At an average of around 50 requests per second (RPS), the load on 1.15.4 was approximately 5-10% per core; after the upgrade, CPU usage across all cores rose to around 80%. Two audit devices are enabled (file and UDP socket). I then checked the other versions released in that timeframe and found that the issue starts with version 1.15.5. Judging by the code, there were significant changes to AuditBroker and the events functionality was introduced, but since I am using the community version, I cannot use that functionality. Perhaps it would be worth adding a setting to disable it.

peteski22 commented 3 weeks ago

Hi @MrFoxTrot thanks for the report.

Would it be possible for you to review and update the reproduction steps, so we are able to easily follow along internally? The more detail and copy/paste commands, the better. 😄

I have tried testing the 1.16.x release branch (~1.16.4) with a file audit device and couldn't reproduce the issue. Are you finding it's only limited to socket device types?

Does this also happen with 1.17.x in your test environment? 1.17.0 is available from https://releases.hashicorp.com/vault/.

Unfortunately we are unlikely to offer a way to disable parts of the code as the changes were part of work to replace the underlying implementation that is used for audit (starting in 1.15.0). Vault events aren't connected to the changes to audit and so shouldn't have any impact.