Vault blocks (in certain conditions) even when one of two audit devices is available and working

radcool commented 5 years ago

Describe the bug

I have Vault 0.11.x installed with two audit devices configured: one file device and one socket (TCP) device. If the TCP listener targeted by the socket device is not listening and the host immediately answers with an RST or an ICMP unreachable, everything is fine: since Vault can still write to the file device, Vault operations proceed normally. But if the port is blocked by a firewall (for example) that silently drops the socket device's packets, the socket device keeps trying to reach the target until it eventually times out. All the while, the Vault operation appears "frozen" from the user's point of view, and no Vault operation can be performed, even though the file device is up and logging properly.

The docs for Blocked Audit Devices (https://www.vaultproject.io/docs/audit/index.html#blocked-audit-devices) state that "If you have more than one audit device, then Vault will complete the request as long as one audit device persists the log." This does not seem to be the case here.

Is this a bug, or does this statement not apply when the target audit device does not return any response (positive or negative) in a timely fashion?

To Reproduce

Steps to reproduce the behavior:

(in one window)

vault server -dev

(in another window)

# export VAULT_ADDR='http://127.0.0.1:8200'
# vault audit enable file file_path=vault_audit.log
# vault login
Token (will be hidden):
Error authenticating: error looking up token: Error making API request.

URL: GET http://127.0.0.1:8200/v1/auth/token/lookup-self
Code: 403. Errors:

* permission denied
# vault audit enable socket address=5.5.5.5:5555

(wait until it times out)

Error enabling audit device: Put http://127.0.0.1:8200/v1/sys/audit/socket: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Expected behavior

The way I read the documentation I expect Vault to asynchronously continue trying to write to the socket while at the same time immediately returning the result to the user. If this is not the expected behavior, perhaps the documentation could be expanded to distinguish how Vault behaves under different types of "blocking".

Environment:

Vault Server Version (retrieve with vault status): v0.11.1
Vault CLI Version (retrieve with vault version): v0.11.1
Server Operating System/Architecture: CentOS Linux 64-bit

jefferai commented 5 years ago

"The way I read the documentation I expect Vault to asynchronously continue trying to write to the socket while at the same time immediately returning the result to the user."

This is not the (current) behavior. Audit logging happens in the context of the request, and it happens serially. If your TCP requests are blocking, they will block that request as well.

radcool commented 5 years ago

The Blocked Audit Devices section of the docs (https://www.vaultproject.io/docs/audit/index.html#blocked-audit-devices) contains several statements:

"Vault will not respond to requests if audit devices are blocked because audit logs are critically important and ignoring blocked requests opens an avenue for attack. Be absolutely certain that your audit devices cannot block."

If this were the only statement in the section, then I would say that the section is consistent with the current behavior.

However, there are these statements as well:

"If there are any audit devices enabled, Vault requires that at least one be able to persist the log before completing a Vault request."

"If you have more than one audit device, then Vault will complete the request as long as one audit device persists the log."

"If you have only one audit device enabled, and it is blocking (network block, etc.), then Vault will be unresponsive. Vault will not complete any requests until the audit device can write."

Don't these all suggest that as long as you have at least one non-blocking audit device Vault will not become unresponsive?

I agree that this is not the current, observed behavior. Would you then say that these three passages are incorrect or, at the very least, misleading? If so, perhaps this is not a bug in the behavior but a bug in the documentation?

biazmoreira commented 1 week ago

The doc states: "If any of the audit devices fail in a blocking fashion however, Vault requests will hang until the blocking is resolved."

I will be closing the issue since it seems the behavior is by design.

hashicorp/vault issue #5361