database static role doesn't rotate password after ttl

jacobmammoliti commented 5 years ago

**Describe the bug**
Database Static Role does not rotate credential after TTL expires.

**To Reproduce**
Steps to reproduce the behavior:

Create static role

$ vault write database/static-roles/test \
db_name=my-postgresql-database \
rotation_statements=@rotation.sql \
username="jacobm" \
rotation_period=30
Success! Data written to: database/static-roles/test

Get password


$ vault read database/static-creds/test
Key                    Value
---                    -----
last_vault_rotation    2019-09-23T15:28:04.212362868Z
password               A1a-jPKS3or3h1EN8Sa9
rotation_period        30s
ttl                    11s
username               jacobm

$ vault read database/static-creds/test
Key                    Value
---                    -----
last_vault_rotation    2019-09-23T15:28:04.212362868Z
password               A1a-jPKS3or3h1EN8Sa9
rotation_period        30s
ttl                    0s
username               jacobm

$ vault read database/static-creds/test
Key                    Value
---                    -----
last_vault_rotation    2019-09-23T15:28:04.212362868Z
password               A1a-jPKS3or3h1EN8Sa9
rotation_period        30s
ttl                    0s
username               jacobm


**Expected behavior**
Expect the password to rotate once the ttl reaches 0.

**Environment:**
* Database Version:

PSQL 9.6


* Vault Server Version:

$ vault status
Key                                    Value
---                                    -----
Seal Type                              shamir
Initialized                            true
Sealed                                 false
Total Shares                           5
Threshold                              3
Version                                1.2.0+prem
Cluster Name                           vault-cluster-6b813709
Cluster ID                             b6bee58a-3e4f-f9b8-4e39-1e999cda1746
HA Enabled                             true
HA Cluster                             https://10.128.0.3:8201
HA Mode                                standby
Active Node Address                    http://10.128.0.3:8200
Performance Standby Node               true
Performance Standby Last Remote WAL    165


* Vault CLI Version:

Vault v1.2.0+prem


* Server Operating System/Architecture:

CentOS 7

Vault server configuration file(s):

listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = "true"
}

storage "raft" {
  path    = "/opt/vault/raft"
  node_id = "raft_node_2"
}

cluster_addr = "http://10.128.0.4:8201"
api_addr     = "http://10.128.0.4:8200"
ui           = "true"


**Additional context**
Seeing these logs when I do a force (`vault write -f database/rotate-role/test`):

Sep 23 15:51:51 vault-2 vault: 2019-09-23T15:51:51.300Z [WARN] secrets.database.database_0e524392: unable to rotate credentials in rotate-role: error="error writing WAL entry: cannot write to readonly storage"
Sep 23 15:51:51 vault-2 vault: 2019-09-23T15:51:51.300Z [INFO] http: panic serving 127.0.0.1:58880: runtime error: invalid memory address or nil pointer dereference

ghost commented 5 years ago

@arctiqjacob @catsby I was able to reproduce your issue on CentOS 7 with raft. Looking at the code, I found that this issue can occur when you use replicated storage that is part of a cluster. When you try to rotate your credentials, Vault first tries to write a WAL entry to storage. That write is where the error you are seeing is raised, so Vault never reaches the point where it rewrites the credentials. With raft, normally only the elected leader node can accept write operations and propagate them to the replicas. In my reproduction, I hit that error only when I was using the secondary (replica) node, which can't accept write operations.

Can you check whether the node-id you gave for the storage is actually the leader node, so I can eliminate that possibility? If it is, can you provide more logs from the time when the role was initially created? Or try creating a new role and then rewriting it after making sure that the currently mounted storage is the leader? Also keep in mind that raft will try to elect a new leader if the current leader hasn't submitted any logs for a certain amount of time.

jacobmammoliti commented 5 years ago

@n17h31sm I've checked that the node-id that I'm using is in fact the leader in raft (I'm targeting the 10.128.0.42 Vault node).

{
    "auth": null,
    "data": {
        "config": {
            "index": 49,
            "servers": [
                {
                    "address": "10.128.0.42:8201",
                    "leader": true,
                    "node_id": "raft0",
                    "protocol_version": "3",
                    "voter": true
                },
                {
                    "address": "10.128.0.43:8201",
                    "leader": false,
                    "node_id": "raft1",
                    "protocol_version": "3",
                    "voter": true
                }
            ]
        }
    },
    "lease_duration": 0,
    "lease_id": "",
    "renewable": false,
    "request_id": "f1a7d6a0-7704-437a-9850-97bdc5c4435f",
    "warnings": null,
    "wrap_info": null
}

After doing that, I found that I can now force a regeneration with:

vault write -f database/rotate-role/testuser

But it still doesn't seem to rotate when the TTL hits 0. I don't see any logs now pointing to a WAL issue. So I'm trying to figure out now why I don't see it. Going to try and dig into it more.

Also, if I hit the standby Vault node, wouldn't it redirect to the active Vault node backed by the leader in raft?

ordith commented 4 years ago

We are experiencing this same issue on Vault 1.2.3 running on Ubuntu 16.04 with Consul backend for HA.

michelvocks commented 4 years ago

Hi @arctiqjacob!

I've tried to reproduce your issue but wasn't able to do so. I created a three-node raft cluster and used the third (non-leader) node to enable and set up the database plugin. Would you mind trying it again with the latest Vault version? It would also be helpful to get more information on your setup, e.g. how many Vault nodes do you have? Is the connected node a performance secondary or a performance standby?

> **Additional context**
> Seeing these logs when I do a force (`vault write -f database/rotate-role/test`):

This is a known bug and has been fixed with https://github.com/hashicorp/vault/pull/8105

Cheers, Michel

azizj1 commented 4 years ago

What is the max rotation_period?

catsby commented 4 years ago

Hello - we haven't heard back in some time, so we're going to close this issue for now. If you have more information please let us know by opening a new issue, and optionally referencing this one. Thanks!

pySilver commented 4 years ago

@catsby can you please elaborate on a question regarding max rotation_period? Thanks!

catsby commented 4 years ago

There is currently no upper bound on rotation_period. The underlying data structure for rotation_period is time.Duration, so in theory the maximum rotation_period is approximately 290 years, give or take a few minutes 😄

I should add that the use of time.Duration is an internal implementation detail and as such could be subject to change without notice. The lack of upper bounds is subject to change as well but that would be noted in the CHANGELOG.md at the very least.

hashicorp / vault

database static role doesn't rotate password after ttl #7502