hashicorp / vault-plugin-secrets-openldap

OpenLDAP secret engine for Vault
Mozilla Public License 2.0
18 stars 6 forks source link

Invalidates WAL entry for static role if password policy has changed #56

Closed austingebauer closed 1 year ago

austingebauer commented 1 year ago

Overview

WAL entries created by this secrets engine are the result of partial failure when trying to rotate a password for a static role. If the rotation partially fails, the system will continuously try to roll forward by reusing the password in the WAL entry. This behavior creates a problem when password constraints on the LDAP server are changed.

To give users an option to fix this, this change invalidates WAL entries for static roles when the configuration password_policy has changed. This causes the system to generate a new password instead of reusing the WAL's password that no longer meets the constraints of the LDAP server.

Note that this only works when the named password policy has changed (say "my-policy-1" to "my-policy-2"). It doesn't solve the problem where the contents of the named password policy have changed (say "my-policy-1" has a character set or length change). We don't have a good way to detect that a password policy has a new revision, so solving this for content changes is a larger effort that will involve code changes on the Vault side.

I verified that the test I wrote fails prior to the change in this PR:

$ go test -run=TestPasswordPolicyModificationInvalidatesWAL
2023-03-08T08:59:40.620-0800 [INFO]  initializing database rotation queue
2023-03-08T08:59:40.620-0800 [INFO]  populating role rotation queue
2023-03-08T08:59:40.620-0800 [DEBUG] no WAL entries found
2023-03-08T08:59:40.620-0800 [INFO]  starting periodic ticker
2023-03-08T08:59:40.620-0800 [DEBUG] wrote WAL: role=hashicorp WAL ID=1cdaf653-a445-5a89-c734-bcbd92cb26ff
2023-03-08T08:59:40.620-0800 [DEBUG] deleted WAL: WAL ID=1cdaf653-a445-5a89-c734-bcbd92cb26ff
2023-03-08T08:59:40.621-0800 [DEBUG] wrote WAL: role=hashicorp WAL ID=484e20cd-72ac-d9ae-ff4c-f37f8f34de55
2023-03-08T08:59:40.621-0800 [WARN]  unable to rotate credentials in rotate-role: error="forced error"
2023-03-08T08:59:40.621-0800 [DEBUG] deleted WAL: WAL ID=484e20cd-72ac-d9ae-ff4c-f37f8f34de55
--- FAIL: TestPasswordPolicyModificationInvalidatesWAL (0.00s)
    rotation_test.go:167: expected TestPolicy2Password, got TestPolicy1Password
2023-03-08T08:59:40.621-0800 [INFO]  stopping periodic ticker
FAIL
exit status 1
FAIL    github.com/hashicorp/vault-plugin-secrets-openldap      0.220s

Contributor Checklist

austingebauer commented 1 year ago

One way to have an automated solution for the password policy content change would be to discard the WAL entry after we've tried to apply it for some period of time (break circuit?). In that case, we'd regenerate the password with the current password policy contents. This would require no code change on the Vault side.

robmonte commented 1 year ago

Can you explain the failure versus the WAL entry a little more?

Is a change to the password constraints the thing causing the rotation to fail initially? Or is this a case where there's a coincidental situation: the password constraints just happen to change right around the time an unrelated rotation issue occurs which causes a WAL entry to be written?

Or are both wrong?

fairclothjm commented 1 year ago

@austingebauer

break circuit

That sounds like it could work. In the case where the regenerated password also is not applied due to a failure would we bail or will the system continuously try to roll forward?

austingebauer commented 1 year ago

@robmonte - Yep, a change to the password constraint is the thing causing the rotation to fail initially. It could also be a coincidental situation, but that would be much more rare and hard to reproduce.

austingebauer commented 1 year ago

@fairclothjm - I was thinking the system would continuously try to roll forward until the new WAL entry (with the newly generated password) itself hits the threshold. So all WAL entries would be subject to this threshold.

austingebauer commented 1 year ago

This PR alone gives customers an immediate workaround, so I'm going to merge this without the changes discussed about setting a threshold for WAL retries. I will create an additional task to address the password policy content changes via that approach.

robmonte commented 1 year ago

@austingebauer Thanks for answering. I know you already merged but I agree with this solution for an immediate workaround.

Regarding a longer-term automated fix - is there a reason we must store the new password at the time of the WAL entry being written? When a static role rotation WAL entry gets processed could it simply always generate a new password at that moment using the current password policy? Maybe that would circumvent this issue altogether, if possible.

austingebauer commented 1 year ago

@robmonte - I considered the idea of simply generating a new password each rotation attempt as well. Afaict, the same password is reused so that Vault storage will eventually contain the same password that was applied to the external system before things partially failed. The reuse of the same password in a roll-forward fashion is very engrained into the code and WAL handling at this point, so I'm hesitant to change it.