smallstep / certificates

🛡️ A private certificate authority (X.509 & SSH) & ACME server for secure automated certificate management, so you can use TLS everywhere & SSO for SSH.
https://smallstep.com/certificates
Apache License 2.0
6.67k stars 433 forks source link

Do not fail if a provisioner cannot be initialized #1765

Closed maraino closed 2 months ago

maraino commented 6 months ago

This commit will mark a provisioner as disabled if it fails to initialize. The provisioner will be visible, but authorizing a token with a disabled provisioner will always fail.

Fixes: #589, #1757

maraino commented 6 months ago

@dopey @hslatman, I still need to make sure this works as expected in all cases. Should I write the reason in the provisioner list, as I'm doing right now?

$ step ca certificate --token ... mariano@smallstep.com mariano.crt mariano.key
✔ CA: https://ca.smallstep.com:9000
provisioner "Google" is disabled due to an initialization error
Re-run with STEPDEBUG=1 for more info.
$ curl -s https://ca.smallstep.com:9000/provisioners | jq '.provisioners[] | select(.name=="Google")'
{
  "type": "OIDC",
  "name": "Google",
  "...": "...",
  "disabled": true,
  "disabledReason": "failed to connect to https://localhost:9000/.well-known/openid-configuration: Get \"https://localhost:9000/.well-known/openid-configuration\": dial tcp [::1]:9000: connect: connection refused"
}
PeterGrace commented 6 months ago

@maraino thanks for taking up this issue. Would it be possible for you to push a docker image or link me to a binary for x86_64 that I can use temporarily to fix my CA while this PR is under consideration? I haven't been able to sign certs for days and its starting to hurt. Thanks!

EDIT: I realized if I took the lint test out of the all: makefile line, I could build your branch as-is and did that. I was able to fix my provisioner with that image, and can now swap back to latest official. Thanks for working on the PR!

hslatman commented 6 months ago

@maraino the approach makes sense to me.

Only thing I can think of now is that it maybe needs to be a more specific state, such as Uninitialized instead of Disabled, or more generic than it is now, like an actual State. That way there could be other reasons than initialization errors for provisioners to not be available/usable in the future, such as when a user decides to disable a provisioner after migrating to a different provisioner with a period of them being both being available, for example. Keeping the old provisioner around but with some specific state indicating unavailability could then result in a different error message when there's an attempt to use it, instead of having to handle the case that the provisioner is not found.

Admittedly, that's a different thing than this issue, but the behavior is related to what's in this PR. It would probably make things slightly more complex because the state of the provisioner would have to be taken into account in more places, but I don't think that's a bad thing per se.

As long as the current changes are mostly internal, and our JSON outputs not being documented in formal API docs, I think we can change/break the property names and/or uses in the future too, so I don't view the above as a blocker for the current state to go in as is, and to change it in the future.