Closed Iristyle closed 3 years ago
CI doesn't vet this script - only container builds will, and they need to opt in to rebuilding with a new SHA of ssl.sh. Since this is on a branch, container builds can be vetted prior to merging this PR -- so I'm going to start putting up all the tentative related PRs for consumers (noting that puppetserver has its own process since it hosts the CA):
There is an equivalent PR for pupperware-commercial that impacts:
Enough vetting PRs have passed - merging and updating PRs.
Addresses errors like:
and
Refactored / amended script for additional failure scenarios and manually recreated them in k8s to validate behavior:
[ ] keypair generated on disk, cert exists but doesn't match local keypair -- this situation wasn't specifically tested, but given other surrounding branches were tested, this should be OK
Fix a number of minor problems with the ssl.sh script:
Default DNS_ALT_NAMES to empty string when the environment variable is not specified
Use the $CERTFILE variable where appropriate
Additionally make a number of improvements to the robustness of the script:
The check against the master's simple status endpoint is insufficient for determining server readiness. Also verify that there is a 200 response from the master's CA, as determined by trying to retrieve the CA file.
This prevents an early race where containers error on initial cert generation like:
Error: cannot reach CA host 'pe-puppet'
Presumably this same race is also responsible for another message:
Error: cannot reach CRL host 'pe-puppet'
Though it is unclear why, given that the CA check has already passed.
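A minimal sketch of the two-stage readiness check described above, not the actual ssl.sh code. The function names `wait_for_200` and `http_status` are hypothetical, and the port and endpoint paths in the usage comments are assumptions based on a default Puppet Server setup.

```shell
#!/bin/sh
# Retry a probe command until it prints "200" or the attempts run out.
# Usage: wait_for_200 MAX_ATTEMPTS DELAY_SECONDS PROBE [ARGS...]
wait_for_200() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # A probe that fails outright is treated the same as a non-200 answer.
    status="$("$@" 2>/dev/null)" || status=""
    if [ "$status" = "200" ]; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1
}

# Hypothetical probe: print only the HTTP status code for a URL.
http_status() {
  curl --silent --insecure --output /dev/null --write-out '%{http_code}' "$1"
}

# Assumed usage (host name taken from the error messages, paths assumed):
#   wait_for_200 30 2 http_status "https://pe-puppet:8140/status/v1/simple" &&
#   wait_for_200 30 2 http_status "https://pe-puppet:8140/puppet-ca/v1/certificate/ca"
```

Passing the probe as a command keeps the retry loop independent of curl, so the same loop could gate both the status check and the CA check.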
Under rare circumstances an HTTP response may not be blank, but may be malformed with an unparseable status code.
In such instances, fail fast -- otherwise, the code may reach the end and perform a return "".
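One way to fail fast on a malformed response is to accept only a status line with a three-digit code. This is a sketch under that assumption; `parse_status` is a hypothetical helper, not a function confirmed to exist in ssl.sh.

```shell
#!/bin/sh
# Print the status code from the first line of a raw HTTP response.
# Returns non-zero (and prints nothing on stdout) when the status line
# is blank or does not contain a three-digit code.
parse_status() {
  status_line="$(printf '%s\n' "$1" | head -n 1)"
  code="$(printf '%s\n' "$status_line" |
    sed -n 's/^HTTP\/[0-9.]* \([0-9][0-9][0-9]\).*$/\1/p')"
  if [ -z "$code" ]; then
    echo "Error: malformed HTTP response: '${status_line}'" >&2
    return 1
  fi
  printf '%s\n' "$code"
}
```

A caller can then abort immediately, for example with `code="$(parse_status "$response")" || exit 1`, instead of continuing with an empty status.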
When a CSR is ready, keep attempting to submit until a 200 response is received from server.
This prevents a scenario where a CA is temporarily unavailable, but a local key pair has been created. In the past, this could cause the script to abort on the next run because it expects cert signing to be atomic and cannot recover when a keypair is on disk but a signed cert hasn't yet been returned.
Better determine if a signed cert for the given hostname matches a keypair already on disk or represents another failure.
In a scenario where the script failed after submitting a CSR that the server then signed, but the cert was never downloaded, recover more gracefully by simply storing the cert locally.
Differentiate between a failure mode where:
no keypair yet exists, but a signed cert for this host exists
a keypair exists, but the signed cert doesn't match
When a cert exists and matches the private key, it's safe to assume that the public key also matches and that a CSR was generated
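The distinction between these failure modes can be sketched by comparing public keys with openssl. This is an illustration only: `cert_matches_key` and `classify_ssl_state` are hypothetical names, and the sketch assumes the signed cert has already been fetched from the CA into a local file.

```shell
#!/bin/sh
# Succeed when the public key embedded in the cert equals the public key
# derived from the private key.
cert_matches_key() {
  cert_pub="$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null)" || return 1
  key_pub="$(openssl pkey -in "$2" -pubout 2>/dev/null)" || return 1
  [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}

# Print one of: none, key_only, cert_only, mismatch, ok
# Usage: classify_ssl_state CERT_FILE PRIVATE_KEY_FILE
classify_ssl_state() {
  cert="$1"
  key="$2"
  if [ ! -f "$key" ] && [ ! -f "$cert" ]; then
    echo none        # fresh host: generate keypair and submit a CSR
  elif [ ! -f "$cert" ]; then
    echo key_only    # keypair on disk, no signed cert retrieved yet
  elif [ ! -f "$key" ]; then
    echo cert_only   # signed cert for this host, but no local keypair
  elif cert_matches_key "$cert" "$key"; then
    echo ok          # cert belongs to the local private key
  else
    echo mismatch    # cert was issued for a different keypair
  fi
}
```

Returning a state name rather than exiting inside the check leaves the choice of recovery, such as storing the cert or aborting with a specific error, to the caller.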
Another possible failure mode for this script is for a local keypair to have been created (potentially with a CSR), but never sent to the master for signing.
Address this situation by keeping the existing private key and recreating the public key and CSR (both derived from the private key) and continuing / redoing the submission process.
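As a rough illustration of that recovery path, both the public key and the CSR can be rebuilt from the private key alone. The function name, argument order, and the use of a plain `/CN=` subject are assumptions for this sketch; the real script may build its CSR differently (for example, with DNS alt names).

```shell
#!/bin/sh
# Rebuild the public key and CSR from an existing private key, leaving the
# private key untouched.
# Usage: regenerate_from_private_key KEY_FILE PUBKEY_FILE CSR_FILE CERTNAME
regenerate_from_private_key() {
  key="$1"
  pub="$2"
  csr="$3"
  certname="$4"
  if [ ! -f "$key" ]; then
    echo "Error: no private key at '${key}'" >&2
    return 1
  fi
  # Derive the public key from the private key.
  openssl pkey -in "$key" -pubout -out "$pub" || return 1
  # Create a fresh CSR signed with the same private key.
  openssl req -new -key "$key" -subj "/CN=${certname}" -out "$csr" || return 1
}
```

After this step the CSR can be resubmitted to the CA exactly as on a first run, since the CSR carries the same public key as before.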
See also https://github.com/puppetlabs/holodeck-manifests/pull/424