Open anthonyjlmorel opened 3 years ago
- How does step-ca behave when it receives a renewal request for a certificate that is no longer in its DB? (The certificate could have been "unrevoked", or step-ca may no longer see it in its list of emitted certificates.)
As long as the provisioner is still available the renewal will succeed.
- Do you have examples in the wild of how to cope with the "unrevoked" issue, and this problem in general?
I know that some users have a process to create CRLs or OCSP responders from the information in the DB or using their own revoke mechanism. I don't have details on how they are set up, but as long as they are available, they can be used as an alternative.
Hi Guys,
Sorry for the delay, you know how it is when you have to get your head down to develop things :smile_cat:
Let me summarize for us to understand, and to get all the conclusions, so that somebody in my case would get the information right.
Let us say I back up my DB at D - 1, and today is D: my server crashed and I need to restore the MySQL backup from D - 1 for step-ca to restart.
What concerned me was the two following points:
Case 1 This concerned me for the renewal process. But this is dealt with, as you mentioned, a renewal request is granted even if the "former" certificate is not in DB, and a new line is inserted in DB, which is fine to me.
Case 2 This one, however, is a bit problematic to me. It would mean that certificates revoked between D - 1 and D could technically be renewed. You could, however, argue that "as certificates are valid only for 24h, you would eventually be good with that". We could accept that, except for two technical points, of which the second is probably the most problematic:
(2.1.) Today, I am actively waiting for the "active revocation" list in your roadmap as our certificate lifespan is not 24h. We need to keep track of revoked certificates and actively check them
(2.2.) You also have something interesting in your roadmap: being able to renew a certificate whose lifespan has passed. This is a must-have for us, as our clients are not necessarily connected continuously and hence may ask for a renewal after the lifespan.
Technically speaking, we keep track of the "revoked certificates" by hand, as this is a rare case, for now. So we could potentially "re revoke them" after the crash and that would solve the problem.
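The "re-revoke after the crash" idea can be sketched as a small reconciliation step. This is only an illustration under assumptions: `manual_log` stands for the hand-kept list of revoked serial numbers, `restored_revoked` for the serials the restored backup still marks as revoked, and the function name is hypothetical. How each returned serial is then actually revoked against the CA is left out.

```python
def revocations_to_replay(manual_log, restored_revoked):
    """Return serials from the hand-kept log that the restored DB no longer
    lists as revoked, in log order and without duplicates.

    Both arguments are plain iterables of serial-number strings; they are
    stand-ins for the manual record and for the restored database contents.
    """
    restored = set(restored_revoked)
    missing = []
    for serial in manual_log:
        if serial not in restored and serial not in missing:
            missing.append(serial)
    return missing
```

Everything revoked inside the lost window shows up in the result, so the list can be replayed against the CA once it is back up.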
The only remaining issue is the renewal chain. If somebody steals my client's certificate/key pair (by stealing the device those are on for instance), I cannot revoke the certificate by id and the guy can have time to issue renewal requests. If I can "map the client with their initial certificate", I would love to be able to "extract" the certificate renewal chain between two dates to be able to revoke the potential "new certificates" the thief would have issued.
But this is work in progress, judging by this issue, right?
Hey @anthonyjlmorel,
It's been a while. We haven't resolved this, yet, but we're starting to think about these issues and are working on a design that I think will help here.
For cross-reference, this issue is related to
The current design of renewal in step-ca allows for renewal as long as a valid certificate is presented that was issued by a provisioner that has renewal enabled (we embed the provisioner name as a certificate extension, so we can verify this without a database of issued certificates).
The strawman design we have for renew-after-expiry changes this. The (tentative) new design will maintain a per-certificate renewal policy in the CA database. There will be (at least) three policies: 1) not renewable, 2) renew if valid (current behavior), and 3) renew after expiry. There will be an administrative API to change the policy for a particular certificate.
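To make the three tentative policies concrete, here is a minimal sketch of how such a per-certificate decision could look. It is not step-ca code: `RenewalPolicy`, `CertRecord`, `may_renew` and the `policies` dictionary (standing in for the CA database) are all hypothetical names, and refusing renewal for a not-yet-valid certificate is an added assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class RenewalPolicy(Enum):
    NOT_RENEWABLE = "not_renewable"
    RENEW_IF_VALID = "renew_if_valid"          # described above as the current behavior
    RENEW_AFTER_EXPIRY = "renew_after_expiry"


@dataclass
class CertRecord:
    serial: str
    not_before: datetime
    not_after: datetime


def may_renew(cert, policies, now):
    """Return True if `cert` may be renewed at time `now`.

    `policies` maps serial -> RenewalPolicy and stands in for the CA database.
    A serial missing from it (for example after data loss) is treated as
    not renewable.
    """
    policy = policies.get(cert.serial, RenewalPolicy.NOT_RENEWABLE)
    if policy is RenewalPolicy.NOT_RENEWABLE:
        return False
    if now < cert.not_before:
        # Assumption: a certificate that is not yet valid is never renewed.
        return False
    if policy is RenewalPolicy.RENEW_IF_VALID:
        return now <= cert.not_after
    return True  # RENEW_AFTER_EXPIRY: expiry alone does not block renewal
```

In this sketch a certificate with no stored policy falls back to "not renewable", which models the data-loss case: the policy has to be set again before renewal succeeds.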
If you lost data, certificates would not be renewable. That's probably the right default. To recover from the failure, you'd need to re-set any lost renewal policies via the API. This may be tricky if the certificates are distributed across a large ecosystem. Perhaps we could also track failed certificate renewal requests and you could monitor those failures and reset renewal policy as they come in? If clients are configured to retry renewal this should work.
Regarding renewal chains: we've also been discussing a certificate lineage API (and corresponding CLI functionality). This would let you get a list of "child" certificates for a given certificate. You could follow this child relationship transitively and revoke all certificates that are downstream of a particular compromised key. Would that work for you?
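As an illustration of following the child relationship transitively, here is a minimal sketch. It assumes the lineage information is available as a plain mapping from a serial number to the serials issued by renewing it. That mapping, and the function names `descendants` and `serials_to_revoke`, are hypothetical and not an existing step-ca API.

```python
from collections import deque


def descendants(root_serial, children):
    """Return every serial reachable from `root_serial` via the child
    relation, in breadth-first order.

    `children` maps a serial to the list of serials issued by renewing it;
    it is a stand-in for whatever a lineage API would return.
    """
    seen = set()
    order = []
    queue = deque([root_serial])
    while queue:
        serial = queue.popleft()
        for child in children.get(serial, []):
            if child not in seen and child != root_serial:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def serials_to_revoke(compromised_serial, children):
    """The compromised certificate plus everything renewed from it."""
    return [compromised_serial] + descendants(compromised_serial, children)
```

For example, with `children = {"A": ["B"], "B": ["C", "D"]}`, calling `serials_to_revoke("A", children)` yields the original certificate and all three downstream renewals.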
@mmalone what you're proposing sounds great. IMHO DB disaster recovery should be none of your concern, because users relying on the service in production should have a replica anyway. If that isn't an option, they could always go for a cloud service with the corresponding SLAs or follow your suggestion in paragraph two. Am I on the wrong track here?
Hey @mmalone
Thanks for the update ! That sounds terrific ! Indeed, for our use cases, we would use the 2/ renew if valid or 3/ renew after expiry, depending on which kind of end product has the certificate.
I totally agree on the "audit" point, where we should monitor somewhere certificate access. In our case, it makes more sense to audit the certificate use as we cannot have two valid certificates at the same time in use in our infrastructure (so, nothing to do with Step-ca). We would then instrument step-ca to revoke one of them.
Regarding the renewal chain: yes, a tool to ID the "chain" of renewal would be perfect. We were working on this kind of CLI to ID this chain from the step-ca logs, but if you can do it, we would take it, of course.
The only issue remaining, IMO, is the case of "data loss" (if the server crashed, or some other disaster). It is up to us to create a replication strategy to avoid data loss. For this part, I was wondering if you guys were investigating ways to add support for more DBs (like PostgreSQL)? That way people could better align their DB replication strategies with their business needs.
Anyway, thanks for the tremendous work here ! For my use case, this issue can be closed!
Hi,
I am posting this here as I think this could interest a lot of people, and I did not find anything regarding this matter in the repository (maybe I haven't used the right words? In that case, do not hesitate to redirect me).
I am facing a production-environment question about Service-Level Agreement for my certificate authority.
Let's say:
And let us say that, today, my MySQL server crashes at 11am and I lose all data in the crash (day D). I am able to restart an instance, and take the backup of the DB from 1pm of day D - 1.
So, I lose the serial numbers of all certificates emitted (and the metadata stored by step-ca) between day D-1 1pm and day D 11am.
From an "access" perspective, all certificates emitted within this time frame can still be used to authenticate to a service that possesses the root certificate.
But, we cannot check the "revocation status" of the certificate, as we lost all meta data (hence, if the certificate has been revoked within the lost time frame, it becomes "unrevoked").
My questions are:
Thank you