stackabletech / secret-operator

Other
9 stars 3 forks source link

Improve documentation of managing TLS certificate lifetime #353

Open Jimvin opened 5 months ago

Jimvin commented 5 months ago

The docs state that the certificates provided by autoTLS can be configured to have a lifetime longer than the default (24 hours), but they are not clear as to how this is implemented. The docs could be improved by describing how this is done, ideally with an example to illustrate what the configuration should look like.

sbernauer commented 1 month ago

@Jimvin what docs exactly are you referring to? For e.g. Trino I actually took the time to document it here: image

The documentation for the feature is here, users "just" need to use podOverrides to set these annotations. I would like to document the actual podOverrides as less as possible, as volume names are not guaranteed to be stable and the snippet in Trino can break any time. I would much rather implement it properly via a dedicated CRD field

Jimvin commented 1 month ago

We don't document the name of the volume that needs to be overridden for each service, and the names differ between services. The documentation link you included is the graceful shutdown documentation for Trino, which doesn't feel like an intuitive place for this to live. I agree we don't want to document lots of configOverrides as workarounds, at the same time there are folks who want to be able to extend the certificate lifetime. Is a mechanism to easily do this better than improving the docs?

sbernauer commented 1 month ago

We don't document the name of the volume that needs to be overridden for each service, and the names differ between services

Yep, that's a problem, combined with the volume name not being part of our public API (IMHO). I personally would say you very often need to look at the Pod you want to podOverride before anyways. How is the container called? How the volume? Where is this thing mounted I want to change? So maybe this problem is inherent to podOverrides.

Is a mechanism to easily do this better than improving the docs?

I would say definitively! I want a field such as clusterConfig.tls.certificateLifetime: 14d (actual CRD subject to change), which would remove the need to a.) Know about the volume name b.) copy/paste the same documentation 15 times and maintain it in case the volume names change

Jimvin commented 1 month ago

I'm happy for this feature request to pivot from documentation to implementation of the fix proposed by @sbernauer

sbernauer commented 1 month ago

WDYT @lfrancke? Should we park this ticket and wait for the proper implementation or start with the implementation right ahead? Or something else?

lfrancke commented 1 month ago

I like the idea of the CRD field. Can you estimate the effort?

sbernauer commented 1 month ago

I would say size M for all operators. First step is to come up with a CRD change. If it's a clusterConfig entry (e.g. spec.clusterConfig.tls.certificateLifetime) it's trivial and should be boring copy/past work, if the thing is merged via role and roleGroup it might be more complicated. We can make a better estimation when we know how the CRD should look like.

fhennig commented 1 month ago

@lfrancke this ticket is currently in refinement. if we instead now propose to change the course of action to instead implement something new, can we move this ticket somewhere else? Refinement seems done, and the outcome is to instead not do what is described here, but rather do something else (when?)

lfrancke commented 1 month ago

We were just waiting for a decision on how to proceed here.

I'd suggest a new issue for this. I can create one and will handle the movement of issues :)

lfrancke commented 1 month ago

There is now a new overarching issue: https://github.com/stackabletech/issues/issues/586