strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0

[Enhancement]: hot reload Kafka on changes in `brokerCertChainAndKey` instead of a rolling update #9994

Open vpedosyuk opened 6 months ago

vpedosyuk commented 6 months ago

Related problem

From the docs:

When the certificate or key in the brokerCertChainAndKey secret is updated, the operator will automatically detect it in the next reconciliation and trigger a rolling update of the Kafka brokers to reload the certificate.

In an environment where restarting a Kafka broker is very undesirable, it is hard to keep external TLS certificates short-lived (e.g. 24 hours with a 3rd-party PKI), because every certificate change causes a Kafka restart and usually some downtime.

In general, it'd be great to have as few reasons for a broker restart as possible.

Suggested solution

When a Kubernetes Secret referenced in brokerCertChainAndKey changes, the Strimzi operator dynamically replaces the old certificates with the new ones without restarting the brokers.
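A minimal sketch of the detection half of this idea: fingerprint the certificate and key entries of the referenced Secret on each reconciliation, and when the fingerprint changes, choose a dynamic update instead of the current rolling update. It assumes the Secret data has already been base64-decoded into bytes; the entry names (`tls.crt`, `tls.key`), the function names and the action strings are hypothetical and are not Strimzi's code.

```python
import hashlib
from typing import Mapping, Optional


def cert_fingerprint(secret_data: Mapping[str, bytes], cert_entry: str, key_entry: str) -> str:
    """SHA-256 over the certificate chain and private key entries of a Secret."""
    digest = hashlib.sha256()
    for name in (cert_entry, key_entry):
        value = secret_data[name]
        # Length-prefix each entry so different splits of the same bytes hash differently.
        digest.update(len(value).to_bytes(8, "big"))
        digest.update(value)
    return digest.hexdigest()


def plan_action(previous: Optional[str], current: str, dynamic_reload_supported: bool) -> str:
    """Hypothetical reconciliation decision: 'none', 'dynamic-update' or 'rolling-update'."""
    if previous is None or previous == current:
        # First reconciliation or unchanged certificate: nothing to reload.
        return "none"
    # Certificate changed: prefer a hot reload, fall back to today's rolling update.
    return "dynamic-update" if dynamic_reload_supported else "rolling-update"


if __name__ == "__main__":
    old = {"tls.crt": b"chain-v1", "tls.key": b"key-v1"}  # hypothetical entry names
    new = {"tls.crt": b"chain-v2", "tls.key": b"key-v1"}
    fp_old = cert_fingerprint(old, "tls.crt", "tls.key")
    fp_new = cert_fingerprint(new, "tls.crt", "tls.key")
    print(plan_action(fp_old, fp_new, dynamic_reload_supported=True))   # dynamic-update
    print(plan_action(fp_old, fp_new, dynamic_reload_supported=False))  # rolling-update
    print(plan_action(fp_old, fp_old, dynamic_reload_supported=True))   # none
```

Storing only a hash of the previous certificate and key, rather than the raw bytes, keeps key material out of wherever the operator records its previous state.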

Alternatives

A proper HA configuration might reduce the impact of such restarts, but it's not always possible.

Additional context

It seems like Kafka itself supports hot-swapping of certificates: the SSL keystore of an existing listener can be updated dynamically without restarting the broker.
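For illustration, Kafka exposes this as a per-broker dynamic config update of the listener-prefixed `ssl.keystore.*` settings. The sketch below only builds the `kafka-configs.sh` argument list for one broker and does not run it; the bootstrap address, listener name, keystore path and type are placeholders, and `ssl.keystore.password` is deliberately left out because putting secrets on a command line is unsafe. Whether Strimzi would drive such an update through this CLI, the Admin API or something else is an open design question the sketch does not answer.

```python
from typing import List


def listener_prefix(listener_name: str) -> str:
    # Kafka's per-listener config prefix uses the lower-cased listener name.
    return f"listener.name.{listener_name.lower()}."


def keystore_update_command(bootstrap: str, broker_id: int, listener_name: str,
                            keystore_location: str, keystore_type: str = "PKCS12") -> List[str]:
    """Build (but do not execute) a per-broker keystore update for one listener."""
    for value in (keystore_location, keystore_type):
        # --add-config is a comma-separated list, so this sketch rejects values with commas.
        if "," in value:
            raise ValueError(f"value contains a comma: {value!r}")
    prefix = listener_prefix(listener_name)
    configs = ",".join([
        f"{prefix}ssl.keystore.type={keystore_type}",
        f"{prefix}ssl.keystore.location={keystore_location}",
    ])
    return [
        "bin/kafka-configs.sh",
        "--bootstrap-server", bootstrap,
        "--entity-type", "brokers",
        "--entity-name", str(broker_id),
        "--alter",
        "--add-config", configs,
    ]


if __name__ == "__main__":
    # Placeholder values; broker 0 and an EXTERNAL listener are assumptions.
    cmd = keystore_update_command("localhost:9092", 0, "EXTERNAL", "/tmp/external/keystore.p12")
    print(" ".join(cmd))
```

Returning an argument list instead of a single shell string avoids quoting problems if the command is later passed to a process launcher.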

scholzj commented 6 months ago

Isn't this already tracked in some other issue? In any case, it should be kept in mind that:

I do not want to make it sound like this is not worth the effort; I'm just pointing out that this is not as simple as it might sound and has some obstacles. (I actually wrote KIP-978 in Kafka exactly for this purpose; it just takes a long time to bubble through.)

vpedosyuk commented 6 months ago

@scholzj yes, I've seen your KIP, thanks. In our case the SAN and DN remain unchanged; the only thing that changes is the expiration time, which I believe is the common case for certificate renewal.
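The distinction in this exchange, a renewal that only moves the expiry versus a certificate whose identity (DN or SAN) changes, can be written as a small check. This is a sketch assuming certificate details were already parsed into the hypothetical `CertInfo` type; the 3.7 cut-off reflects the statement later in this thread that reloading without DN limitations arrived in Kafka 3.7.0, and the function is not a description of Kafka's own validation code.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class CertInfo:
    """Hypothetical summary of the certificate fields relevant to a reload decision."""
    subject_dn: str
    sans: FrozenSet[str]
    not_after: str  # e.g. "2024-04-19T00:00:00Z"; deliberately ignored by the check


def parse_version(version: str) -> Tuple[int, int]:
    # Compare (major, minor) as integers so that "3.10" sorts after "3.7".
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def can_hot_reload(old: CertInfo, new: CertInfo, kafka_version: str) -> bool:
    # Assumption based on this thread: Kafka 3.7.0+ can accept DN/SAN changes on reload.
    if parse_version(kafka_version) >= (3, 7):
        return True
    # Older brokers: only an identity-preserving renewal (same DN and SANs) qualifies.
    return old.subject_dn == new.subject_dn and old.sans == new.sans


if __name__ == "__main__":
    old = CertInfo("CN=kafka.example.com", frozenset({"DNS:kafka.example.com"}), "2024-04-18T00:00:00Z")
    renewed = CertInfo("CN=kafka.example.com", frozenset({"DNS:kafka.example.com"}), "2024-04-19T00:00:00Z")
    print(can_hot_reload(old, renewed, "3.6.2"))  # True: only the expiry changed
```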

scholzj commented 6 months ago

The problem is that unless you can do it every time, it is basically not feasible because of the complexity. That is why that KIP is important: it should allow using it every time (at least for the Kafka parts).

vpedosyuk commented 6 months ago

Understood. Anyway, thank you for your efforts!

P.S. I couldn't find a similar issue reported here, so I created one.

scholzj commented 6 months ago

Discussed on the community call on 18.4.2024: Should be kept and implemented. A proposal will be needed.

applejag commented 4 weeks ago
  • Improved support for reloading certificates without any major limitations such as DN changes was added only in Kafka 3.7.0. So it is not easy to implement this while supporting Kafka 3.6.x.

Now with the release of Strimzi v0.43.0, we don't need to consider Kafka v3.6.x support anymore. At least one thing less to worry about.