strimzi / strimzi-kafka-oauth

OAuth2 support for Apache Kafka® to work with many OAuth2 authorization servers
Apache License 2.0
147 stars 90 forks source link

Fix KeycloakRBACAuthorizer to work with `StandardAuthorizer` in KRAFT mode #188

Closed mstruk closed 1 year ago

mstruk commented 1 year ago

This PR addresses KeycloakRBACAuthorizer not working in KRaft mode.

In order to upgrade authorizers to KRaft mode, Kafka has introduced a new authorizer interface called ClusterMetadataAuthorizer which any authorizer that wants to work in KRaft mode has to implement. To accommodate that, and still preserve backwards compatibility we introduce a new authorizer class called KeycloakAuthorizer. KeycloakRBACAuthorizer is still there and is used by KeycloakAuthorizer for all the Keycloak specific functionality. It can still be configured directly to be used in Zookeeper mode so existing authorizer configurations won't break and don't have to be changed. However, users should start migrating their configurations to KeycloakAuthorizer which supports both KRaft and Zookeeper mode and allows choosing one or the other. KeycloakAuthorizer implements the new ClusterMetadataAuthorizer interface used by KRaft mode, it auto-detects whether it's running in KRaft mode or Zookeeper mode, and sets up appropriate delegation classes for Kafka standard ACL delegation. It also makes sure there is a singleton instance of KeycloakRBACAuthorizer used behind the scenes regardless of the number of KeycloakAuthorizers instantiated. In KRaft mode multiple instances of the configured authorizer can be instantiated, and having one instance with one grants cache, and one set of background services that take care of refreshing those grants avoids duplicating these resources.

In short, new deployments should use KeycloakAuthorizer. They can then freely choose whether to use Zookeeper mode or KRaft mode. Old deployments can keep using KeycloakRBACAuthorizer in Zookeeper mode also in older versions of Kafka that don't yet contain the new ClusterMetadataAuthorizer interface.

One important change during this refactor is also that from now on the grants are no longer cached for each access token, but are cached for individual user id. That significantly reduces the number of cached grants, and the number of grants refresh jobs and associated requests to Keycloak token (grants) endpoint.

mstruk commented 1 year ago

I've been having a difficulty getting the testsuite to work in KRAFT mode. The broker starts fine and I can see in the log that KRAFT mode is detected and the StandardAuthorizer is used as the delegate. The authorization requests are working as they are supposed to based on the log.

But for some reason only the controller listener port is responsive, the reason that there are some requests at all as the broker connects to itself and all those connections make some requests that go through the authorizer. The broker listeners ports are accepting connections but requests just hang on them. I'm trying to find the reason.

In any case this PR even though without the test can only improve the behaviour in KRAFT mode, since without it the KeycloakRBACAuthorizer simply crashes with exception in KRAFT mode.

scholzj commented 1 year ago

@mstruk How do you run the KRaft cluster? I have a simple Docker Compose file for KRaft based on the Strimzi image here: https://github.com/scholzj/strimzi-compose-up/blob/main/kraft/docker-compose.yaml if that helps

mstruk commented 1 year ago

I do all these steps in my own script: https://github.com/mstruk/strimzi-kafka-oauth/blob/kraft-tests/testsuite/docker/kafka/scripts/start-kraft.sh

But then I have a lot more complex configuration in terms of listeners: https://github.com/mstruk/strimzi-kafka-oauth/blob/kraft-tests/testsuite/keycloak-authz-kraft-tests/docker-compose.yml

There must be some little detail somewhere in there, but it's frustrating how it is even possible to have no exception at all. I'll have to start from complete scratch and see it work, and then add things one at a time until it breaks.

mstruk commented 1 year ago

That will take a day though, so I'm for merging this without a testsuite for now. As I said, it can't make anything worse than it already is in KRAFT mode.

mstruk commented 1 year ago

Since the tests have passed it means the 'zookeeper' mode is still working as it's supposed to.

scholzj commented 1 year ago

That will take a day though, so I'm for merging this without a testsuite for now. As I said, it can't make anything worse than it already is in KRAFT mode.

I don't think this is the right approach. We should do this properly. I think that not supporting KRaft at all is fine at this point, we do not need to hurry this in.

mstruk commented 1 year ago

Very well. This won't make it into 0.12.0 then.

I'll need to better understand the two issues you are mentioning - upgrade, and early-start listeners.

mstruk commented 1 year ago

I opend this PR for a review. I'm not sure the following two points should be a concern of this PR:

  • What it does during an upgrade
  • Figure out how to deal with early-start listeners

Authorizer is instantiated and configured by Kafka broker and lifecycle methods like start(), close(), configure() are called on it. Authorizer is instantiated independently of the listeners and after the listeners are configured. I guess if the listeners are updated at runtime the broker decides to what extent that impacts the authorizer and some lifecycle methods are invoked in a certain order or nothing is done at all. Currently my assumption is that things will "just work". If it turns out that is not the case we can address it in another PR.

scholzj commented 1 year ago

Authorizer is instantiated and configured by Kafka broker and lifecycle methods like start(), close(), configure() are called on it. Authorizer is instantiated independently of the listeners and after the listeners are configured. I guess if the listeners are updated at runtime the broker decides to what extent that impacts the authorizer and some lifecycle methods are invoked in a certain order or nothing is done at all. Currently my assumption is that things will "just work". If it turns out that is not the case we can address it in another PR.

I think you have to test it. Leaving it to another PR is fine, but I would expect any other tests to be done only after the release and not after the PR unless you do them.

scholzj commented 1 year ago

Or alternatively, it might not work in KRaft in that release, which is something we can live with as well if needed I guess as it is anyway not production ready 🤷

mstruk commented 1 year ago

I hope I sufficiently addressed the comments and proposals. I'm still looking at Travis CI failures if it's an intermittent failure or something specific to Java 8.

Other than that, is there something else that needs fixing or getting addressed better within this PR or separately?

mstruk commented 1 year ago

@tombentley Does this require another review or do you think it's now good to go?

tombentley commented 1 year ago

@mstruk I'll try to take a look by this time tomorrow.