Closed mstruk closed 1 year ago
I've been having a difficulty getting the testsuite to work in KRAFT mode. The broker starts fine and I can see in the log that KRAFT mode is detected and the StandardAuthorizer
is used as the delegate. The authorization requests are working as they are supposed to based on the log.
But for some reason only the controller listener port is responsive, the reason that there are some requests at all as the broker connects to itself and all those connections make some requests that go through the authorizer. The broker listeners ports are accepting connections but requests just hang on them. I'm trying to find the reason.
In any case this PR even though without the test can only improve the behaviour in KRAFT mode, since without it the KeycloakRBACAuthorizer simply crashes with exception in KRAFT mode.
@mstruk How do you run the KRaft cluster? I have a simple Docker Compose file for KRaft based on the Strimzi image here: https://github.com/scholzj/strimzi-compose-up/blob/main/kraft/docker-compose.yaml if that helps
I do all these steps in my own script: https://github.com/mstruk/strimzi-kafka-oauth/blob/kraft-tests/testsuite/docker/kafka/scripts/start-kraft.sh
But then I have a lot more complex configuration in terms of listeners: https://github.com/mstruk/strimzi-kafka-oauth/blob/kraft-tests/testsuite/keycloak-authz-kraft-tests/docker-compose.yml
There must be some little detail somewhere in there, but it's frustrating how it is even possible to have no exception at all. I'll have to start from complete scratch and see it work, and then add things one at a time until it breaks.
That will take a day though, so I'm for merging this without a testsuite for now. As I said, it can't make anything worse than it already is in KRAFT mode.
Since the tests have passed it means the 'zookeeper' mode is still working as it's supposed to.
That will take a day though, so I'm for merging this without a testsuite for now. As I said, it can't make anything worse than it already is in KRAFT mode.
I don't think this is the right approach. We should do this properly. I think that not supporting KRaft at all is fine at this point, we do not need to hurry this in.
Very well. This won't make it into 0.12.0 then.
I'll need to better understand the two issues you are mentioning - upgrade, and early-start listeners.
I opend this PR for a review. I'm not sure the following two points should be a concern of this PR:
- What it does during an upgrade
- Figure out how to deal with early-start listeners
Authorizer is instantiated and configured by Kafka broker and lifecycle methods like start(), close(), configure() are called on it. Authorizer is instantiated independently of the listeners and after the listeners are configured. I guess if the listeners are updated at runtime the broker decides to what extent that impacts the authorizer and some lifecycle methods are invoked in a certain order or nothing is done at all. Currently my assumption is that things will "just work". If it turns out that is not the case we can address it in another PR.
Authorizer is instantiated and configured by Kafka broker and lifecycle methods like start(), close(), configure() are called on it. Authorizer is instantiated independently of the listeners and after the listeners are configured. I guess if the listeners are updated at runtime the broker decides to what extent that impacts the authorizer and some lifecycle methods are invoked in a certain order or nothing is done at all. Currently my assumption is that things will "just work". If it turns out that is not the case we can address it in another PR.
I think you have to test it. Leaving it to another PR is fine, but I would expect any other tests to be done only after the release and not after the PR unless you do them.
Or alternatively, it might not work in KRaft in that release, which is something we can live with as well if needed I guess as it is anyway not production ready 🤷
I hope I sufficiently addressed the comments and proposals. I'm still looking at Travis CI failures if it's an intermittent failure or something specific to Java 8.
Other than that, is there something else that needs fixing or getting addressed better within this PR or separately?
@tombentley Does this require another review or do you think it's now good to go?
@mstruk I'll try to take a look by this time tomorrow.
This PR addresses
KeycloakRBACAuthorizer
not working inKRaft
mode.In order to upgrade authorizers to
KRaft
mode,Kafka
has introduced a new authorizer interface calledClusterMetadataAuthorizer
which any authorizer that wants to work inKRaft
mode has to implement. To accommodate that, and still preserve backwards compatibility we introduce a new authorizer class calledKeycloakAuthorizer
.KeycloakRBACAuthorizer
is still there and is used byKeycloakAuthorizer
for all theKeycloak
specific functionality. It can still be configured directly to be used inZookeeper
mode so existing authorizer configurations won't break and don't have to be changed. However, users should start migrating their configurations toKeycloakAuthorizer
which supports bothKRaft
andZookeeper
mode and allows choosing one or the other.KeycloakAuthorizer
implements the newClusterMetadataAuthorizer
interface used byKRaft
mode, it auto-detects whether it's running inKRaft
mode orZookeeper
mode, and sets up appropriate delegation classes for Kafka standard ACL delegation. It also makes sure there is a singleton instance ofKeycloakRBACAuthorizer
used behind the scenes regardless of the number ofKeycloakAuthorizer
s instantiated. InKRaft
mode multiple instances of the configured authorizer can be instantiated, and having one instance with one grants cache, and one set of background services that take care of refreshing those grants avoids duplicating these resources.In short, new deployments should use
KeycloakAuthorizer
. They can then freely choose whether to useZookeeper
mode orKRaft
mode. Old deployments can keep usingKeycloakRBACAuthorizer
inZookeeper
mode also in older versions ofKafka
that don't yet contain the newClusterMetadataAuthorizer
interface.One important change during this refactor is also that from now on the grants are no longer cached for each access token, but are cached for individual user id. That significantly reduces the number of cached grants, and the number of grants refresh jobs and associated requests to
Keycloak
token (grants) endpoint.