strimzi / strimzi-kafka-oauth

OAuth2 support for Apache Kafka® to work with many OAuth2 authorization servers
Apache License 2.0

Pass access policies in JWT for generic OAuth2 integration #223

Open iblutrifork opened 7 months ago

iblutrifork commented 7 months ago

It would be nice to be able to pass authorization policies in the JWT instead of storing them in Keycloak in order to make the OAuth2 integration more generic and less reliant on specific identity/access management services. The use-case is when using a custom authentication/authorization service with OAuth2, e.g., Azure Active Directory B2C. The service can be a source of truth for policies and pass these policies as a part of the JWT. Upon receiving the JWT, the Kafka Authorizer looks at the policies and grants permission based on them. For example, the JWT can look like:

{
  ...
  "policies": [ "cluster_a:topic_b:write|describe", "*:topic_*:all" ]
}

I can work on implementing this.
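A minimal sketch of how an authorizer might interpret the policy strings from the example above. The `cluster:resource:operations` layout and the `*` wildcard are inferred from that single example; the names `Policy`, `parse_policy` and `is_allowed` are hypothetical and not part of strimzi-kafka-oauth. Glob matching via `fnmatchcase` is an assumption about the intended wildcard semantics.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Policy:
    cluster: str           # cluster name pattern, "*" matches any cluster
    resource: str          # resource name pattern, e.g. "topic_*"
    operations: frozenset  # lower-case operation names, or {"all"}


def parse_policy(spec: str) -> Policy:
    """Parse 'cluster:resource:op1|op2' (format assumed from the example)."""
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected 'cluster:resource:operations', got {spec!r}")
    cluster, resource, ops = parts
    operations = frozenset(op.strip().lower() for op in ops.split("|") if op.strip())
    if not operations:
        raise ValueError(f"no operations in {spec!r}")
    return Policy(cluster, resource, operations)


def is_allowed(policies, cluster: str, resource: str, operation: str) -> bool:
    """Deny by default; allow if any policy matches cluster, resource and operation."""
    op = operation.lower()
    return any(
        fnmatchcase(cluster, p.cluster)
        and fnmatchcase(resource, p.resource)
        and ("all" in p.operations or op in p.operations)
        for p in policies
    )


# The "policies" claim from the example JWT payload.
claims = {"policies": ["cluster_a:topic_b:write|describe", "*:topic_*:all"]}
policies = [parse_policy(s) for s in claims["policies"]]

print(is_allowed(policies, "cluster_a", "topic_b", "write"))
print(is_allowed(policies, "cluster_a", "orders", "write"))
```

Note that with these two policies the second rule (`*:topic_*:all`) already covers everything the first one grants, which illustrates how loosely scoped wildcards can make narrower rules redundant.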

scholzj commented 7 months ago

Isn't it considered bad practice to include things like this in JWT? Especially given they can grow really huge if you use them to manage Kafka ACLs.

cthtrifork commented 7 months ago

> Isn't it considered bad practice to include things like this in JWT? Especially given they can grow really huge if you use them to manage Kafka ACLs.

We have a client which would like to manage all permissions in their IDP. The IDP is Azure B2C which does not mimic the functionality in Keycloak (we think) that supports simply containing a role in a JWT and lets the IDP handle the granular permissions.

We have made a custom plugin, but would like to make it generic and submit it to you guys.

Do you see any alternatives?

scholzj commented 7 months ago

I do not think that answers my question 😉. I understand that what Keycloak does is proprietary and does not work anywhere else. But whatever else we do, it has to make sense and be usable and maintainable.

cthtrifork commented 7 months ago

> I do not think that answers my question 😉. I understand that what Keycloak does is proprietary and does not work anywhere else. But whatever else we do, it has to make sense and be usable and maintainable.

Well, I wouldn't call it best practice or standard practice, but I think it makes sense to offer delegating Kafka ACLs to a third-party OAuth2 IDP as an option. Once implemented, it should not be a big maintenance burden, as it relies on mapping claims to Kafka ACLs, which are quite mature and static.

But we need to know if there is support to merge this into strimzi.

scholzj commented 7 months ago

> But we need to know if there is support to merge this into strimzi.

Well, as I suggested, I think this requires a bigger discussion:

The addition of this should have a proposal: https://github.com/strimzi/proposals. But it might be worth more discussion before you jump into writing one to not waste time.

I personally think that the Keycloak authorization was a misstep because of low usability (requires Keycloak, seems hard to understand even for Keycloak users). And it actually is a lot of effort to maintain it - especially for how many users seem to use it. We should avoid adding more options like that. So I think it is important to explain why this would not be just another option like that. Because if it is, we should instead focus on giving you the extensibility options to use custom mechanisms instead of adopting the feature.

cthtrifork commented 7 months ago

Great points and great feedback!

For context, we are doing a very granular control where each (micro)service has write and/or read access to 1 or 2 topics. This is being managed by the IDP and the setup lets us skip having a kafka-role defined for each service. The consequence is that we have a lot of client credentials in the IDP, which we would have regardless of using kafka ACLs or kafkaroles mappings.

> It would need to be clear it works beyond Azure Active Directory B2C (you seem to put a lot of emphasis on a generic solution as well as on a single specific product which seems strange)

Yes, it is generic. I highlighted Azure B2C because it is very inflexible compared to Keycloak. That was the only reason for the emphasis.

> What is it you exactly need? Why can't you use a custom authorizer that is already supported in Strimzi?

Our solution seems very generic and we would like not to maintain a custom Strimzi Kafka version with our plugin. It would be perfectly acceptable for us to include e2e testing (we could even use keycloak) as a github action and more to reduce maintenance burden.

> The limits of what it can do should be discussed - e.g. how many ACL rules can a single JWT token accommodate? 10? 100? 1000? What will be the performance?

There are no direct limits; however, the drawback is definitely the payload overhead, as the JWT can, in theory, grow very large. So caution is required. ...

Any change or addition requires more maintenance, but we foresaw more people having the same need as us, i.e., having the IDP be the single source of truth without introducing a role-based abstraction layer.

I guess next step is to evaluate if anyone else can see the value of this contribution? Thanks for the proposals reference, I had not seen that one.

mstruk commented 7 months ago

> I think the suitability of using JWT tokens for authorization rights needs to be discussed

As long as the communication between a Kafka client and Kafka cluster is TLS secured, and communication between Kafka client and the authorization server is TLS secured (which it should always be) the JWT token itself would only live in the Kafka client / broker memory and be sent between trusted systems on a secure connection. That is the standard production setup which should be in place in any case.

JWT tokens are signed, and their signatures are checked, and any tampering is immediately detected. If intercepted, or leaked, the JWT tokens can be read (we don't support encrypted tokens), so any ACL info can be leaked. That can give a potential attacker more information about what the user is allowed, but is not by itself exploitable.
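The two properties described above (a signed token is tamper-evident but still readable by anyone who holds it) can be illustrated with a small standard-library sketch. It uses HS256 with a shared secret purely to stay self-contained; that is an assumption for the demo, and real deployments commonly use asymmetric signatures verified against the authorization server's published keys. All function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(claims: dict, secret: bytes) -> str:
    """Build a compact HS256 token: header.payload.signature."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    mac = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(mac)}"


def read_unverified(token: str) -> dict:
    """Anyone holding the token can do this; no key is needed."""
    return json.loads(b64url_decode(token.split(".")[1]))


def verify(token: str, secret: bytes) -> bool:
    """Recompute the signature over header.payload and compare in constant time."""
    header, payload, signature = token.split(".")
    expected = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), signature)


secret = b"demo-secret"  # hypothetical shared key, for the demo only
token = sign({"sub": "userid-123", "policies": ["cluster_a:topic_b:write"]}, secret)

# A leaked token exposes its ACLs to the reader...
leaked = read_unverified(token)

# ...but swapping in a more permissive payload invalidates the signature.
header, _, signature = token.split(".")
forged_payload = b64url(json.dumps({**leaked, "policies": ["*:*:all"]}).encode())
forged = f"{header}.{forged_payload}.{signature}"

print(leaked["policies"], verify(token, secret), verify(forged, secret))
```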

One issue may be that there is no way to update the ACLs for the duration of the JWT token's validity (until it expires). If you need to revoke the token quickly, you have to issue tokens with a short lifespan, which results in Kafka clients having to re-login and re-authenticate to Kafka more often (heavier load, mostly on your authorization server, in terms of connections established and CPU cycles).

> Why is it worth investing in yet another solution with limited usability and tight coupling to OAuth instead of doing it for a more generic solution based for example on projects such as OpenFGA that can be used with any authentication?

When your authorization server speaks OAuth / OIDC and supports managing per-client or per-user extra JWT token claims, that sounds like a natural fit. OpenFGA in that sense is actually a more complex solution to the problem. A JWT-based solution would have to integrate with our OAuth layer to some extent, to get to the access token used during session authentication, so any internal change we make in the OAuth layer can break a solution that is maintained in-house. That is one motivation to make the solution part of this project.

I guess for the solution to be generic, the claim in which the ACL specs are passed in the JWT token would need to be configurable. It would also be good to provide enough documentation and guides for specific use with Azure Active Directory so that anyone can actually use it. It could possibly be used with other cloud providers' directory server solutions.

iblutrifork commented 6 months ago

If we work on this, I will naturally add the necessary documentation/examples/guides/tests/etc. on how to use the new Authorizer

> If you need to revoke the token quickly, you have to issue tokens with short lifespan ...

If the JWT expiration is set to the same as for Keycloak, it should be comparable performance-wise. On the other hand, one could decide they don't need such quick propagation and set the expiration to hours or days.

> it would require that the claim where the ACL specs are passed in JWT token is configurable

Could you elaborate on this? What kind of configurability makes sense to you?

> What are the limits of using JWT?

I'd say passing 10s of ACL rules would pose no problem. 100s would be possible if the ACL rules are compact enough, but the overhead would be too much. 1000s is a hard no because of severe performance degradation. In general, a good architecture dictates that a client's permissions should be as few as possible, so there should be at most a couple dozen rules. I think if a client has hundreds of rules, either the client is reading/writing to too many topics or the permissions are too loose, which both sound like bad design. Passing ACLs as a list of strings as in the example is possible in both Keycloak and Azure. I would have to check for other auth providers, but providing a list of permissions seems like a basic functionality.
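To put rough numbers on this, the sketch below measures how the encoded JWT payload grows with the number of rules. The rule shape follows the example from the issue description; the topic names, the `sub` value and the function name are made up for the measurement, and the header and signature segments are ignored.

```python
import base64
import json


def encoded_payload_size(rule_count: int) -> int:
    """Size in bytes of the base64url-encoded payload segment for N synthetic rules."""
    rules = [f"cluster_a:topic_{i:04d}:write|describe" for i in range(rule_count)]
    payload = json.dumps(
        {"sub": "userid-123", "policies": rules},
        separators=(",", ":"),  # compact JSON, no extra whitespace
    )
    return len(base64.urlsafe_b64encode(payload.encode()).rstrip(b"="))


for n in (10, 100, 1000):
    print(f"{n:>5} rules -> {encoded_payload_size(n)} bytes")
```

Growth is linear in the number of rules, and base64url encoding adds about a third on top of the raw JSON, so compact rule strings and wildcards matter more than anything else for keeping tokens small.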

> What is it you exactly need? Why can't you use a custom authorizer that is already supported in Strimzi?

The purpose of opening this discussion is to see if we can find a more generic authorizer that works out of the box for more people and doesn't require developers to use a custom authorizer. In other words, analyzing what custom authorizers developers create, and making these into 1 generic authorizer that is provided as a part of strimzi's kafka oauth. From our experience, passing ACLs as a part of the JWT is a generic solution that works well with OAuth-integrated services.

mstruk commented 6 months ago

> If the JWT expiration is set to the same as for Keycloak, it should be comparable performance-wise. On the other hand, one could decide they don't need such quick propagation and set the expiration to hours or days.

Keycloak authorizer periodically refreshes grants for existing sessions (the interval is configurable). That allows removal of permissions to be detected before the session expires, thus it is decoupled from access token expiry time, and session expiry.
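The decoupling described above can be sketched as a grants cache with its own refresh interval. This is an illustration of the idea only, not the Keycloak authorizer's actual code: it refreshes lazily on access instead of running a periodic background job, and `GrantsCache`, `fetch_grants` and the injected clock are hypothetical names.

```python
import time
from typing import Callable, Dict, List, Tuple


class GrantsCache:
    """Caches grants per session and re-fetches them once they are older than the interval."""

    def __init__(
        self,
        fetch_grants: Callable[[str], List[str]],  # hypothetical call to the authorization server
        refresh_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_grants
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def grants_for(self, session_id: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(session_id)
        if entry is None or now - entry[0] >= self._refresh_seconds:
            # Stale or missing: ask the server again, independent of token expiry.
            entry = (now, list(self._fetch(session_id)))
            self._entries[session_id] = entry
        return entry[1]


# Demo with a fake clock and an in-memory "server".
now = [0.0]
server = {"s1": ["topic_b:write"]}
cache = GrantsCache(lambda sid: server[sid], refresh_seconds=60, clock=lambda: now[0])

before = cache.grants_for("s1")   # fetched from the server
server["s1"] = []                 # permission removed on the server
now[0] = 30.0
stale = cache.grants_for("s1")    # still cached, the removal is not visible yet
now[0] = 61.0
after = cache.grants_for("s1")    # interval elapsed, the removal is picked up
print(before, stale, after)
```

With ACLs embedded in the JWT there is no server-side copy to re-fetch, so the equivalent knob is the token lifetime itself.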

> it would require that the claim where the ACL specs are passed in JWT token is configurable

> Could you elaborate on this? What kind of configurability makes sense to you?

Rather than mandating that ACLs are available in the token as a String array under an acls claim, the claim name should probably be configurable. Maybe there are existing implementations out there that already use the acls key to store some custom info, so that claim name is taken. For example:

{
  ...
  "sub": "userid-123",
  "acls": [
    { "resource": "doc-1248969245", "actions": "read,edit", "permission": "granted"}
  ]
}

Or maybe the particular tooling only allows custom claims under extensions claim. For example:

{
  ...
  "sub": "userid-123",
  "extensions": {
    "acls": [
      "cluster_a:topic_b:write|describe", "*:topic_*:all"
    ]
  }
}

For groups and principal extraction we already support JsonPath queries, so it could be similar here. Something like: strimzi.acl.authorizer.claim="$.extensions.acls" or strimzi.acl.authorizer.claim="$['extensions']['acls']".
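A minimal sketch of what resolving such a configured path against the token claims could look like. It handles only the two path shapes shown above (dotted keys and single-quoted bracket keys); a real implementation would presumably reuse the JsonPath support the project already has, and `resolve_claim` is a hypothetical name.

```python
import re

# Matches either ".key" or "['key']" at the current position.
_SEGMENT = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[\s*'([^']*)'\s*\]")


def resolve_claim(claims: dict, path: str):
    """Walk a simple '$.a.b' or "$['a']['b']" path; return None if a key is missing."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    rest = path[1:]
    node = claims
    pos = 0
    while pos < len(rest):
        match = _SEGMENT.match(rest, pos)
        if match is None:
            raise ValueError(f"unsupported path syntax at {rest[pos:]!r}")
        key = match.group(1) if match.group(1) is not None else match.group(2)
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
        pos = match.end()
    return node


claims = {
    "sub": "userid-123",
    "extensions": {"acls": ["cluster_a:topic_b:write|describe", "*:topic_*:all"]},
}

print(resolve_claim(claims, "$.extensions.acls"))
print(resolve_claim(claims, "$['extensions']['acls']"))
```

Both path forms resolve to the same list, so the data contract (a list of ACL strings) can stay fixed while only the claim location is configurable.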

cthtrifork commented 6 months ago

> If the JWT expiration is set to the same as for Keycloak, it should be comparable performance-wise. On the other hand, one could decide they don't need such quick propagation and set the expiration to hours or days.

> Keycloak authorizer periodically refreshes grants for existing sessions (the interval is configurable). That allows removal of permissions to be detected before the session expires, thus it is decoupled from access token expiry time, and session expiry.

Do you see a way to support this and do you believe it would be critical?

> For groups and principal extraction we already support JsonPath queries, so it could be similar here.

A JSON path query for the claim name seems very reasonable as long as the actual data model/contract is static.

Any other thoughts? And would you support continuing with this optional extension for supporting generic JWTs?

mstruk commented 6 months ago

> Keycloak authorizer periodically refreshes grants for existing sessions (the interval is configurable). That allows removal of permissions to be detected before the session expires, thus it is decoupled from access token expiry time, and session expiry.

> Do you see a way to support this and do you believe it would be critical?

I don't think this is critical. I just wanted to clarify how exactly Keycloak authorizer does it.

> Any other thoughts and would you support continuing with this optional extension for supporting generic JWTs?

This sounds quite usable. How much usage it actually gets is a matter of how well it can integrate with existing SSO / Directory Services solutions. If we can really make it elegant to use with Azure Active Directory and hopefully other cloud providers' solutions, that can be a very useful feature.

I don't know, what do other maintainers think? @scholzj @ppatierno @tombentley

scholzj commented 6 months ago

I raised my concerns before. I think this should definitely have a proposal before accepting any implementation.

iblutrifork commented 6 months ago

If you believe we don't need to discuss further here, I will move on to writing a proposal.

scholzj commented 6 months ago

Well, we can keep discussing things here - I do not necessarily want to block the discussion here. But in the end, the proposal is what helps to drive the discussion, make it a bit more structured as the proposal can be updated based on the comments, and hopefully make sure it is not just you, me, and Marko discussing it.