token exchange: exchange-sequence fails with Client session for client 'client-exchanger' not present in user session

flyinfish commented 1 week ago

Before reporting an issue

[X] I have read and understood the above terms for submitting issues, and I understand that my issue may be closed without action if I do not follow them.

Area

token-exchange

Describe the bug

when i try to do a token-exchange in sequence it fails on the second exchange: [org.keycloak.services.managers.AuthenticationManager] Client session for client '..' not present in user session '..'

still not shure if it's a plain "bug" or if i just miss something. i created https://stackoverflow.com/questions/78602573/keycloak-token-exchange-sequence with no input so far.

there is a discussion to from https://github.com/keycloak/keycloak/discussions/17440#discussioncomment-9760955 from @ChrisBrandhorst which seems to hit the same issue

Version

main

Regression

[ ] The issue is a regression

Expected behavior

token exchange sequence should work for as many clients/audiences you like

Actual behavior

the second exchange fails when initial token was received by client_credentials grant od public-client

How to Reproduce?

https://github.com/keycloak/keycloak/pull/30615 org.keycloak.testsuite.oauth.ClientTokenExchangeTest

testExchangeSequenceStartedWithGrantTypePassword works
testExchangeSequenceStartedWithGrantTypeClientCredentials fails
testExchangeSequenceStartedWithGrantTypePasswordOnPublicClient fails

https://github.com/flyinfish/keycloak-examples/blob/main/token-exchange-sequence/README.md

Anything else?

No response

flyinfish commented 1 week ago

opened https://github.com/keycloak/keycloak/pull/30615 which adds test to reproduce the behaviour

keycloak-github-bot[bot] commented 1 week ago

Due to the amount of issues reported by the community we are not able to prioritise resolving this issue at the moment.

If you are affected by this issue, upvote it by adding a :thumbsup: to the description. We would also welcome a contribution to fix the issue.

mposolda commented 1 week ago

Thanks for the info. I am triaging it as lower priority as it is not clear that this is bug at all and it is related to token-exchange, which is preview feature.

rmartinc commented 1 week ago

I have been checking this problem a bit and the root cause is that the client session is created for the target client but the in the token the azp (issuedFor) is set to the requester client. This way when the client session is checked in the auth manager it is detected as nonexistent (here). The reason is the session was created for target client but in the azp is the requester client (and therefore it's not found).

@pedroigor Why the azp is set to the requester client? Should not be the target client? In the end the token is issued for the target client although initiated by the requester... I did a quick change in this branch and I can exchange three times with no issue. But there are tests that are testing the azp is the requester client. So I'm asking you... :smile:

flyinfish commented 1 week ago

Thanks @rmartinc for exploring. I agree that the azp is somehow the key to the solution.

The biggest question for us was if this is such an edge case that it does not work. Would it not be the normal case testExchangeSequenceStartedWithGrantTypePasswordOnPublicClient starting with a token from a SPA/public client and then exchange it from microservice to microservice as deep as you want?

im out of office for 2 weeks now so sorry when i might not respond to further discussions immediately.

pedroigor commented 1 week ago

The use case is valid and we need to make it work. It should not matter how much "deep" it is when chaining services and executing token exchanges.

IMO, the azp should reference the client to which the token is being issued for. The fact the session is created for the target client is a limitation in the sense that we (internally) create tokens within the scope of a client. When building a token using the token-exchange grant type, the client we need to operate is the target client because we'll be using the client scopes, mappers, etc, associated with this client.

I think we can change the validation logic to consider "token exchange tokens". In this case, the validation would take into account that the token, even though issued to azp, maps to a session (and a token) in the target client.

Wdyt?

J0F3 commented 1 week ago

IMO, the azp should reference the client to which the token is being issued for.

I agree. The value of azp makes sense for me.

When building a token using the token-exchange grant type, the client we need to operate is the target client because we'll be using the client scopes, mappers, etc, associated with this client.

That makes also sense for me.

What I do not understand yet is why it works when the first initial token was issued to a secure client with grant_type=password where the "chain" of client is actually the same. (client1 -> client2 -> client3)

I assume it has something to do that in this case a user session is created right from the beginning for client1. Vs. with the other grant types where a user session is only created for client2 when the first token exchange is made. But for client1 there is actually no session at all. Is that the difference which makes the validation to fail when the token exchange is made the second time (client2 tries to exchange the token to token for accessing client3)?

J0F3 commented 5 days ago

@rmartinc @pedroigor I did some also some debugging to see what actually happens during the token exchange and when the error accours. So my conclusion for now is the following:

Why the azp is set to the requester client? Should not be the target client? In the end the token is issued for the target client although initiated by the requester..

The azp is correctly set to the requester client because he is the actual holder of the token which allows him then to call the traget client.

But what is the root of the problem IMHO is that the session is created for the target client instead of the requester client which is the actual client which gets the token and is therefore in the azp claim. This lead then to the problem that the requester client is not in the session when target client tries to exchange the token again. So I am wondering if session can not be created for the requester client instead so that the client which is actually doing the token exchange has actually also a session. But I assume that is what @pedroigor means by "limitation"?:

The fact the session is created for the target client is a limitation in the sense that we (internally) create tokens within the scope of a client.

I think we can change the validation logic to consider "token exchange tokens". In this case, the validation would take into account that the token, even though issued to azp, maps to a session (and a token) in the target client.

To change the validation logic would probably also help (@rmartinc I see you have already created a PR which such a change) but I think the session handling looks then still a bit strange. Because when you look at the Session in the Admin UI for example you will never see that the requester client is part of the session. Which makes it very confusing in my option. I think the requester client of the token exchange should be visible in the session because he gets and can use the exchanged token.

To make it hopefully mor clear what I actually mean here some more details of one the scenario which @flyinfish and I are trying to achieve:

We have the following setup: Clients

public-spa (public client)
service-1-client
service-2-client
service-3-client

Flow:

User 'grant' logins to SPA (using public-spa client)
SPA calls Service1 with User Access Token
Assuming Service1 has to call Service2. Service1 now uses its client (service-1-client) to exchange the Access Token for an Access Token valid to call Service2
Service1 then calls Service2 with the exchanged token
Servie2 now wants to call Service3. So it uses its client (service-2-client) to exchange the Token again. (here is where the error happens)
Service3 then calls Service3 and so on...

Now when looking at the Session for User "grant" the client "service-1-client" is never shown as part of the session:

So I think this is also very confusing and makes IMHO not really sense. This is way I think it would be better to change it so that a session for "service-1-client" (the requester client of the very first token exchange) is actually created. Then also the validation logic would work and one cloud actually also see all clients involved in the User Session.

What do you think about that? Is it even possibile to change the session from target client to the requester client?

rmartinc commented 4 days ago

@J0F3 What I'm trying in my PR is just setting in the user session that it is a Token Exchange session and it's really for the target client. As any TE request creates a new session, the requester calls TE to get a target token, the session is created with a note that indicates is for target (and not for the requester). You can try with my PR to see if it works for you.

J0F3 commented 4 days ago

@rmartinc Yes I will try with your PR. Thank you.

the session is created with a note that indicates is for target (and not for the requester) I got that. But I think this is at least part of the problem and the validation is actually ok as it is.

Why should the session not be created for the requester? I mean when the User logs in to a SPA using the session is also created for public-spa client event though the audience of the issues Token is a different client. I think the same should be made also for the token exchange. Because the requester client is the client for which the token is issued and not the target client. The target client is just the audience of the token but the target client does actually not hold the token. Thus the session for the target client makes in my opinion no sense here. Or is that some sort of technical limitation and it is simply not possible to set the session for the requester client during the Token Exchange?

I mean what would be if for example at https://github.com/keycloak/keycloak/blob/main/services/src/main/java/org/keycloak/protocol/oidc/DefaultTokenExchangeProvider.java#L403 targetClient would be changed to "requester client". Would then the whole Token Exchange not work anymore?

rmartinc commented 4 days ago

@J0F3 I'll give it a try, I don't know in advance if it's complicated or not.

J0F3 commented 4 days ago

@rmartinc I am trying to test your PR. But how can I start the Keycloak server correctly so the Token Exchange Feature is actually active? When I start the Keycloak as described here: https://github.com/keycloak/keycloak/blob/main/docs/building.md#starting-keycloak the Token Exchange feature is disabled. Event when I start it with java -jar quarkus/server/target/lib/quarkus-run.jar start-dev --features=preview the Token Exchange Feature is still disabled. Thanks and sorry for the rookie question! 😄

rmartinc commented 1 day ago

Checking if I could change the session I have realized that we can fix this much more easily. The problem is a bit different to what we have explained in the previous comments. The root cause of the reproducer is that the initial session is created with a client_credentials request, which uses a transient session (by default Use refresh tokens for client credentials grant is false). But after that, the first TE request creates a persistent session which obviously lacks the session for the initial client. If teh client is configured to generate refresh tokens for client credentials the TE exchange works, because the session is persistent and the initial client has a session.

I think that if the initial session is transient (like client_credentials) TE should not create a persistent session but another transient session. That way the session is always transient (limited to the token with no session at server side).

J0F3 commented 23 hours ago

I think that if the initial session is transient (like client_credentials) TE should not create a persistent session but another transient session. That way the session is always transient (limited to the token with no session at server side).

Ah, yes that sound reasonable for me. 👍🏻

Do we have actually two slightly different problems then? One your just described when the initially client_credentials is used. And an other a bit different one when initially a user as logged in by using a public client and authorization_code for example. Or would that solve the problem for both cases?

We have actually tested both cases (where client_credentials is used initally and where a user logs in with authorization_code flow) and hat in both cases the same error. But maybe the cause is not the same for both cases?

rmartinc commented 22 hours ago

@J0F3 I think that yes, both cases are tested. In the reproducer provided if you just change the client_credentials to have persistent session the reproducer does not fail. It's also a workaround for the moment. The idea is, if the session is persistent, the initial client has a session created (because the token is valid and the client session should be there). And if the session is transient (like with client_credentials default behavior) it also works the first time (because everything is set correctly as transient). The problem was in after the first TE because the session was now persistent but the first client had no session in it as it was initially transient.

J0F3 commented 15 hours ago

@rmartinc

If teh client is configured to generate refresh tokens for client credentials the TE exchange works

Just to be sure. You a referencing to the following setting of the client right?:

I tested it briefly and I think I can confirm that when this setting is enabled on the clients the TK exchange works as expected when using client credentials. But with a public client initially and 'password' or 'authorization_code' grant I get still the same error after the first exchange.

In the Admin UI there can also now seen a session which makes sense for when using the "client credential" flow and where all clients are part of it. (service-1-client is the client used to get the initial token with "client credentials")

But for login with a public client and with "password" or "authorization_code" flow the session is still missing the service-1-client. Which then still lead to the error Client session for client 'service-1-client' not present in user session '...'

So it seems the persistent session helps when using "client credentials" grant. (service to service communication) but not when a an actual user session exists when using "password" or "authorization_code" grant initially.

But when I understand you correctly, with your PR the situation would be different because the TE does not create persistent session anymore. Which is not quite the same as when enabling the "Use refresh tokens for client credentials grant" setting, right?

rmartinc commented 15 hours ago

Try to test with my PR. There can be still some corner case because your scenario is even more complicated.

keycloak / keycloak