Open Kidswiss opened 2 days ago
Can confirm the same issue with Zitadel 2.62.1 and Netbird 0.29.3.
Additional logs from the netbird-management container:
ERRO [requestID: 098374cd-f244-4be6-91f4-9b3e02fb292f, context: HTTP] management/server/http/util/util.go:81: got a handler error: token invalid
ERRO [context: HTTP, requestID: 098374cd-f244-4be6-91f4-9b3e02fb292f] management/server/http/middleware/auth_middleware.go:89: Error when validating JWT claims: unable to post https://bla.blabla.com/management/v1/users/_search, statusCode 403
The logs in the Zitadel container are identical to those above.
This worked for months across several version combinations of Netbird and Zitadel. I usually update quickly and had no issues until the latest updates of Netbird and Zitadel. So I guess something changed in either Netbird or Zitadel in the last 1-2 releases, and that is the root cause of this issue.
I see that Zitadel released v2.62.1 two days ago, but they have now marked v2.59.3 as the latest version. Could you try using v2.59.3 (latest) for now or rollback to the previous version that was working for you?
In the meantime, we will run tests to confirm the breaking changes and update the NetBird Zitadel implementation accordingly.
This is surely a mistake by Zitadel in tagging version 2.59.3 as "latest". See https://github.com/zitadel/zitadel/releases. They have updated several versions in the last few days (from 2.54.x to 2.62.x), all with the three bug fixes mentioned.
I just wanted to follow up with both a "me too" and some info from the zitadel side. the events history does say a token was created and authenticated properly for me. so it appears to be some kind of permission issue just with the netbird user accessing that endpoint.
This was all working previously for many months.
I have some experience writing integrations with zitadel, I'll poke around to see what netbird is calling vs. what the api is expecting.
edit:
I added some extra logging and error response parsing into the management server and zitadel is responding with:
failed warming up cache due to error: zitadel error code: 7 message: could not read projectid by clientid (AUTH-GHpw2)
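For anyone who wants to surface the same detail, here is a minimal Go sketch of the kind of error response parsing described above. It is not the actual patch: the helper name describeError is hypothetical, and the JSON body shape (a numeric "code" plus a "message" field) is an assumption based on the error text Zitadel returned.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// zitadelError is the ASSUMED shape of a Zitadel error body:
// a numeric code and a human-readable message.
type zitadelError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

// describeError (hypothetical helper) turns a non-2xx response into a
// readable string. If the body is not JSON of the assumed shape, the raw
// body is reported instead so nothing is lost.
func describeError(statusCode int, body []byte) string {
	var ze zitadelError
	if err := json.Unmarshal(body, &ze); err != nil || ze.Message == "" {
		return fmt.Sprintf("statusCode %d, body: %s", statusCode, body)
	}
	return fmt.Sprintf("statusCode %d, zitadel error code: %d message: %s",
		statusCode, ze.Code, ze.Message)
}

func main() {
	body := []byte(`{"code":7,"message":"could not read projectid by clientid (AUTH-GHpw2)"}`)
	fmt.Println(describeError(403, body))
}
```

Logging the body this way, instead of only the status code, is what exposes the AUTH-GHpw2 message behind the bare 403.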
will continue poking around
edit2:
so it looks like the client id we're using to authenticate ("netbird", per the docs) and the client secret are getting encoded into the JWT returned from zitadel, and we're using that client id "netbird" to make requests.
zitadel, on the other hand, is doing some work to verify the access token: they look up the client_id from the access token we pass in and search for it in the registered apps list to see which app and project it should belong to. but "netbird" isn't the client id of the app, it's 234872394...@netbird.
however if we use that client id to perform the management query, they're logging this error:
oidc_error.parent="ID=QUERY-Dfbg2 Message=Errors.User.NotFound Parent=(sql: no rows in result set)" oidc_error.description="client not found" oidc_error.type=invalid_client status_code=400
there's definitely some confusion happening on what credentials should be used
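To check which client id a given token actually carries, the JWT payload can be decoded locally. A minimal Go sketch, assuming the access token is a standard three-part JWT and that the claim is named client_id (an assumption on my part); jwtClaims and demoToken are hypothetical helpers, and the signature is deliberately not verified because this is only for inspection:

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// jwtClaims decodes the payload segment of a JWT WITHOUT verifying the
// signature. Use it for debugging only, never for authorization.
func jwtClaims(token string) (map[string]any, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return nil, errors.New("token is not a three-part JWT")
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return nil, err
	}
	var claims map[string]any
	if err := json.Unmarshal(payload, &claims); err != nil {
		return nil, err
	}
	return claims, nil
}

// demoToken builds an unsigned dummy token around the given JSON payload,
// purely so the example can run without a real Zitadel instance.
func demoToken(payloadJSON string) string {
	enc := base64.RawURLEncoding.EncodeToString
	return enc([]byte(`{"alg":"none"}`)) + "." + enc([]byte(payloadJSON)) + ".sig"
}

func main() {
	claims, err := jwtClaims(demoToken(`{"client_id":"netbird"}`))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println("client_id claim:", claims["client_id"])
}
```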
another follow-up:
I added a PAT for the netbird user and made changes to the management service, overloading the ClientSecret and the Authenticate method to just make a pretend JWT with the AccessToken being the PAT, instead of authenticating to get a real JWT. everything seems to be working fine this way, since the code just concatenates Bearer + accessToken to assemble the header before a request is made.
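The PAT workaround boils down to sending the token directly as a bearer credential. A rough Go sketch of that idea, not the actual change to the management service: newSearchRequest is a hypothetical helper, the base URL and PAT are placeholders, the empty JSON query body is an assumption, and the path is the one from the management logs above.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newSearchRequest (hypothetical helper) builds the user search call and
// attaches the PAT as "Bearer <token>", skipping the JWT token exchange.
func newSearchRequest(baseURL, pat string) (*http.Request, error) {
	url := strings.TrimRight(baseURL, "/") + "/management/v1/users/_search"
	// An empty JSON object is used as the query body here (assumption).
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader("{}"))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+pat)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newSearchRequest("https://zitadel.example.com/", "placeholder-pat")
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	fmt.Println(req.Header.Get("Authorization"))
}
```

The request is only built here, not sent, so the sketch runs without network access.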
I think it would be a relatively simple change to just use a PAT and refactor the config a bit if we want to swerve this issue. I'll keep tweaking configurations and hacking on both sides to see if I can find the real cause though.
In the meantime at least my management service is back online :)
more extra data:
I added support in netbird for using the Bearer "Access Token Type" instead of JWT from zitadel as well, and get the same could not read projectid by clientid error as before. So it's not to do with receiving and passing the jwt access token.
I also tried adding the urn:zitadel:iam:org:project:id:{projectid}:aud scope to the scopes when making the access token request, as noted here: https://zitadel.com/docs/guides/integrate/service-users/client-credentials#2-authenticating-a-service-user-and-request-a-token but that also didn't make a difference.
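For reference, this is roughly what the token request form looks like with the extra audience scope added. A Go sketch under assumptions: tokenRequestForm is a hypothetical helper, "openid" as the base scope is my choice for illustration, and the project id is a placeholder. Only the grant type (standard OAuth2 client credentials) and the scope pattern quoted above are taken as given.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// tokenRequestForm (hypothetical helper) builds the form body for a
// client-credentials token request, appending the project audience scope
// urn:zitadel:iam:org:project:id:{projectid}:aud to a base scope.
func tokenRequestForm(projectID string) url.Values {
	scopes := []string{
		"openid", // base scope, assumed for illustration
		"urn:zitadel:iam:org:project:id:" + projectID + ":aud",
	}
	form := url.Values{}
	form.Set("grant_type", "client_credentials")
	form.Set("scope", strings.Join(scopes, " "))
	return form
}

func main() {
	form := tokenRequestForm("1234567890") // placeholder project id
	fmt.Println(form.Get("scope"))
	fmt.Println(form.Encode()) // body to POST to the token endpoint
}
```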
I'm getting another chance to look at this today and at this point I'm pretty sure there's some undesired behaviour going on the zitadel side here. I've followed all of the specs A-Z to build this token for a service user from their docs and their examples, but none of them will authenticate.
I think it may have been introduced in a big refactor on their side at 8e0c8393. If I make a small change to the auth flow in zitadel so that it does not assume every client_id request is a project request, only checking the clientid against the projectid when it's of the <id>@<org> format and continuing on otherwise with the rest of the auth flow, everything works again. I'm going to open up an issue on the zitadel side and see if I can learn some more there.
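To make the idea concrete, the guard amounts to a format check on the client id before attempting the project lookup. This Go sketch is purely illustrative and is not Zitadel's code: hasAppClientIDFormat is a hypothetical name, and the rule it applies (non-empty parts on both sides of an "@") is my reading of the id@org format.

```go
package main

import (
	"fmt"
	"strings"
)

// hasAppClientIDFormat (hypothetical) reports whether clientID looks like
// "<id>@<name>", i.e. has non-empty text on both sides of the first "@".
// Only ids of this form would go through the project lookup; anything
// else would continue with the rest of the auth flow.
func hasAppClientIDFormat(clientID string) bool {
	id, name, found := strings.Cut(clientID, "@")
	return found && id != "" && name != ""
}

func main() {
	for _, c := range []string{"netbird", "123@example"} {
		if hasAppClientIDFormat(c) {
			fmt.Println(c, "-> check client id against project id")
		} else {
			fmt.Println(c, "-> skip project lookup, continue auth flow")
		}
	}
}
```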
edit: though there's not much talk on their github issues list about this, I found some folks complaining in discord about service accounts not working with the same error.
https://github.com/zitadel/terraform-provider-zitadel/issues/199 I'm seeing the issue pop up in some other places as well. linking for posterity.
Describe the problem
When updating Zitadel to 2.61.2 or anything newer, then Netbird can't query the Zitadel user endpoint anymore.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Zitadel integration should still work if it gets updated.
Are you using NetBird Cloud?
Selfhosted
NetBird version
0.29.3
Additional context
Netbird management logs
Zitadel log entries:
I've tried re-creating the service account secret, but the error persisted. Also, not sure if this is an issue on Zitadel's side or on Netbird. But given that Netbird is the only app I had issues with, I opened a bug here.