open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
https://open-metadata.org
Apache License 2.0

Non-existing user entry with SAML SSO leads to login loop (should be fixed in 1.4.0 but isn't) #16433

Closed xenji closed 1 month ago

xenji commented 1 month ago

Unfortunately, the bug described in #16076 is still present in 1.4.0. I can provide more details, as offered in the referenced issue.

Sachin-chaurasiya commented 1 month ago

Hello @xenji, thanks for filing the issue. Can you share more details?

xenji commented 1 month ago

Version response from the API

Guessing you mean GET /v1/system/version?

{
  "version": "1.4.0",
  "revision": "8f9d9e414881d6fcfa447b22c4a594ebe784f26e",
  "timestamp": 1716447137419
}

How did you configure the SAML SSO

I replaced some parts with {...} for security reasons.

authentication:    
      enabled: true
      clientType: public
      provider: "saml"
      saml:
        debugMode: true
        idp:
          entityId: "http://www.okta.com/{...}"
          ssoLoginUrl: "https://{....}.okta.com/app/veevasys_linkdatacatalog_1/{....}/sso/saml"
          idpX509Certificate:
            secretRef: "openmetadata-okta-saml-secret"
            secretKey: "idpX509Certificate"
          authorityUrl: "https://catalog.{...}/api/v1/saml/login"
          nameId: "urn:oasis:names:tc:SAML:2.0:nameid-format:emailAddress"
        sp:
          entityId: "https://catalog.{...}/api/v1/saml/metadata"
          acs: "https://catalog.{...}/api/v1/saml/acs"
          callback: "https://catalog.{...}/saml/callback"

How did you deploy the OpenMetadata

xenji commented 1 month ago

@Sachin-chaurasiya Anything else you need?

Sachin-chaurasiya commented 1 month ago

Thanks, @xenji, I will check and get back to you.

Sachin-chaurasiya commented 1 month ago

Hello @xenji, it seems you have configured the wrong endpoint for sp.entityId:

sp:
  entityId: "https://catalog.{...}/api/v1/saml/metadata"

Can you try with:

sp:
  entityId: "https://catalog.{...}/api/v1/saml/acs"

xenji commented 1 month ago

Sure, I can do that. For reference, I followed the documentation here: https://docs.open-metadata.org/v1.4.x/deployment/security/saml

image

But, to be frank, from a logical standpoint, using the ACS makes little sense to me.

The definition of the entity ID in the SP section reads as follows:

The (SP) entity ID is a URL where a service provider publishes public information about its SAML configuration. The metadata document published by the service provider shows its public certificate that can be used to verify the signature of authentication requests initiated from the service itself.

Anyway, it won't explain why it works if I disable the browser cache via the developer console.

xenji commented 1 month ago

@Sachin-chaurasiya I feel this is more poking in the dark than really debugging it. I provided some basic context in the referenced issue that seems to be ignored. Now, you tell me that I used your own documentation wrong. This leaves the impression that it's more important to prove me wrong rather than focusing on the issue.

I offered my sincere help a couple of times now. If someone can point me to the areas of the code where this is handled, I'm happy to help and debug it myself. Unfortunately, I don't have the time to work myself completely into a new code base on my own. SAML authentication is crucial for me. If that does not work out, we need to switch to a different catalog for our POC.

Sachin-chaurasiya commented 1 month ago

Hi @xenji

Thank you for your patience and detailed feedback. I understand your concerns and appreciate your willingness to help debug this issue.

Given the definition of the entity ID, your point about the ACS endpoint makes sense. I apologize for any confusion caused. Let's focus on resolving the issue together.

We have fixed the issue in version 1.4 (see PR #15854) and are trying to help you apply the same fix. However, it seems there is an issue with the configuration, which is why we pointed it out. We will also update our documentation accordingly.

Could you please share more details about the specific problem you are encountering, especially the part related to disabling the browser cache? This might help us pinpoint the issue more effectively.

We also have community office hours every Thursday. This would be a perfect time for you to join and ask any questions you might have. We’re more than happy to help.

Your collaboration is greatly valued in making SAML authentication work seamlessly for your POC.

xenji commented 1 month ago

If I read the referenced PR correctly, it explicitly does not target SAML, only the other IdPs.

For clarity: Yes, we are using Okta, but we use it via SAML not via Okta integration - for reasons beyond my influence.

Sachin-chaurasiya commented 1 month ago

@xenji, did changing the sp.entityId work for you?

sp:
  entityId: "https://catalog.{...}/api/v1/saml/acs"

xenji commented 1 month ago

@Sachin-chaurasiya seriously, this makes no sense at all. The ACS is a POST-only URL, but the entityID URL is used to retrieve metadata that is used for the JWT token generation, like the keyID, via GET. That alone won't work. My previous comments clearly pointed that out. I'm very puzzled that you still want me to pursue that. It appears to me that you continue to poke in the dark. You even ignored my comment about your referenced change, which targeted everything except the SAML authentication.

And as expected, the server does not start, because it cannot generate the token. See logs below.

openmetadata ERROR [2024-06-03 13:46:57,992] [dw-97 - GET /api/v1/services/databaseServices/name/Redshift%20Main] o.o.s.e.CatalogGenericExceptionMapper - Error handling a request: 9d9f385ef4b2d1ab
openmetadata org.openmetadata.service.exception.UnhandledServerException: An exception with message [com.auth0.jwk.SigningKeyNotFoundException: JWT Token keyID doesn't match the configured keyID. This usually happens if you didn't configure proper publicKeyUrls under authentication configuration.] was thrown whil
openmetadata     at org.openmetadata.service.security.MultiUrlJwkProvider.get(MultiUrlJwkProvider.java:64)
openmetadata     at org.openmetadata.service.security.JwtFilter.validateAndReturnDecodedJwtToken(JwtFilter.java:200)
openmetadata     at org.openmetadata.service.security.JwtFilter.filter(JwtFilter.java:143)
openmetadata     at org.glassfish.jersey.server.ContainerFilteringStage.apply(ContainerFilteringStage.java:108)
openmetadata
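
For readers unfamiliar with the error in the log above: a key ID mismatch generally means the `kid` in the token's header does not appear in the key set the server fetched. The following is a minimal, generic sketch of that lookup in Python. It is not OpenMetadata's `MultiUrlJwkProvider`; the function names, the key IDs `key-a` and `key-b`, and the unsigned demo token are all made up for illustration.

```python
import base64
import json
from typing import Optional


def jwt_key_id(token: str) -> Optional[str]:
    """Return the `kid` from a JWT header without verifying the signature."""
    header_b64 = token.split(".")[0]
    # base64url segments in a JWT omit padding; restore it before decoding.
    padded = header_b64 + "=" * (-len(header_b64) % 4)
    header = json.loads(base64.urlsafe_b64decode(padded))
    return header.get("kid")


def find_signing_key(token: str, jwks: dict) -> Optional[dict]:
    """Return the JWKS entry whose `kid` matches the token header, if any."""
    kid = jwt_key_id(token)
    return next((k for k in jwks.get("keys", []) if k.get("kid") == kid), None)


def make_unsigned_token(kid: str) -> str:
    """Build a fake, unsigned token (illustration only, never valid)."""
    def enc(raw: bytes) -> str:
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    header = json.dumps({"alg": "RS256", "kid": kid}).encode()
    payload = json.dumps({"sub": "user@example.com"}).encode()
    return f"{enc(header)}.{enc(payload)}.signature"


# Hypothetical key set containing a single key ID.
jwks = {"keys": [{"kty": "RSA", "kid": "key-a"}]}

print(find_signing_key(make_unsigned_token("key-a"), jwks) is not None)  # True
print(find_signing_key(make_unsigned_token("key-b"), jwks) is not None)  # False
```

Comparing the `kid` of a real token against the key set served at the configured public key URL is a quick way to confirm whether the server is looking at the right keys.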

xenji commented 1 month ago

Could you please share more details about the specific problem you are encountering, especially the part related to disabling the browser cache? This might help us pinpoint the issue more effectively.

I've already shared information in #16076 about the issue. What exactly do you want to know?

Sachin-chaurasiya commented 1 month ago

Hello @xenji,

After our investigation, we've observed an anomaly that could potentially be causing the issue. Could you please verify if you're encountering two API calls for /users/loggedInUser when redirected back to the application from the authentication provider?

Your confirmation would be greatly appreciated.
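
One way to check for the duplicate calls is to export the browser's network log as a HAR file during the login redirect and count the matching requests. The sketch below assumes the standard HAR layout (`log.entries[].request.url` and `log.entries[].response.status`); the `count_calls` helper, the host `catalog.example.com`, and the sample entries are invented for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def count_calls(har: dict, path_suffix: str) -> Counter:
    """Count responses by HTTP status for requests whose URL path ends with path_suffix."""
    statuses = Counter()
    for entry in har["log"]["entries"]:
        path = urlparse(entry["request"]["url"]).path
        if path.endswith(path_suffix):
            statuses[entry["response"]["status"]] += 1
    return statuses


# Hand-written HAR fragment standing in for a real export (made-up host).
sample_har = {
    "log": {
        "entries": [
            {"request": {"url": "https://catalog.example.com/api/v1/users/loggedInUser"},
             "response": {"status": 404}},
            {"request": {"url": "https://catalog.example.com/api/v1/system/version"},
             "response": {"status": 200}},
            {"request": {"url": "https://catalog.example.com/api/v1/users/loggedInUser"},
             "response": {"status": 404}},
        ]
    }
}

print(dict(count_calls(sample_har, "/users/loggedInUser")))  # {404: 2}
```

With a real export, load the file with `json.load` and pass the result to `count_calls`; more than one hit per login attempt would match the anomaly described above.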

Sachin-chaurasiya commented 1 month ago

@xenji, the fix will be available as part of the 1.4.2 release.

xenji commented 1 month ago

Hello @xenji,

After our investigation, we've observed an anomaly that could potentially be causing the issue. Could you please verify if you're encountering two API calls for /users/loggedInUser when redirected back to the application from the authentication provider?

Your confirmation would be greatly appreciated.

Sure. I need one of my team members for it. I'll get back to you as soon as possible, likely by the end of tomorrow, CEST.

xenji commented 1 month ago

I can confirm two calls with an account present, as well as two permission calls. image

For logins without user accounts present, we see two calls to the endpoint with 404 responses.

Sachin-chaurasiya commented 1 month ago

Thanks, @xenji. I believe this is for existing users. Can you please check what happens when you try to log in with a new user?

Sachin-chaurasiya commented 1 month ago

@xenji , the fix will be available as part of 1.4.2 release.

@xenji https://github.com/open-metadata/OpenMetadata/pull/16543

xenji commented 1 month ago

For logins without user accounts present, we see two calls to the endpoint with 404 responses.

I've mentioned that in the last sentence.