opendatahub-io / kubeflow

Machine Learning Toolkit for Kubernetes
Apache License 2.0

Notebook server login is failing after login in with a not allowed user #109

Open harshad16 opened 1 year ago

harshad16 commented 1 year ago

Description of problem:

Notebook server login fails after logging in with a user who is not allowed.

Prerequisites (if any, like setup, operators/versions):

1.28

Steps to Reproduce:

1. Log in to Red Hat OpenShift Data Science with ldap-admin2
2. Start a notebook server, open in a new tab
3. Log in with ldap-user2
4. The message: 403 Permission Denied
5. Hit the login link
6. Log in with OpenShift
7. Log in with ldap-admin2

Actual results:

It is returning to the login page

Expected results:

Logged in with ldap-admin2

Reproducibility (Always/Intermittent/Only Once):

Always

Build Details:

1.28

Workaround:

1. Close the tab.
2. In Red Hat OpenShift Data Science, hit "Access notebook server"
3. Login with ldap-admin2

Additional info:

JIRA: https://issues.redhat.com/browse/RHODS-9425

atheo89 commented 1 year ago

The issue still persists after updating the version to 4.10.

Related PR for the oauth upgrade: https://github.com/red-hat-data-services/odh-manifests/pull/395

We need a more thorough investigation to identify the root cause of the problem.

harshad16 commented 1 year ago

Moved this to next sprint 1.31.

atheo89 commented 1 year ago

Regarding the unauthorized user, to grant permissions, use the following command:

oc -n rhods-notebooks adm policy add-role-to-user view ldap-user2

It appears that when the oauth-proxy encounters the 403 Permission denied screen, it does not redirect to the correct path for the login screen.

The 'healthy' path should look like this:

https://oauth-openshift.apps.atheo1.7ojc.s1.devshift.org/oauth/authorize?approval_prompt=force&client_id=system%3Aserviceaccount%3Arhods-notebooks%3Ajupyter-nb-ldap-2dadmin1&redirect_uri=https%3A%2F%2Fjupyter-nb-ldap-2dadmin1-rhods-notebooks.apps.atheo1.7ojc.s1.devshift.org%2Foauth%2Fcallback&response_type=code&scope=user%3Ainfo+user%3Acheck-access&state=636a81e30acc3f4ab95fad454ccd4cee%3A%2Fnotebook%2Frhods-notebooks%2Fjupyter-nb-ldap-2dadmin1

However, the path that is being redirected by the 403 Permission denied screen seems to be incorrect, causing a login/logout loop. The incorrect path is:

https://oauth-openshift.apps.atheo1.7ojc.s1.devshift.org/oauth/authorize?approval_prompt=force&client_id=system%3Aserviceaccount%3Arhods-notebooks%3Ajupyter-nb-ldap-2dadmin1&redirect_uri=https%3A%2F%2Fjupyter-nb-ldap-2dadmin1-rhods-notebooks.apps.atheo1.7ojc.s1.devshift.org%2Foauth%2Fcallback&response_type=code&scope=user%3Ainfo+user%3Acheck-access&state=2a9655e87d5d3c6d0689bf65a573094c%3A%2Foauth%2Fstart%3Frd%3D%252F

As you can see, the state parameter at the end of the incorrect URL is missing the resource type, namespace, and service account (notebook%2Frhods-notebooks%2Fjupyter-nb-ldap-2dadmin1).
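To make the difference between the two URLs easier to see, here is a minimal Python sketch that decodes the `state` parameter of each one. It assumes, based only on the two URLs quoted in this comment, that `state` has the form `<nonce>:<post-login path>`; the helper name `post_login_path` is hypothetical and is not part of oauth-proxy or the notebook controller.

```python
from urllib.parse import parse_qs, urlsplit

# Shared prefix of both authorize URLs quoted above (identical up to `state=`).
AUTHORIZE_PREFIX = (
    "https://oauth-openshift.apps.atheo1.7ojc.s1.devshift.org/oauth/authorize"
    "?approval_prompt=force"
    "&client_id=system%3Aserviceaccount%3Arhods-notebooks%3Ajupyter-nb-ldap-2dadmin1"
    "&redirect_uri=https%3A%2F%2Fjupyter-nb-ldap-2dadmin1-rhods-notebooks"
    ".apps.atheo1.7ojc.s1.devshift.org%2Foauth%2Fcallback"
    "&response_type=code"
    "&scope=user%3Ainfo+user%3Acheck-access"
    "&state="
)
HEALTHY_URL = AUTHORIZE_PREFIX + (
    "636a81e30acc3f4ab95fad454ccd4cee"
    "%3A%2Fnotebook%2Frhods-notebooks%2Fjupyter-nb-ldap-2dadmin1"
)
LOOPING_URL = AUTHORIZE_PREFIX + (
    "2a9655e87d5d3c6d0689bf65a573094c%3A%2Foauth%2Fstart%3Frd%3D%252F"
)


def post_login_path(authorize_url: str) -> str:
    """Return the path carried after the first ':' in the `state` parameter.

    Assumption: `state` looks like '<nonce>:<path>', as in the URLs above.
    """
    query = parse_qs(urlsplit(authorize_url).query)
    state = query["state"][0]  # parse_qs percent-decodes the value once
    _nonce, _sep, path = state.partition(":")
    return path


print(post_login_path(HEALTHY_URL))  # /notebook/rhods-notebooks/jupyter-nb-ldap-2dadmin1
print(post_login_path(LOOPING_URL))  # /oauth/start?rd=%2F
```

The healthy URL carries the notebook path, while the looping URL carries `/oauth/start?rd=%2F`, that is, a return to the proxy's own start endpoint with `/` as the target.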

A workaround for this issue is to close that tab and try accessing it with user2 via the notebook launcher page.

atheo89 commented 1 year ago

After further investigation, it appears that the notebook-controller is redirecting to the wrong login URL even though the OAuth server provides all the required parameters.

For a clearer understanding, please refer to the screenshots and the URLs provided below.

This is the URL before we log in with the unauthorized user, which eventually leads to the 403 screen. Looking carefully, the OAuth server provides all the information that is needed (broken down here for readability):

https://oauth-openshift.apps.atheo1.7ojc.s1.devshift.org/login/ldap-provider-qe

?then=/oauth/authorize?approval_prompt=force

&client_id=system%3Aserviceaccount%3Arhods-notebooks%3Ajupyter-nb-ldap-2dadmin1

&idp=ldap-provider-qe

&redirect_uri=https%3A%2F%2Fjupyter-nb-ldap-2dadmin1-rhods-notebooks.apps.atheo1.7ojc.s1.devshift.org%2Foauth%2Fcallback

&response_type=code

&scope=user%3Ainfo+user%3Acheck-access

&state=29d40e2ec1e3ef4a1be84ccc0121546e%3A%2Fnotebook%2Frhods-notebooks%2Fjupyter-nb-ldap-2dadmin1 

Now, when you log in with the unauthorized user, you get the 403 screen (which is expected).

As the mouse hover shows, the login link points to the endpoint /oauth/sign_in, which brings us into an infinite login loop.

Screenshot from 2023-07-20 18-07-08

If you change this path from /oauth/sign_in to /notebook/rhods-notebooks/jupyter-nb-ldap-2dadmin1 and then press login, it works as expected.

Screenshot from 2023-07-20 18-07-49

You can then log in to the notebook.

Screenshot from 2023-07-20 18-09-55

So, the missing piece here is how the notebook-controller handles the responses it receives from the oauth-proxy.

atheo89 commented 1 year ago

@VaishnaviHire, your support is vital in this issue. Can you please review the investigation that has been conducted so far and provide insight into how the notebook-controller handles responses from the oauth-proxy? It appears that there might be an issue with redirects, as they seem to be directed to the wrong API endpoint.

atheo89 commented 1 year ago

/needs-info

atheo89 commented 1 year ago

This PR partially solves the issue (the user no longer stays stuck on the login screen, but is redirected to the route of the notebook) -> https://github.com/opendatahub-io/kubeflow/pull/149

Since the constructed route is missing the /notebook/${nb_namespace}/${nb_name} suffix, the remaining redirection work is tracked in -> https://github.com/opendatahub-io/kubeflow/issues/150
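For illustration, the suffix mentioned above could be appended to a notebook route roughly as follows. This is only a sketch: the function name `notebook_url` and its parameters are hypothetical, and the real construction lives in the controller code, not in Python.

```python
def notebook_url(route_base: str, nb_namespace: str, nb_name: str) -> str:
    """Append the /notebook/<namespace>/<name> suffix to a route base URL.

    Hypothetical helper; `route_base` is assumed to be scheme plus host.
    """
    return f"{route_base.rstrip('/')}/notebook/{nb_namespace}/{nb_name}"


# Host taken from the redirect_uri seen earlier in this issue.
print(
    notebook_url(
        "https://jupyter-nb-ldap-2dadmin1-rhods-notebooks.apps.atheo1.7ojc.s1.devshift.org",
        "rhods-notebooks",
        "jupyter-nb-ldap-2dadmin1",
    )
)
```

Without the suffix, the user lands on the bare route of the notebook, which matches the partial behaviour described for PR #149.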

harshad16 commented 1 year ago

With the merge of PR #149, the fix is included in release v1.7.0-2. An additional issue, #150, has been created for further work. Closing this issue, as the cyclic-login problem is fixed.

harshad16 commented 1 year ago

Re-opening this issue, as the changes were impacting long-running notebooks after the upgrade.

The change injected by odh-notebook-controller into the Notebook CR results in a restart of the Notebook pod, because the reconciler updates the StatefulSet with the changes (removal of skip-provider-button).

We would like to implement this by checking if the changes would impact long-running notebooks.

Reverting the changes from:
https://github.com/red-hat-data-services/kubeflow/pull/15
https://github.com/red-hat-data-services/kubeflow/pull/13
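The check described in this comment could look something like the sketch below. It is not the controller's implementation: the function name `should_apply`, its inputs, and the exact spelling of the `--skip-provider-button` argument are assumptions used only to illustrate the idea of not changing the pod template of a running notebook.

```python
from typing import Sequence


def should_apply(
    current_args: Sequence[str],
    desired_args: Sequence[str],
    is_running: bool,
) -> bool:
    """Decide whether new oauth-proxy args may be written to the Notebook CR.

    Assumption: any difference in container args changes the pod template,
    which makes the reconciler update the StatefulSet and restart the pod.
    """
    changes_pod_template = list(current_args) != list(desired_args)
    # Hold the change back only when it would restart a running notebook.
    return not (is_running and changes_pod_template)


old_args = ["--skip-provider-button"]  # assumed flag spelling
new_args: list[str] = []               # flag removed, as in the reverted change

print(should_apply(old_args, new_args, is_running=True))   # False
print(should_apply(old_args, new_args, is_running=False))  # True
print(should_apply(old_args, old_args, is_running=True))   # True
```

With a rule like this, a long-running notebook keeps its existing configuration until it is next stopped, and only then picks up the new arguments.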

atheo89 commented 1 year ago

Adding the blocker issue here: Long Running Notebook Testing Support. This is also related to -> https://github.com/opendatahub-io/kubeflow/issues/150