ZxMYS opened this issue 3 years ago (status: Open)
+1. I encountered this problem directly as well and found it incredibly confusing, and it is very specific to Kubeflow notebooks. It looked as if my notebook was broken, but after a full refresh I was redirected to the Dex login page, where it became apparent that this was an auth cookie problem.
@kubeflow/wg-notebooks-leads /area notebooks /priority p2
Thank you for raising this issue @ZxMYS!
I think a good first step here would be to extend the docs on kubeflow.org to mention this error. It could be a page where we document errors like this one, which can be very confusing when encountered in the wild.
cc @shannonbradshaw
@kimwnasptd probably the best bet is to set the default session timeout to 12+ hours, to reduce the likelihood of people encountering it.
I literally cannot think of a way to fix this. Istio redirects any HTTP call with an expired session to the auth provider (usually Dex), and because Jupyter makes requests in the background, those requests WILL be redirected, leading to this error.
Increasing the default session timeout to 12+ hours seems like a reasonable initial mitigation to reduce how often this occurs.
I wonder whether a JupyterLab plugin specific to Kubeflow could make sense as a pattern to solve this (at least for JupyterLab-based runtimes). This minimal plugin could run in the front-end application layer (in the browser) and periodically poll the backend server to detect expired auth via a sudden switch from HTTP 200 to HTTP 302 status codes. If the session has expired, it could inform the user via a modal prompt with a link and/or redirect them to the auth provider to re-authenticate.
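A minimal sketch of the detection logic such a plugin might use, written in Python for illustration (a real JupyterLab plugin would be TypeScript running in the browser). The `AuthExpiryDetector` class, the `poll` helper, and the 60-second interval are hypothetical. One browser caveat: `fetch` follows redirects by default, so the probe would need `redirect: "manual"`, in which case the redirect surfaces as an opaque-redirect response with status 0 rather than a readable 302.

```python
import time
from typing import Callable, Optional


class AuthExpiryDetector:
    """Flags the moment a probe flips from a 2xx response to a 3xx redirect."""

    def __init__(self) -> None:
        self.last_status: Optional[int] = None

    def observe(self, status: int) -> bool:
        """Record one probe status; return True only on a 2xx -> 3xx transition."""
        was_ok = self.last_status is not None and 200 <= self.last_status < 300
        self.last_status = status
        return was_ok and 300 <= status < 400


def poll(probe: Callable[[], int],
         on_expired: Callable[[], None],
         checks: int,
         interval_s: float = 60.0,
         sleep: Callable[[float], None] = time.sleep) -> None:
    """Run `checks` probes, calling `on_expired` once per 2xx -> 3xx transition."""
    detector = AuthExpiryDetector()
    for i in range(checks):
        if detector.observe(probe()):
            on_expired()  # e.g. show a modal linking to the login page
        if i < checks - 1:
            sleep(interval_s)


# Simulated probe results: the session expires, the user re-authenticates,
# then it expires again.
statuses = iter([200, 200, 302, 302, 200, 302])
alerts = []
poll(lambda: next(statuses), lambda: alerts.append("expired"),
     checks=6, sleep=lambda s: None)
print(len(alerts))  # 2
```

Reporting only on the transition, rather than on every 3xx, avoids re-opening the modal on each poll while the session stays expired.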
@kwlzn, that is an interesting idea. Would you have the experience to create such a plugin?
@thesuperzapper yeah, my group at Twitter has built JupyterLab plugins that can refresh the UI like this, for things like dynamically changing the ContextManager at runtime. So I think it should be possible, and the UI layer seems like the right place to do this checking.
I'll see if I can motivate someone to pick this up as a contrib.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I'm building a plugin for JupyterLab to do exactly this. The errors users in my org see are slightly different: the kernel becomes disconnected, you cannot save, and the developer console shows CORS errors related to the 302 status. Our identity provider does not allow AJAX requests to authenticate; it requires the user to open a page themselves to log in. So the solution is a button that opens a popup in which the user authenticates. JupyterLab these days is quite good at reconnecting once the 3xx responses and CORS errors stop.
If others are interested I can make this public.
@vinayan3 that would be awesome!
@vinayan3 +1 definitely interested!
I don't know what underlying restrictions there might be, but would it not be possible to regenerate the auth token using a refresh token? (This is the usual way of dealing with this problem.) It would also make the system more secure, as the auth token could then have a much shorter lifespan, perhaps 5 minutes. When the token has 1 minute of life left, a new token would be requested using the refresh token. This way the user would never be logged out unless they explicitly chose to log out, and if an auth token were accidentally exposed, it would only be valid for a maximum of 5 minutes.
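A minimal sketch of the refresh-token flow described above, using the suggested numbers (5-minute access token, refresh with 1 minute left). `Token`, `TokenManager`, and `refresh_fn` are hypothetical; `refresh_fn` stands in for a call to the identity provider's OAuth 2.0 token endpoint with `grant_type=refresh_token`, and a fake clock replaces real time so the behaviour is easy to follow.

```python
from dataclasses import dataclass
from typing import Callable, Optional

ACCESS_TOKEN_TTL_S = 5 * 60  # short-lived access token (5 minutes)
REFRESH_MARGIN_S = 60        # refresh once 1 minute of life remains


@dataclass
class Token:
    value: str
    expires_at: float  # seconds on the same clock as TokenManager's


class TokenManager:
    """Keeps a short-lived access token fresh using a long-lived refresh token."""

    def __init__(self, refresh_fn: Callable[[str], Token], refresh_token: str,
                 clock: Callable[[], float]) -> None:
        self._refresh_fn = refresh_fn  # hypothetical call to the token endpoint
        self._refresh_token = refresh_token
        self._clock = clock
        self._token: Optional[Token] = None

    def access_token(self) -> str:
        """Return a valid access token, refreshing it if it is close to expiry."""
        now = self._clock()
        if self._token is None or self._token.expires_at - now <= REFRESH_MARGIN_S:
            self._token = self._refresh_fn(self._refresh_token)
        return self._token.value


# Usage with a fake clock and a fake token endpoint.
now = [0.0]
issued = []


def fake_refresh(refresh_token: str) -> Token:
    issued.append(now[0])
    return Token(value=f"tok-{len(issued)}", expires_at=now[0] + ACCESS_TOKEN_TTL_S)


mgr = TokenManager(fake_refresh, "rt-abc", clock=lambda: now[0])
print(mgr.access_token())  # tok-1 (first call fetches a token)
now[0] = 200.0
print(mgr.access_token())  # tok-1 (100 s left, above the 60 s margin)
now[0] = 250.0
print(mgr.access_token())  # tok-2 (50 s left, so it refreshes)
```

This refreshes lazily, on the first request after the margin is reached, rather than on a background timer; either approach works, but the lazy one needs no scheduler. It also ignores refresh-token rotation, where the provider may return a new refresh token with each refresh.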
/transfer manifests
With the refresh cookie in oauth2-proxy/Dex, this is already largely mitigated. I tested it on 1.9.1 with oauth2-proxy only, but someone still needs to provide the Dex refresh settings here as well.
Is anyone willing to create a PR?
/lifecycle stale
Hi!
We are using Kubeflow 1.3 and are running notebooks with it. It seems that after the authservice_session cookie expires, all requests to a running Kubeflow notebook are redirected (HTTP 302) to the Kubeflow login page. This behavior is fine and natural when a user tries to open a page, but less so for a user who already has a Jupyter notebook page open and is using it: the user will see this error dialog with a confusing error message when they perform most actions (save a notebook, create a notebook, open a terminal, etc.) on the notebook page:
This error occurs because the Jupyter frontend code expects a JSON response to those requests. Since Kubeflow redirects the requests to the login page, which is HTML, the notebook frontend cannot parse the response.
Given that the authservice_session cookie seems to be valid for only one day, it's not uncommon for a user who works on notebooks continuously to hit this issue.
I wonder if Kubeflow can provide a better user experience here, e.g. instead of blindly redirecting all requests to the login page, only redirect requests for the index page of a notebook server and return 403 for all other requests. The Jupyter frontend can handle a 403 properly and display a proper error message, which is much less confusing.
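To illustrate the failure mode, Python's `json.loads` rejects an HTML login page much as the browser's `JSON.parse` would. The sketch below also shows a client-side guard that checks the status and `Content-Type` before parsing and raises a clearer error; `parse_api_response` and `SessionExpiredError` are hypothetical names, not part of Jupyter's API.

```python
import json
from typing import Any, Mapping


class SessionExpiredError(Exception):
    """Raised when an API call returns HTML (e.g. a login page) instead of JSON."""


def parse_api_response(status: int, headers: Mapping[str, str], body: str) -> Any:
    """Parse a JSON API response, turning an auth redirect into a clear error."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    content_type = lowered.get("content-type", "")
    # A 3xx (redirect not followed) or an HTML body (redirect followed to the
    # login page) both mean the session is gone; neither is valid JSON.
    if 300 <= status < 400 or content_type.startswith("text/html"):
        raise SessionExpiredError("Session expired: reload the page to log in again.")
    return json.loads(body)


login_page = "<!DOCTYPE html><html><body>Log in</body></html>"

# Naive parsing fails with an opaque message.
try:
    json.loads(login_page)
except json.JSONDecodeError as exc:
    print(exc.msg)  # Expecting value

# Checking the response first yields an actionable message instead.
try:
    parse_api_response(200, {"Content-Type": "text/html; charset=utf-8"}, login_page)
except SessionExpiredError as exc:
    print(exc)  # Session expired: reload the page to log in again.
```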
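A minimal sketch of the proposed policy as a pure routing decision, assuming Kubeflow serves each notebook server under `/notebook/<namespace>/<name>/` and that the bare path plus `/lab` and `/tree` are the only entry points worth redirecting. `unauthenticated_response` and `LOGIN_URL` are hypothetical, and where this logic would live (the auth service, an Istio filter, etc.) is left open.

```python
import re
from typing import Optional, Tuple

# Assumed URL layout: /notebook/<namespace>/<name>, optionally followed by
# /lab or /tree, with an optional trailing slash.
NOTEBOOK_INDEX = re.compile(r"^/notebook/[^/]+/[^/]+(/(lab|tree))?/?$")

LOGIN_URL = "/login"  # hypothetical; the real target depends on the auth provider


def unauthenticated_response(path: str) -> Tuple[int, Optional[str]]:
    """Decide how to answer a request whose session cookie has expired.

    Returns (status, location): a 302 to the login page for a notebook
    server's index page, and a 403 with no redirect for everything else
    (API calls, websockets, static assets).
    """
    if NOTEBOOK_INDEX.match(path):
        return 302, LOGIN_URL
    return 403, None


print(unauthenticated_response("/notebook/alice/my-nb/"))             # (302, '/login')
print(unauthenticated_response("/notebook/alice/my-nb/api/contents"))  # (403, None)
```

Matching on the path keeps the rule simple, but a user opening a deep link with an expired session would get a 403 rather than the login page. Keying on the browser's `Sec-Fetch-Mode: navigate` request header instead would separate page loads from background API calls more directly, at the cost of relying on a header older clients may not send.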
To reproduce the error message in the screenshot: