Exploring and defining UX expectations for a RTC enabled multi-user environment (like JupyterHub)

consideRatio commented 3 years ago

I've just attended the RTC workshop and hope to scope and validate work that I can contribute with. This issue is my attempt to think aloud for anyone to listen instead of writing my thoughts to just a few people.

I think there is a need to explore and define the expectations we have of an RTC enabled UX. I mainly consider a high level UX in a setup where identities are already known, for example by JupyterHub, and where users have the ability to start their own personal servers and/or collaborative servers.

Exploratory questions to define UX expectations

What UX requirements do we have for a RTC enabled system in a JupyterHub setting?

Example answer: A user should be able to create a RTC enabled "named server" (JupyterHub term), and invite other JupyterHub users to work on it in a RTC fashion.
What UX requirements do we have for a RTC enabled system in a standalone setting (not associated with a JupyterHub)?

Example answer: To be able to have a "link sharing" experience where each person arriving would be able to define a local identity.
What technical components do a RTC enabled collaborative server (jupyterserver) and a RTC enabled UI (jupyterlab) need, besides those two, for a good RTC experience for a set of users belonging to a JupyterHub?

Example answer with three parts:
1. It needs a JupyterHub specific implementation of an abstract IdentityProvider base class that is made available on the singleuser-servers
2. It needs a standalone container running alongside the jupyter server to coalesce the jupyter RTC state that is configured in this way...
3. It needs JupyterHub to be configured to be an OAuth2 provider...

Related links

echarles commented 3 years ago

Awesome! I think our goal with yesterday's RTC workshop has paid off: JupyterHub and JupyterLab working even more together. On my side, I am fully booked till the end of this week and will feed in more content at the beginning of next week. Thx again!

minrk commented 3 years ago

A note on the jupyterhub side (I couldn't make the workshop this week, but would have loved to!): with the scopes in the RBAC branch, I intend for it to be possible for the scopes themselves to be extensible, so that whatever permissions you want/need in RTC can be expressed in jupyterhub's own scopes, returned from the auth process.

Then, when you authenticate a user, you will get what relevant permissions they have, e.g. access user X server Y or edit user X server Y and can use that to apply access control.

It needs JupyterHub to be configured to be an OAuth2 provider...

Do you mean for jupyter-server to be configured with JupyterHub as its OAuth2 provider? JupyterHub is always an OAuth2 provider, and this is how jupyter-server already authenticates users when running in jupyterhub.

consideRatio commented 3 years ago

Wieee great input about RBAC! I'm excited about your work on that @minrk, @IvanaH8, and @0mar!

Do you mean for jupyter-server to be configured with JupyterHub as its OAuth2 provider? JupyterHub is always an OAuth2 provider, and this is how jupyter-server already authenticates users when running in jupyterhub.

Ah excellent, thanks for clarifying that!

echarles commented 3 years ago

Do you mean for jupyter-server to be configured with JupyterHub as its OAuth2 provider? JupyterHub is always an OAuth2 provider, and this is how jupyter-server already authenticates users when running in jupyterhub.

@minrk Would be great if you could link the relevant code section that does that?

echarles commented 3 years ago

I intend for it to be possible for the scopes themselves to be extensible, so that whatever permissions you want/need in RTC can be expressed in jupyterhub's own scopes, returned from the auth process.

@minrk Is there already an implementation of that extensibility feature in the RBAC branch, or is it more a plan / idea?

echarles commented 3 years ago

To further expand the discussion, OAuth(2) is unfortunately not Identity/Authentication. It is just Authorization. You get back a token that allows you to perform actions, but you don't get back the identity of the user. Search for "oauth is not authentication" eg https://www.scottbrady91.com/OAuth/OAuth-is-Not-Authentication

I would have loved having jupyterhub be an OpenID Connect provider and give you back a well-formed, secured JWT token. So having JupyterHub as the primary source of Auth is something I am questioning. Would love feedback on that.

minrk commented 3 years ago

An implementation of that extensibility feature in the RBAC branch, or is it more a plan / idea?

A plan in terms of making it configurable. As we are right now, I think you can technically define roles with custom scopes just because we don't currently check if those scopes are actually defined, but I think this will be fixed and custom scopes defined explicitly. As it is right now, the lack of checking means silent failure to grant the intended permissions if you have typos in your scopes, e.g. read:uers instead of read:user would effectively define a useless custom scope instead of raising an informative error.
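The silent-failure mode described above can be illustrated with a small check that rejects scopes which were never defined. This is only a minimal sketch, not JupyterHub's implementation: the function name `validate_role_scopes` and the contents of `DEFINED_SCOPES` are hypothetical.

```python
# Hypothetical set of known scopes; JupyterHub's real scope registry differs.
DEFINED_SCOPES = {"read:user", "edit:server"}


def validate_role_scopes(role_name, scopes, defined=DEFINED_SCOPES):
    """Raise an informative error if a role uses a scope that is not defined."""
    unknown = sorted(set(scopes) - set(defined))
    if unknown:
        raise ValueError(
            f"Role {role_name!r} uses undefined scope(s): {', '.join(unknown)}"
        )
    return list(scopes)
```

With a check like this, a typo such as `read:uers` fails loudly when the role is defined instead of quietly granting nothing.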

OAuth(2) is unfortunately not Identity/Authentication. It is just Authorization.

While OAuth as a protocol isn't itself by definition an identity system, it does return a token you can use to look up the owner of the token with ~every oauth provider. This is e.g. the entire basis of oauthenticator. So it's not really true to say that it can't or even shouldn't be used for that, since it can very naturally with GitHub, Facebook, KeyCloak, Auth0, Active Directory, JupyterHub, and every OAuth provider I'm aware of. What it doesn't do is define a standard for how to resolve the identity or its structure, that's one of the things OIDC does, which is why there's some variation based on the OAuth provider for how to resolve an access token to an identity since that's outside the OAuth spec, but (in my experience) it's never complicated or even unavailable.

That article and most like it speak in generalities that I think are unhelpful, especially since they ~all make "In general, you can't assume X" statements where JupyterHub does explicitly guarantee X. For instance:

When an authorization server issues an access token, the intended audience is the protected resource. After all, this is what the token is providing access to.

In the JupyterHub case, the client is the protected resource - that's the scope of the token: access to this client application, so using it to resolve access to the client is super appropriate. That's literally the whole point.

So while OIDC would let you have a single spec for resolving identity with any OIDC implementation, you can support any OAuth provider if you only allow the custom hook for token -> identity model per provider (this is what OAuthenticator does).
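As a rough illustration of the "custom hook per provider" idea, the sketch below maps a provider's "who owns this token" response to a username. It is not OAuthenticator's code: `IDENTITY_HOOKS` and `resolve_identity` are hypothetical names, the response field names are assumptions, and the HTTP request that would fetch the response with the access token is deliberately left out.

```python
from typing import Callable, Dict

# Each hook maps an already-fetched provider response (a dict) to a username.
# Provider names and response field names here are illustrative assumptions.
IdentityHook = Callable[[dict], str]

IDENTITY_HOOKS: Dict[str, IdentityHook] = {
    "github": lambda resp: resp["login"],
    "jupyterhub": lambda resp: resp["name"],
}


def resolve_identity(provider: str, user_response: dict) -> str:
    """Resolve a provider's token-owner response to a username."""
    try:
        hook = IDENTITY_HOOKS[provider]
    except KeyError:
        raise ValueError(f"No identity hook registered for provider {provider!r}")
    return hook(user_response)
```

Supporting a new provider then means registering one more hook, which is the variation that OIDC would standardize away.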

Note that just "using someone else's access tokens" to define access is one thing, and what that article and others seem to talk a lot about, but that's not what happens anywhere in JupyterHub. We use access tokens we retrieve ourselves at the completion of the authentication process, and exchange them immediately for an identity of the user who just completed Authentication with JupyterHub, so statements like "An access token also does not represent or give any indication of a user having authenticated." also just aren't true in our case, and aren't applicable to using OAuth for identity in general.

I would have loved having jupyterhub be an OpenID Connect provider and give you back a well-formed secured JWT token.

I'm open to that! I've had nice experiences with OIDC+JWT recently. The main reason I haven't used JWT for JupyterHub is the issue of revocation, and I haven't seen a satisfactory implementation of JWT revocation before. Short-lifetime tokens lead to super bad experiences for applications like ours, so frequent re-auth and token re-issuing during a single long-lived jupyterlab browser session is not going to work, I think. It should be doable, but I don't see much benefit to justify the work at the moment.

minrk commented 3 years ago

Would be great if you could link the relevant code section that does that?

auth with the hub is implemented here and here is an example of configuring one JupyterHub to use another JupyterHub as a generic OAuth identity provider.

echarles commented 3 years ago

@minrk Thx for elaborating and bringing context.

Will play a bit with the jupyterhub rbac custom scopes, which are not typed for now.
We agree that OAuth can be used to retrieve identity (in ad hoc ways) and that we can stick to that for now (OIDC can be added by someone later on, with a user-friendly renewal system; I played with that some time ago and it is perfectly doable).
Thx also for the links to jhub acting as OAuth provider. I was asking more about how jupyter-server already authenticates users when running in jupyterhub. Looking at https://github.com/jupyterhub/jupyterhub/blob/master/jupyterhub/singleuser/mixins.py - my question was how the single-user jupyter server is configured/used to enforce the OAuth mechanism provided by jupyterhub.

minrk commented 3 years ago

My question was how the single-user jupyter server is configured/used to enforce the OAuth mechanism provided by jupyterhub.

That's in the file I linked, specifically the check_hub_user method where it takes an authenticated JupyterHub user and checks against configuration to see if they should be allowed. Arbitrary logic can go here, but the default is to allow only the owner themselves, and, depending on configuration, any admin. When the RBAC work is finished, this will change to check scopes in the authenticated-user model instead of other user properties like name and admin, but the structure will otherwise be the same.
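The default policy described here (owner only, plus admins if configured) could be sketched as below. This is a standalone illustration, not the actual `check_hub_user` method: the function name `is_allowed`, its parameters, and the shape of the user model (a dict with `name` and `admin` keys) are assumptions.

```python
def is_allowed(hub_user: dict, owner_name: str, allow_admin: bool = False) -> bool:
    """Return True if the authenticated hub user may access this server.

    hub_user is assumed to be a dict with "name" and optional "admin" keys.
    """
    # The server's owner is always allowed.
    if hub_user.get("name") == owner_name:
        return True
    # Admins are allowed only when the deployment is configured that way.
    if allow_admin and hub_user.get("admin", False):
        return True
    return False
```

Once scopes are available in the user model, the two property checks would be replaced by a scope check while the overall structure stays the same.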

One thing that's going to require thinking in jupyter-server is how to handle different permissions for different handlers. One version would be to have a .required_scopes property and ensure that's set for each handler (maybe even for each method). JupyterHub has some decorators to manage this internally, and maybe we should provide some of the same for the client-side code when we get to it. Those helpers would also go in the same jupyterhub.services.auth module.
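One way to picture per-method scope requirements is a decorator that both records the required scopes and enforces them. This is a sketch under assumptions, not JupyterHub's or jupyter-server's decorators: `needs_scopes`, `Forbidden`, `ContentsHandler`, the `current_scopes` attribute, and the scope names are all hypothetical.

```python
import functools


class Forbidden(Exception):
    """Raised when the current user lacks a required scope."""


def needs_scopes(*required):
    """Decorator: record required scopes on a handler method and enforce them."""
    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            missing = set(required) - set(self.current_scopes)
            if missing:
                raise Forbidden(f"missing scope(s): {sorted(missing)}")
            return method(self, *args, **kwargs)
        # Expose the requirement so tooling can verify every method declares one.
        wrapper.required_scopes = frozenset(required)
        return wrapper
    return decorator


class ContentsHandler:
    """Stand-in for a web handler; current_scopes would come from authentication."""

    def __init__(self, current_scopes):
        self.current_scopes = set(current_scopes)

    @needs_scopes("read:contents")
    def get(self, path):
        return f"contents of {path}"

    @needs_scopes("read:contents", "edit:contents")
    def put(self, path, body):
        return f"saved {path}"
```

Because each wrapped method carries a `required_scopes` attribute, a startup check could refuse to register handlers whose methods declare nothing.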

echarles commented 3 years ago

@minrk Talking about jupyter server, how do you see the RBAC work fitting with Authorization initiatives like https://github.com/jupyter-server/jupyter_server/pull/165 done by @Zsailer? Do you see the RBAC/Permission model of JupyterHub overriding or adding to the Jupyter Server permission model (assuming something gets merged on the server side one day)?

minrk commented 3 years ago

I see JupyterHub scopes as one source of information to be used for that access control. Essentially, the JupyterHub auth adapter will take the user model and authorized scopes and provide an implementation of user_is_authorized that maps JupyterHub scopes onto jupyter server resource permissions. Just like it does today with get_current_user where the only significance to jupyter_server at the moment is its truthiness.

Zach's work is basically the architecture needed in jupyter-server to allow jupyterhub scopes to be used to specify individual server resources, rather than the current all-or-nothing auth.
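The adapter idea above, where hub scopes are mapped onto server resource permissions, might look roughly like this. The name `user_is_authorized` comes from the discussion, but its signature here, the `SCOPE_PERMISSIONS` table, and the scope, resource, and action names are assumptions made for illustration.

```python
# Hypothetical mapping from hub scopes to (resource, action) pairs on the server.
SCOPE_PERMISSIONS = {
    "access:server": {("contents", "read"), ("kernels", "read")},
    "edit:server": {("contents", "write"), ("kernels", "execute")},
}


def user_is_authorized(scopes, resource, action):
    """Return True if any granted scope maps to the requested permission."""
    return any(
        (resource, action) in SCOPE_PERMISSIONS.get(scope, set())
        for scope in scopes
    )
```

This contrasts with the all-or-nothing model, where the only question asked is whether a user was returned at all.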

daN4cat commented 3 years ago

Great work and great discussion, thanks all for the effort you put in this project.

If I may drop my 2p here about the authentication and authorisation point, more from a generic industry-standard point of view and approach. I hope you don't mind and don't find it too patronising; I'm just trying to give a generic perspective on the matter.

OpenID Connect is used to add a proper identity story to OAuth 2, which was just an authorisation story at the time. Please read this blog post to understand it better in case you are not too familiar with it.

Most modern web applications implement a "thick" JS based application (usually referred to as a Single Page Application - SPA) running in a web browser and interacting with a web service layer via REST APIs, or GraphQL if you prefer that approach.

When it comes to the Auth flow, at user sign-in time the SPA (JS web app running in your web browser) redirects you to an Identity Provider (e.g. Azure AD, AWS Cognito, Auth0, Okta, PingIdentity) to input your user credentials (username + password) in a secure environment (it's running in the Identity Provider's context) in order to get an Auth code, JWT tokens (ID + access tokens) and a refresh token following the OIDC + OAuth 2 protocols. The SPA then uses the JWT access token, which could contain your RBAC role as a claim, to access the REST API (or GraphQL) implemented by your web service (server).

Your web service validates the JWT access_token with the Identity Provider before allowing any business logic to be executed; it may also check for any RBAC role the token contains before allowing you to perform an action. The latter part is Authorisation: the role grants you permission to perform an action, and the web service needs to enforce it.
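The validate-then-authorise sequence described above can be sketched with the standard library alone. Several simplifications are assumed: real identity providers typically sign with asymmetric keys (e.g. RS256) that the service verifies against the provider's published keys, whereas this sketch uses a shared-secret HS256 signature; the `roles` claim name and both function names are hypothetical; and a production service would use a maintained JWT library rather than hand-rolled checks.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(segment: str) -> bytes:
    # JWT segments omit base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Build an HS256 token (used here only to produce example input)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    sig = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_and_authorize(token: str, secret: bytes, required_role: str, now=None) -> dict:
    """Check signature and expiry first, then the role claim (authorisation)."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise PermissionError("malformed token")
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise PermissionError("unexpected algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    # A token without an "exp" claim is treated as already expired.
    if claims.get("exp", 0) <= (time.time() if now is None else now):
        raise PermissionError("token expired")
    if required_role not in claims.get("roles", []):
        raise PermissionError(f"role {required_role!r} required")
    return claims
```

The ordering matters: claims are only trusted, and the role only consulted, after the signature and expiry checks have passed.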

Why am I making this point?

From a security perspective it's really not a good idea to implement Identity Provider functionality in your service layer (the Hub in this case). It is an easy target for attack, and a compromise would expose your whole environment and your users' credentials.
Separation of concerns: focus on allowing users to access business logic by leveraging a valet key pattern. You don't need to worry about how a user was authenticated as long as you trust the identity provider and check that the JWT tokens are valid.
By moving the security-sensitive concerns and user identity outside your business logic, you worry less about keeping up with the latest and greatest in that area.

daN4cat commented 3 years ago

WRT a comment I read above, if I read it correctly:

I'm open to that! I've had nice experiences with OIDC+JWT recently. The main reason I haven't used JWT for JupyterHub is the issue of revocation, and I haven't seen a satisfactory implementation of JWT revocation before. Short-lifetime tokens lead to super bad experiences for applications like ours, so frequent re-auth and token re-issuing during a single long-lived jupyterlab browser session is not going to work, I think. It should be doable, but I don't see much benefit to justify the work at the moment.

Hope it's not like I suspect, but please don't build any dependency on the JWT token itself to look up a user session and the like. You can use the ID identifying the user, carried as the subject claim in your JWT access_token, for that. The access_token can be refreshed by the JS application (Single Page Application - SPA) with auth code + refresh token, and you don't need to worry about it as long as the session is held against the subject ID. An access_token lifetime of no more than 1h is recommended for security reasons, and the token needs to be refreshed by the SPA regularly.
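A tiny sketch of what "holding the session against the subject ID" could mean in practice: session state is keyed by the `sub` claim, so a refreshed access token for the same user lands on the same session. `SessionStore` and `session_for` are hypothetical names, and the claims are assumed to have been validated already.

```python
class SessionStore:
    """Sessions keyed by the token's subject ("sub"), not by the token string."""

    def __init__(self):
        self._sessions = {}

    def session_for(self, claims: dict) -> dict:
        # Any valid token for the same subject resolves to the same session,
        # so refreshing the access token does not disturb session state.
        return self._sessions.setdefault(claims["sub"], {})
```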

rkdarst commented 2 years ago

I think that one thing that should be mentioned (not just for hub-related deployments, but pure lab as well) is the security implications of this. From my quick read, it seemed that connected users could execute code in the host's kernel. If that's the case, it means that, in effect, everything in that environment is accessible to the person connecting. This has a major effect on the hubs I run: it means that once this is enabled, there is no isolation between projects. Since the single-user server runs in an environment that has access to all the user's data and projects, this is essentially the same as sharing accounts, which is a big "no".

Is the above analysis correct?

So, in short:

- Document whether there is access to run arbitrary code in the host's environment
- Clearly define how to determine what that environment is
- Also note this in https://jupyterlab.readthedocs.io/en/stable/user/rtc.html

It seems like it would definitely push towards single-project containers for deployment.

Thanks for all your work on RTC!

krassowski commented 2 years ago

Inspired by the above, I have come to think that for RTC to be useful beyond teaching and tutorial settings we need to solve the problem of having user-specific settings for extensions and server extensions; for example:

[ ] how can we enable multiple users to collaborate on code when jupyterlab-git extension is installed?
- guest users who are granted access to the notebook should not see the private key of the host
- host and guest users should be able to co-author a commit easily
- guest users should be able to bring their own git key securely in managed multi-user environments like JupyterHub (both for authentication and signing)
[ ] citation manager needs to store the access key for reference manager; multiple collaborating authors should have access to their respective collections (without leaking the key)

jupyterlab / jupyterlab

Exploring and defining UX expectations for a RTC enabled multi-user environment (like JupyterHub) #10119
