Allow multiple LTI client IDs

jeflem commented 1 year ago

Proposed change

To use LTIAuthenticator the LTI platform's (randomly generated) client ID for the tool (JupyterHub/LTIAuthenticator) has to be provided to LTIAuthenticator via jupyterhub_config.py: c.LTI13Authenticator.client_id = 'some_string'. At the moment only one client ID can be specified. Thus, LTIAuthenticator (and the whole JupyterHub behind it) can be used as exactly one LTI tool.

For several use cases (see below) LTIAuthenticator has to be used as multiple tools on the platform side. Thus there will be several client IDs LTIAuthenticator should accept. On the config side something like c.LTI13Authenticator.client_id = ['id1', 'id2', 'id3'] would be nice. LTIAuthenticator then accepts all requests having one of those client IDs.

Alternative options

The only alternative I see at the moment for the use cases below is to have several JupyterHubs in parallel and share user home directories between hubs. Not really an option.

Use case 1: different nbgitpuller links

In a Moodle (or any other LMS) course one wants to link to different notebooks in one and the same git repo via nbgitpuller. On the LMS side (at least in Moodle and in the lesser-known regional LMSs OPAL or OLAT) one has to configure one LTI tool per nbgitpuller link (the nbgitpuller link is the tool URL the platform redirects to after successful LTI authentication). Moodle creates random client IDs for all LTI tools. These IDs cannot be modified. Thus, each nbgitpuller link reaches LTIAuthenticator with a different client ID.

Use case 2: using multiple LMS instances

Bigger JupyterHubs may be used by multiple institutions or an institution might have two different LMS in use. Then the Hub will be accessed from different LMS and, thus, with different client IDs.

Use case 3: auto-enrollment in nbgrader courses

We (some colleagues and I) are currently working on using LTIAuthenticator and some custom tools to automatically enroll students in nbgrader courses on our JupyterHub when they come in via LTI for the first time (this saves a lot of time compared to manual enrollment). A student starts an LTI activity in the Moodle course, which sends the student to JupyterHub. There we evaluate the LTI data (course title etc.) and enroll the student in the corresponding nbgrader course.

Having multiple courses on the hub (with different audiences) requires multiple LTI activities on the LMS side (with different course titles) and, thus, multiple client IDs.
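
To illustrate the "evaluate LTI data" step of use case 3, here is a minimal sketch. It assumes the course title arrives in the standard LTI 1.3 context claim; the function name `nbgrader_course_for` and the mapping are hypothetical, and the actual enrollment step is left out.

```python
# Standard LTI 1.3 claim that carries the course (context) information.
CONTEXT_CLAIM = "https://purl.imsglobal.org/spec/lti/claim/context"


def nbgrader_course_for(claims, course_map):
    """Return the nbgrader course ID for an LTI launch, or None if unknown.

    `claims` is the decoded LTI 1.3 ID token (a dict).
    `course_map` maps LMS course titles to nbgrader course IDs
    (hypothetical, maintained by the hub admin).
    """
    context = claims.get(CONTEXT_CLAIM) or {}
    title = context.get("title")
    return course_map.get(title)


# Hypothetical mapping and launch data, for illustration only.
course_map = {"Intro to Python": "python101", "Data Science": "ds201"}
claims = {CONTEXT_CLAIM: {"id": "17", "title": "Data Science"}}
course = nbgrader_course_for(claims, course_map)
```

A launch without a context claim, or with an unmapped title, yields `None`, so the caller can skip enrollment instead of guessing a course.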

Suggest a solution

From my (very limited) point of view adding support for multiple client IDs in LTIAuthenticator requires only one additional line of code in auth.py, line 44:

client_id = TraitletsList(config=True)

(and removal of line 31 (a comment)).

Checking the client ID is done by jwt.decode. The client_id value from jupyterhub_config.py is passed to this method via audience argument, which accepts an iterable (see third code box in audience doc).

I've tested this locally and it works. But I don't know whether there might be unwanted side effects.
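
For readers unfamiliar with the audience check, here is a stdlib-only sketch of the semantics as I understand them: the token is accepted if at least one configured client ID appears in its `aud` claim. This is not PyJWT's code, and `audience_accepted` is a hypothetical name.

```python
def audience_accepted(token_aud, accepted):
    """Return True if any accepted client ID appears in the token's aud claim.

    Both arguments may be a single string or an iterable of strings,
    mirroring a config value that is either one ID or a list of IDs.
    """
    token_auds = [token_aud] if isinstance(token_aud, str) else list(token_aud)
    accepted_ids = [accepted] if isinstance(accepted, str) else list(accepted)
    return any(client_id in token_auds for client_id in accepted_ids)


# One hub configured with three client IDs (illustrative values).
configured = ["id1", "id2", "id3"]
ok = audience_accepted("id2", configured)
rejected = audience_accepted("unknown-id", configured)
```

With this behavior a single string keeps working as before, which is why turning `client_id` into a list should be backward compatible as far as the audience check is concerned.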

welcome[bot] commented 1 year ago

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively. You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

jeflem commented 1 year ago

I realized that use case 2 would require additional configuration (not only multiple client IDs). But in contrast to use cases 1 and 3 it's of minor interest. So the feature request concentrates on use cases 1 and 3.

martinclaus commented 1 year ago

Hi @jeflem, thank you for already suggesting a solution!! I will take a look into it and try to figure out if it might be causing any problems with the OIDC security framework.

Regarding your use cases: I think use cases 1 and 3 are valid points if the LMS only supports single-tenant tool registration. I have no Moodle instance to test with, but from the docs it looks like Moodle supports multi-tenant tool registration, which would potentially make things easier for you. However, I can also see that multi-tenant registration is sometimes not what you want, e.g. if you have a university-wide LMS but a self-operated JupyterHub to which you want to control access.

Use case 2 I think is problematic because of username mapping. If multiple LMS are accessing the same JupyterHub instance, there is currently no way of guaranteeing that two different users of two LMS are not mapped to the same username on the JupyterHub. One would need to prefix usernames with the issuer because each LMS would be a unique issuer, but I think in this case it would be cleaner to run a dedicated JupyterHub instance for each LMS.
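
The issuer-prefix idea could be sketched roughly as follows. This is only an illustration, not something LTIAuthenticator does; `issuer_prefixed_username` is a hypothetical helper, and hashing the issuer is my assumption to keep the prefix short and free of URL characters.

```python
import hashlib


def issuer_prefixed_username(issuer, sub):
    """Build a hub username that is unique per (issuer, sub) pair.

    The prefix is the first 8 hex digits of the SHA-256 hash of the full
    issuer string, so two LMS with different issuers get different prefixes
    (barring an unlikely collision of the truncated hash).
    """
    prefix = hashlib.sha256(issuer.encode("utf-8")).hexdigest()[:8]
    return f"{prefix}-{sub}"


# Same LTI sub claim coming from two different (made-up) platforms.
user_a = issuer_prefixed_username("https://lms-a.example.org", "42")
user_b = issuer_prefixed_username("https://lms-b.example.org", "42")
```

The two results differ although both platforms report the same `sub`, which is exactly the collision the prefix is meant to prevent.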

jeflem commented 1 year ago

Hi @martinclaus, thank you for looking into this.

Use case 2 is indeed more problematic than I thought. But we don't really need that use case. The important ones are 1 and 3.

Some background: For use case 3 we want to send custom parameters which are different in each (Moodle) course. Thus, each course requires a slightly different tool configuration resulting in different client IDs. AFAIK Moodle and other LMS do not support 'tool templates' configured centrally by the admin (with one client ID) and partially overwritten by the instructors.

Our LMS instance is used by several universities, and each university and maybe each department will have its own JupyterHub on local servers. So tool configuration in the LMS will be done by the instructors, not by an LMS admin. LTIAuthenticator with multiple client IDs would be the most straightforward solution.

I think from a security point of view multiple client IDs are okay. JHub acts as two or more tools on the platform side. But I do not know whether making client_id a list breaks functionality at some point.

consideRatio commented 1 year ago

Quick review note from my mobile: if you have multiple providers, you may get the same username from two providers even though they could be different users, depending on the setup.

I'm not sure what usernames users end up getting, but I think it's important to consider this for us as maintainers, and then also for whoever configures the JupyterHub with multiple providers. If this is set up without username collision safeguards, there must be a big warning sign in the docs for admins setting this up, and perhaps also a log message.

In jupyterhub/oauthenticator, the CILogonOAuthenticator has various tricks to adjust usernames for individual providers under allowed_providers, taking config for each separate provider and setting up rules to manipulate the username to avoid collisions.

jeflem commented 1 year ago

Even if two or more client IDs are accepted they will come from the same platform, because other config options like c.LTI13Authenticator.issuer, c.LTI13Authenticator.authorize_url, c.LTI13Authenticator.jwks_endpoint do not allow for multiple values. Thus, for all client IDs usernames should be unique as long as the platform's user ID (LTI sub claim) is used for the hub username (that's the default behavior).

Of course, LTIAuthenticator allows choosing other LTI claims for hub usernames. Maybe one could note in the docs that this isn't a good idea in combination with multiple client IDs (and possibly not with a single client ID either).

martinclaus commented 1 year ago

I will open another issue to add a warning in the docs about avoiding username collisions. Even with a single identity provider, username collisions may happen if the username_key is set to something that is not unique for a user; family_name is a case in point.

martinclaus commented 1 year ago

We've just published jupyterhub-ltiauthenticator 1.6.0 with this feature being implemented in #152.
