hypothesis / lms

LTI app for integrating with learning management systems
BSD 2-Clause "Simplified" License
46 stars 14 forks source link

Fix LMS data model issues (roles and installs) #6443

Closed marcospri closed 2 weeks ago

marcospri commented 1 month ago

We have two main design issues in the LMS database model:

How we store LTI roles

LTI roles are sent per launch and apply to the current course. We currently store them at three different levels:

event_user.lti_role

It's useful to keep track of the roles in the event table as they reflect the roles of the launch in that particular moment in time.

We could change the design of event and event_user to explicitly make events "single user" (which applies to all event types we have now) including a event.user_id and replacing "event_user" by event_roles to keep track of the roles. That might make querying the current event table easier but it's doesn't necessarily make the current schema a poor design.

My suggestion would be to keep this as it is now.

user.roles

This reflects the roles of the user in their last launch. This is pretty much useless and using it will lead to very confusing results as without a course this data is meaniglines.

We should remove this column.

assignment_membership.lti_role

In theory LTI roles apply to the whole course so storing them at the assignment level is duplicating data and making queries less clear (why I have to join by AssignmentMembership to fetch students that belong to a course?).

We should instead store this in a new GroupingMembership.lti_role column.

We could:

How we store LMS data once per applications instance

First a note on tool_consumer_instance_guid (GUID)

From H's point of view users and groups are the same if they have the same h_userid (users) and authoritiy_provided_id (groups)

Those values are calculated hashing the ID of the entity on the LMS + the GUID.

This has worked well over time but it has a few issues:

In retrospect, now that we don't allow self-service LMS installs we could use an identifier we control and assign it manually to applications instances instead of using the LMS provided GUID.

That would be a big paradigm change so let's consider it out of scope for now.

Note on application instances

Our applications instances indientify installs of our tool in the LMS. Installs in the LMS can have different settings but their main purpose is authentication of our tool <-> LMS.

There are many cases where we might have multiple installs in the same LMS, course installs vs system installs vs sub account installs is the main one.

We also have multiple installs for different LTI version of a variety of edge cases where the LMS admins decide to install our app multiple times.

In theory those multiple install won't overlap at the same time (ie we won't get launches from them at around the same time) but in practice they do.

We use application instances for the initial handshake of the launch and to talk to LTI APIs. In a logical sense users,groups & assignments don't belong to a particular authentication key but to the LMS install, that we have decided to identify by its GUID.

How we currently store LMS objects in the DB

Applications instance as part of the unique key on the table

This is the model we have in "user" and "grouping". We duplicate for example the course information for each application instance we have seen it. This becomes a problem for any feature that queries these rows as it needs to account for differences between them or the fact that there would be duplicates in the first place.

For both users and grouping we also store h_userid and authority_provided_id with is the unique identifier of the entity in H.

One LMS object per "GUID"

This is the model we follow for assignments. We store unique assignments, based on GUID + LMS ID.

This prevents any duplication of the assignment data.

Proposed changes

We could move to a model where we store entities following this principles:

Deduplicating of these causes a lot of work when actually querying them.

Eg: user_application_instance, grouping_application_instance.

H identifiers (for groups and users) are opaque identifiers in the H side but are built on usually GUID + LMS identifier on the LMS. Always store all components of these in the same table on top of the hashed, unique H ID.

TODO

Draft PR with proposed changes

seanh commented 1 month ago

Agree that user.roles is useless and potentially confusing, and should be removed.

Also agree that AssignmentMembership.lti_role should move to GroupingMembership.

marcospri commented 2 weeks ago

Better articulated and with a concrete proposal at:

https://docs.google.com/document/d/1S0ordcbXc2W3pA3HZcEJ1tjmo9mBqbFOJcZ-qyKI75U/edit