Metamist permissions system update

dancoates commented 3 weeks ago

Context

Metamist currently has a fairly simple permission system. Each project defines a read group and a write group, these two groups define a list of group members. The members of the read group will have readonly access to the project, and the members of the write group have full unrestricted write access to the project.

Write actions within metamist are primarily performed programmatically via the API, but with some upcoming new features there will soon be the need to allow human users to perform a limited set of write actions. Some examples of these new features are:

Custom cohorts, where we will want to allow users to create cohorts
Attachments, where we will want to allow users to attach comments or other attachment types to various entities within metamist.

We do not want to grant full write access to allow users to perform these actions, so it is necessary to create a more nuanced role based permissions system that allows limited access to certain write operations but restricts access to anything destructive.

Solution

Moving away from storing the majority of permissions in groups to creating a direct relationship between users and projects. There is a lot of unnecessary redundancy from the way permissions are stored currently. By convention a read group will be suffixed with -read and similar for a write group. These are then referenced in the read_group_id and write_group_id columns in the project table. To continue extending this would result in a complex system with lots of columns on the project table and potentially a situation where the group suffixed with -read ends up being the write group.

This update introduces a new table project_member. This table holds the relationship between a member (represented by an email) and the project. In addition it includes a role column which specifies a role that the user has within that project. Each member can have multiple roles within a project by there being multiple rows for them in the project_member table for any given project.

In addition to the table updates, this PR also moves much of the permissions checking logic away from the tables/project.py file which acts as the interface for projects in the database. Instead this logic is now on the Connection class. This class is available pretty much everywhere throughout the project so it makes it much easier to quickly check permissions. This change also mitigates a class of performance problems that was starting to occur in the graphql resolvers where permissions checks were not being cached between resolvers. The Connection object is constructed once at the beginning of every request and all permission checks from then on are synchronous and should be very fast.

A permission check looks like:

self.connection.get_and_check_access_to_projects_for_ids(
    project_ids=[1,2,3],
    allowed_roles=ReadAccessRoles,
)

Where ReadAccessRoles is a set of the project member role enums. If the current user has any one of the specified roles in all of the specified projects the permission check will pass.

codecov-commenter commented 2 weeks ago

Codecov Report

Attention: Patch coverage is 72.03883% with 144 lines in your changes missing coverage. Please review.

Project coverage is 80.67%. Comparing base (fd2c933) to head (56054fa).

Files	Patch %	Lines
db/python/connect.py	75.00%	20 Missing :warning:
api/graphql/schema.py	66.07%	19 Missing :warning:
api/routes/project.py	32.00%	17 Missing :warning:
api/routes/analysis.py	8.33%	11 Missing :warning:
db/python/tables/project.py	85.18%	8 Missing :warning:
db/python/layers/seqr.py	0.00%	7 Missing :warning:
api/routes/sample.py	14.28%	6 Missing :warning:
api/utils/db.py	60.00%	6 Missing :warning:
test/data/generate_data.py	25.00%	6 Missing :warning:
api/routes/web.py	28.57%	5 Missing :warning:
... and 18 more

Additional details and impacted files

```diff @@ Coverage Diff @@ ## dev #829 +/- ## ========================================== + Coverage 80.36% 80.67% +0.31% ========================================== Files 169 168 -1 Lines 14337 14170 -167 ========================================== - Hits 11522 11432 -90 + Misses 2815 2738 -77 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

dancoates commented 2 weeks ago

What's your sense with how consistent our terminology is for referring to an authenticated "customer". We use author a lot, but I wonder if that's lazy, or something more accurate.

Yeah I mused on this a bit when working on the changes. In the past I've used the concept of a "viewer" which is defined to mean the currently active or viewing user. I think that name was a bit of a quirk of my last job though, I haven't seen it used elsewhere. "AuthenticatedUser" would be a more verbose but clearer name. At the moment the Connection class is performing that role, we could probably change that name as we kind of have two concepts of a connection - an API connection and a database connection, leading to a few places where you have connection.connection.

I think actor is another term that is used commonly for this

dancoates commented 1 week ago

Awesome, super clean!

Not blocking this PR, but can you write up a bit of documentation for how this works, and how developers should interact with it? For example, how project_admin permission works, when that role isn't given by cpg-infra (I see how it works in code, but worth writing down). And any handy methods people should take note of :)

Yeah, good thinking. I'll definitely write up some docs

populationgenomics / metamist