singularityhub / sregistry

server for storage and management of singularity images
https://singularityhub.github.io/sregistry
Mozilla Public License 2.0

Role and permissions model #89

Closed victorsndvg closed 6 years ago

victorsndvg commented 6 years ago

Hi @vsoch,

in the recent past we had some discussion about roles and permissions. I would like to use this thread to collect all related posts.

I think this is related to #57 and #73 (and maybe some more issues or PRs).

This is a collection of thoughts I have about roles and end-user management. As I don't have a clear view of the Django permissions model, it's not a direct mapping onto Django roles, only some suggestions based on my vision of this amazing tool.

First of all, I would like to enumerate the roles I would like to be provided from SRegistry (top to bottom approach):

  1. Web admin
  2. Content admin
  3. Collection admin
  4. Visitor

Then, some use cases for each role:

  1. Web admin: (what I think is currently a superuser more or less)
    • He/she can provide content admin or superuser permission to other users
    • He/she can manage all collections/containers: push, pull and remove (like a super Content admin that includes all other roles)
  2. Content admin:
    • He/she can create new collections
    • He/she can assign (or remove) Collection admin privileges for owned collections to other existing users
    • He/she can manage his/her owned collection as a collection admin
    • He/she cannot manage collections he/she does not own
    • He/she cannot assign or remove superuser privileges
  3. Collection admin
    • He/she can push containers to owned collections
    • He/she can pull private containers from owned collections
    • He/she cannot create new collections
    • He/she cannot pull private images from collections he/she does not own
    • He/she cannot push images to collections he/she does not own
  4. Visitor
    • He/she can only pull public images
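
The four roles listed above can be read as an ordered hierarchy. The following is only a minimal sketch in plain Python (not Django and not SRegistry code); the `Role` enum and every function name are hypothetical, and the ownership check for Content admins granting Collection admin rights is left out for brevity.

```python
from enum import IntEnum


class Role(IntEnum):
    # Hypothetical ordering: a higher value includes the abilities below it.
    VISITOR = 0
    COLLECTION_ADMIN = 1
    CONTENT_ADMIN = 2
    WEB_ADMIN = 3


def can_create_collection(role):
    # Collection admins cannot create collections; Content admins and above can.
    return role >= Role.CONTENT_ADMIN


def can_push(role, owns_collection):
    # The Web admin manages everything; others push only to owned collections.
    if role == Role.WEB_ADMIN:
        return True
    return role >= Role.COLLECTION_ADMIN and owns_collection


def can_pull(role, is_public, owns_collection):
    # Public images are open to everyone, including visitors.
    if is_public or role == Role.WEB_ADMIN:
        return True
    return role >= Role.COLLECTION_ADMIN and owns_collection


def can_grant(granter, role_to_grant):
    # Web admin can grant any role; Content admin only Collection admin
    # (restricted to owned collections in the proposal, omitted here).
    if granter == Role.WEB_ADMIN:
        return True
    if granter == Role.CONTENT_ADMIN:
        return role_to_grant == Role.COLLECTION_ADMIN
    return False
```

For example, `can_push(Role.COLLECTION_ADMIN, owns_collection=False)` is false, while a visitor can still pull any public image.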

It would also be great if the assign and remove actions could be done from the web interface through settings buttons:

  1. Web admin: The top-right menu shows a settings button. Clicking on it shows the whole list of users, and he/she can grant or revoke their Web admin or Content admin permissions
  2. Content admin: Each owned collection has a settings button. Clicking on it shows the list of Collection admins. He/she can remove the permission graphically, or assign it by typing a valid username
  3. Collection admin: can query his/her secret token
  4. Visitor: can only navigate and see public collections from the web interface

This is a kind of draft without much detail ... What do you think about it? Do you have something similar in mind? Does anyone else need this kind of roles? Or should I directly use the issue-garbage-collector and move this issue to the trash? :)

Let me know your thoughts!

vsoch commented 6 years ago

No this is super important and I need some time to think through and write up ideas. I will post here after that!

vsoch commented 6 years ago

I think what we want to do is think of permissions on a granular level, and then decide the groupings that go into each role.

For different objects in Django, each is associated with a permission to delete, create, etc., and it's also the case that Django lets us create "Groups" to handle different groups of permissions. So a good strategy would be to define permissions on a granular level, put them into logical groups, and then assign users to one or more groups.
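
As a rough illustration of that strategy (granular permissions, bundled into groups, users assigned to groups), here is a small sketch in plain Python. It does not use Django itself; the group names and permission strings are made up for the example, and in Django the same idea would be expressed with the `Group` and `Permission` models from `django.contrib.auth`.

```python
# Hypothetical groups, each a bundle of granular permission names.
GROUPS = {
    "collection_admin": {"create_collection", "push_container", "pull_container"},
    "content_user": {"pull_container"},
}


def effective_permissions(user_groups):
    """Return the union of the permissions of every group the user is in."""
    perms = set()
    for group in user_groups:
        # Unknown group names contribute nothing.
        perms |= GROUPS.get(group, set())
    return perms


def has_perm(user_groups, perm):
    return perm in effective_permissions(user_groups)
```

A user in several groups simply gets the union, so `has_perm(["content_user", "collection_admin"], "push_container")` holds while `has_perm(["content_user"], "push_container")` does not.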

To be clear, permissions and authentication / authorization are separate. Authentication / authorization is largely handled by the superuser's choice of what login backend to use.

Permissions

This first group of permissions is about managing permissions. The typical flow will be that when a new user is added, they are assigned a permission set (and no single superuser needs to assign or delete specific permissions), however if needed, the superuser should be able to do this.

Content

This second group I would assign to a group called "Content Admin." It's what you get after you have been allowed to have an account, and you are allowed to create collections and containers. This would map nicely to Django's idea of "staff." A content admin is akin to a collection owner: if a content admin creates a collection, that means he/she can create/delete/update both the collection and the containers in it.

Where "global" means "the user can perform the action even for objects he didn't create," and "specific" refers to the collections that a user has been given permission to control, meaning "the user can perform the action only for the objects he created."

Usage

Finally, we have permissions that aren't about creating, deleting, or updating, but just using. If a container is public, then the pull does not need a token.

I see no circumstances under which a non-authenticated user should be able to push. If it's private, then it needs a token, and the token must be associated with a user who has been added to the group of users who are allowed to push or pull. This means that an authenticated user has the ability to do:

When a user has permission to generate (and use) a token, then any collection owner can assign these permissions to the user:
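
The token rules from this section, sketched in plain Python. This is an assumption-laden illustration, not the registry's implementation: `token_user` stands for whichever user a presented token resolves to (or `None` when no token is sent), and `allowed_users` is a hypothetical set of users granted access to the collection.

```python
def can_pull(is_private, token_user=None, allowed_users=frozenset()):
    # Public containers can be pulled without any token.
    if not is_private:
        return True
    # Private containers need a token tied to a user who was granted access.
    return token_user is not None and token_user in allowed_users


def can_push(token_user=None, allowed_users=frozenset()):
    # A non-authenticated user is never allowed to push.
    return token_user is not None and token_user in allowed_users
```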

Groups

Now I'll do something akin to what @victorsndvg did, and talk about the Groups we would create (and assign people to) based on these permissions.

Superuser

has all permissions, can do anything, this maps to the Django superuser role. If we had to type it out

Content Admin

A content admin can generate collections and containers as well as interact with them, and can additionally grant content users the permission to be a Collection Admin.

Collection Admin

A collection admin can generate collections and containers as well as interact with them, but can only control those that he/she creates.

Content User

A content user is primarily concerned with pulling or using containers. A content user has an account, and has a token to use the Registry. A token is not required for public images, and a token is required for private images.

Visitor

And finally, NOT having any permissions we call a Visitor. The visitor can browse the registry (if the portal is web accessible to everyone) and view public collections. Public collections can be pulled, and that's it.

Discussion

The points outlined in the original post above look good, but I'm not sure about the idea that a Collection Admin (or anyone with a role greater than Content User) should not be able to create new collections. I don't see harm in a user that is able to manage his/her collections also being able to create new ones. It would be a burden for the superuser / Content Admin to have to go out of his/her way to just create an empty collection.

Questions

Thoughts? @dctrud pinging you in too, because I think you've done this before!

victorsndvg commented 6 years ago

Hi @vsoch ,

Great job! I agree on most of your analysis.

About your questions:

Should a token always be required for pulling, even public collections?

I think public registries should allow downloading public images simply using singularity (singularity pull shub://...) and without needing the token. Maybe private registries can always require the token. What do you think?

If the interface is web accessible, should anyone be able to make an account to be a content user?

not sure about the answer ...

For management, I think the Django default admin panel has a nice interface, but I need to check.

not sure about the answer ...

Finally, about the Content admin. I'm not sure if pull and push should be global. I think this role should not be able to overwrite a valid existing container in a collection it does not own. In my opinion, the big difference between Content admin and Collection admin is the ability to assign or revoke Collection admin permissions. From my point of view I prefer to switch to:

If anyone is able to delete, overwrite or modify any container stored in the registry, it should be the superuser.

What do you think?

Congrats for your summary, it's very clear!

vsoch commented 6 years ago

I agree about public images not needing a token. I wonder if Docker just does that to track usage? I suppose if there is a malicious user that pulls excessively, it would be something to address when it comes up, and the IP address could be identified and blocked. Even with a token, the user could just register another way.

I see the point about Content admin not having global push and pull, and I've adjusted the above. Question - is there a need for these two roles if the super user could just assign those to be Collection admins? Most of the roles are very intuitive to me, but not the Content Admin one. It seems like an extra layer we don't really need.

@dctrud throw in your thoughts when you get a breath of air!

dtrudg commented 6 years ago

Hello. Looks like the list of identified permissions is nice and comprehensive, and the admin-type roles make sense (though I'm with V that the content-admin is possibly unnecessary).

At this point, though, I'd be tempted to take a step back from this a bit and think about two things:

  1. What use cases are we trying to support - particularly thinking from the end-user side, rather than the admin-style roles?
  2. How might we be able to map easily into the simple user/group structures available in institutional authentication systems?

Number (2) here might sound like a pain, but it is quite important if we think sregistry is going to be used a lot by HPC groups, providing a facility for their users. There is nothing worse, as an HPC sysadmin, than having to maintain a separate, incompatible security structure in a single app.

With regard to (1) these are the simplest set of things I tend to think about, from working in an academic HPC center:

Would it be good to all have a think to come up with a list along these lines - from other points of view? Ultimately you want the minimal complexity implementation that allows 80%+ of the use-cases identified. The greater the granularity of the permissions system, the more difficult it is to maintain and/or integrate with an existing auth system/user directory.

I'd caution here that it's possible (sometimes likely) the outcome can involve a bit of a custom permissions system - not standard Django groups applying to a standard row-level permissions framework. This may or may not be a good thing. If your requirements are miles away from a standard permissions framework, going custom is likely a good move. Shoehorning things into a standard way of doing things is a pain and makes it difficult to maintain. If you are close to a standard framework, then fitting to the framework is worth it.

victorsndvg commented 6 years ago

Great @dctrud , nice list of use cases!

@vsoch , what I would like to get from a role like Content admin is a way to give some users permissions on particular repositories without involving superusers in the process.

e.g. I have a private collection (because I'm working with private/licensed software) and some members of my team need to use an image. I can give them (one by one) permissions on a particular collection to download the images within it.

In the previous example, from the superuser's point of view, the global admin doesn't want to manage permissions for third-party people (with a bunch of users it is a bottleneck, time-consuming, etc.), only to provide the highest level of permissions.

vsoch commented 6 years ago

@victorsndvg I think that a Collection Admin would manage permissions for the collection, and the superusers would manage the Collection admins (it's a lot less frequent to add/remove admins). In your example, if you have a private collection (meaning you are a collection admin and created it, and a superuser gave you an ability, period, to create the collection) then it would also be up to you to add user permissions to it. It would be strange to have someone else assigning permissions for your collection.

@dctrud I think this is possible! Here is how I see the mapping:

I think we would keep it simple, and then (optionally) could use a simple plugin to help manage roles. For groups of users relevant to collaborating on collections, a field of "collaborators" in the collection, to be controlled by the collection owner, I think is sufficient. For a permission plugin, this one seems simple enough --> https://django-guardian.readthedocs.io/en/stable/userguide/assign.html - it allows for associating Django groups with object permissions, and the usage is pretty flexible. @dctrud I'm guessing you have thoughts on this? Now let's discuss the different use cases!

An HPC user wants to share some containers with the world

He/she gets a user account; the user account must be created when the user logs in via the registry authentication of choice. A set of permissions is assigned via a group on account creation. He/she finds a public collection (or requests one to be made, in which case the user is allowed to be a collection admin) and shares the uri.

An HPC user wants to share some of their containers with everyone with an HPC account - nobody outside should get the xxx license info!

I think here we need two kinds of private - completely private, and registry private. Registry private means "looks public" to anyone with an account on the registry. If it's a subset of those users, then the collection owner would use standard private, and give individual permissions to specific users. This wouldn't be any kind of field for the collection, but a function that lets the collection admin add / remove users. The back end would just manage their permissions to pull the collection.

An HPC user wants to share some stuff with a single other person, either read-only or collaboratively.

If the collection is private, sharing read-only with another user means adding him/her to the pull list (as noted above). If it's private and the collection owner wants the other user to be able to collaborate on the collection (e.g., push too), then this would mean adding the user to the collaborators field associated with the collection. This would also work if it's public.
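
The privacy levels and the two kinds of sharing discussed so far (a pull list for read-only access, a collaborators field for push access) could be modelled roughly as follows. This is a plain-Python sketch with hypothetical names (`Collection`, `pull_list`, the privacy constants), not a Django model and not the registry's code.

```python
from dataclasses import dataclass, field

# Hypothetical privacy levels taken from the discussion above.
PUBLIC = "public"
REGISTRY_PRIVATE = "registry-private"  # visible to any account on the registry
PRIVATE = "private"                    # visible only to explicitly added users


@dataclass
class Collection:
    owner: str
    privacy: str = PUBLIC
    pull_list: set = field(default_factory=set)      # read-only sharing
    collaborators: set = field(default_factory=set)  # may push as well

    def can_pull(self, user=None):
        if self.privacy == PUBLIC:
            return True
        if user is None:
            return False  # anonymous users only see public collections
        if self.privacy == REGISTRY_PRIVATE:
            return True
        return (user == self.owner
                or user in self.pull_list
                or user in self.collaborators)

    def can_push(self, user=None):
        # Collaborators can push whether the collection is public or private.
        return user is not None and (user == self.owner
                                     or user in self.collaborators)
```

With `Collection(owner="ana", privacy=PRIVATE, pull_list={"bo"}, collaborators={"cy"})`, user `bo` can pull but not push, and `cy` can do both.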

The user wants to keep some containers completely private to themselves.

Then we need a third level of private, akin to facebook, the "me only" (and superuser viewable, likely)

A PI wants to allow unrestricted management of a private collection of containers by anyone in their group

The registry isn't going to manage how different institutions have their groups (e.g., a lab). It's usually the case that a lab member or two will have more involvement in the running of things, in which case

  1. all/any users would create accounts
  2. a few of them would be collection admins
  3. the collection admins would manage access.

We should likely have a button / setting that the collection admin can check on creation to "assign a permission to set" and then we ALSO need a simple User model that holds multiple users (for a lab) that we could call something like a Team. I think likely the team (and owner) would need to be only under control of the superuser, likely when the PI requests it. Then the PI account would manage the team, but the collection admins that belong to it could assign to their team. Thoughts here?

A PI wants to see anything from their group, but allow members to work in private.

I don't understand the second part, but for the first you would filter to team member owned collections. I don't particularly think that a PI should be some special kind of user that gets to play god over other users; if the member assigns their lab (team) to a collection, then the PI is part of that.

Two lab groups want to share active development of some containers/collections, but keep others private to their respective groups.

The containers/collections would be assigned to both teams, in both cases by the collection admins for each.

A lab group wants to share containers read-only with another specific lab group.

I think this would be making the container collection private, but sharing pull permission with the team.

Thoughts?

victorsndvg commented 6 years ago

Hi @vsoch ,

yes, I agree with you, you are right about Content admins. Having the Collection admin and Content user profiles should be enough (I think).

Summarizing your previous post, from my point of view, I think the following could be an update of the permissions per role:

Superuser

has all permissions, can do anything, this maps to the Django superuser role. If we had to type it out

  • generate token
  • change collection privacy (global)
  • assign permission
  • delete permission
  • assign any Group
  • revoke any Group
  • pull container (global)
  • push container (global)
  • create collection
  • delete collection (global)
  • update collection (global)
  • create container (global)
  • delete container (global)
  • update container (global)

Collection Admin

A collection admin can generate collections and containers as well as interact with them, but can only control those that he/she creates.

  • generate token
  • assign Content user permissions (or add to Team) (specific)
  • revoke Content user permissions (or remove from Team) (specific)
  • change collection privacy (specific)
  • pull container (global public, specific private)
  • push container (specific)
  • create collection
  • delete collection (specific)
  • update collection (specific)
  • create container (specific)
  • delete container (specific)
  • update container (specific)

Content User

A content user is primarily concerned with pulling or using containers. A content user has an account, and has a token to use the Registry. A token is not required for public images, and a token is required for private images.

pull container (global public, specific private)
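
The per-role lists above, with their "global" and "specific" scopes, can be read as a lookup table. Below is a minimal plain-Python sketch covering only a few of the listed actions; the `ROLE_SCOPES` table, the `has_access` flag (true when the user owns the collection or was granted access to it) and the function name are all assumptions made for illustration.

```python
GLOBAL, SPECIFIC = "global", "specific"

# A subset of the actions listed above, keyed by role.
ROLE_SCOPES = {
    "superuser": {
        "pull container": GLOBAL,
        "push container": GLOBAL,
        "delete collection": GLOBAL,
    },
    "collection admin": {
        "pull container": SPECIFIC,
        "push container": SPECIFIC,
        "delete collection": SPECIFIC,
    },
    "content user": {
        "pull container": SPECIFIC,
    },
}


def is_allowed(role, action, has_access, is_public=False):
    # "global public": anyone may pull a public container.
    if action == "pull container" and is_public:
        return True
    scope = ROLE_SCOPES.get(role, {}).get(action)
    if scope == GLOBAL:
        return True
    if scope == SPECIFIC:
        return has_access
    return False  # the role does not have this action at all
```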

What do you think?

Exposed use cases are great to identify the needed permissions and roles!

In the particular case of a single user keeping some containers completely private, I don't think we need any special feature like the "me only". I think the case can be included in a more general case of "container sharing".

e.g. (modification of one of the previous use cases):

A Collection admin user wants to share some stuff with zero or more users, either read-only or collaboratively.

Do you think it makes sense?

Also having Groups/Teams management sounds really cool! 👍

dtrudg commented 6 years ago

Hi @vsoch - I think django-guardian is a good framework to implement this with. I did play around with it before, and it was pretty nice to use. What I did mess up a bit, though, was putting too many permissions at a low level, which gets to be a pain to manage. Keeping as much as possible at upper levels (i.e. at the collection, not set for each container) is probably a good way to stay sane :-) I also think using groups as much as possible is nice for integration with institutional systems, since perms can then be mapped from central LDAP/AD groups rather than having to be set up in the app only.
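
To make the two suggestions concrete (permissions live on the collection only, and they are granted to directory groups rather than to individual users), here is a small plain-Python sketch. The collection name, the group names and the `COLLECTION_PERMS` table are invented for the example; with django-guardian the equivalent would be object permissions assigned to Django groups on a collection object.

```python
# collection name -> {directory group -> permissions granted on the collection}
# All names here are hypothetical.
COLLECTION_PERMS = {
    "lab-tools": {
        "hpc-users": {"pull"},
        "lab-smith": {"pull", "push"},
    },
}


def container_perms(collection, user_directory_groups):
    """Permissions on any container are inherited from its collection.

    `user_directory_groups` is the set of group names the user belongs to
    in the institutional directory (for example LDAP/AD).
    """
    perms = set()
    for group, granted in COLLECTION_PERMS.get(collection, {}).items():
        if group in user_directory_groups:
            perms |= granted
    return perms
```

Nothing is stored per container, so adding or removing someone from a directory group changes their access to every container in the collection at once.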

The only thing that also jumps out at me looking through the above comments, is the scenario:

An HPC user wants to share some containers with the world

He/she gets a user account; the user account must be created when the user logs in via the registry authentication of choice. A set of permissions is assigned via a group on account creation. He/she finds a public collection (or requests one to be made, in which case the user is allowed to be a collection admin) and shares the uri.

It seems a bit of a pain to have to ask to create a public collection, and a lot of experiments by users could end up polluting collections. What if the registry could be set up so that there would be two types of collection:

vsoch commented 6 years ago

@dctrud I really like that idea, and I'll think about how to best implement it. I think actually a simple boolean variable would suffice (to make it easy to query across) - whenever I get into polymorphism or different Models entirely it gets messy to do something simple like query across a collection. This solves the issue with public collections too, because the registry admin can choose (or not) to enable user level collections. If yes, users are free to create and share with the world. If not, the registry can still serve "admin level" containers. So I'll adjust that paragraph to:

An HPC user wants to share some containers with the world

He/she gets a user account; the user account must be created when the user logs in via the registry authentication of choice. A set of permissions is assigned via a group on account creation.

I had to take a break on this to do work in the Singularity Python client, some other job stuff, and a handful of paper reviews, but I should get back to working on this now! Will post an update when I have something to discuss.

vsoch commented 6 years ago

okay, feeling really overwhelmed with this, it's too complicated and I don't like it.

victorsndvg commented 6 years ago

Hi @vsoch and @dctrud ,

from my point of view we are complicating things a little with particular cases. I think the most important features we talked about (and related to my needs ... maybe I'm missing yours ... ;P) are:

I think that if some kind of user can share his/her private collections with particular users or groups, this will be flexible enough to cover a lot of different use cases.

@vsoch , I'm here to help. Can I do something?

dtrudg commented 6 years ago

Hi @victorsndvg - at this point I'm no longer working at the institution where sregistry had the potential use-cases I identified. I'll step back from this, as it's obviously important that it reflects the needs of actual users. It's possible that someone else from the group I left will show up here.

Cheers,

vsoch commented 6 years ago

hey @victorsndvg - I agree this is too complicated. Bringing in object level permissions is something that I'm very hesitant to do. How about this:

  1. maintain django (standard) user roles (staff, admin, superuser)
  2. superuser can do anything and everything
  3. user is a standard user that is allowed to authenticate with the registry
  4. admin (a collection admin) is just a user that can create his/her own collections (via a push) given that the registry admin has enabled USER_COLLECTIONS

And then for each collection, the default is that it's public for pull. If the user makes it private, he/she can add/remove people to give permission to pull. They will be stored in a "collaborators" field of the collection.
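
A plain-Python sketch of this simplified scheme, under stated assumptions: the registry is a dict, roles are the strings "user", "admin" and "superuser", `user_collections` stands in for the USER_COLLECTIONS setting, and the rule that only the owner or a superuser may push to an existing collection is an assumption added for the example.

```python
def push(registry, user, role, name, image, user_collections=True):
    """Push an image; the first push to a new name creates the collection."""
    collection = registry.get(name)
    if collection is None:
        # Admins may create collections only when user collections are enabled.
        allowed = role == "superuser" or (role == "admin" and user_collections)
        if not allowed:
            raise PermissionError("not allowed to create collections")
        collection = {
            "owner": user,
            "private": False,        # public for pull by default
            "collaborators": set(),  # users allowed to pull when private
            "images": [],
        }
        registry[name] = collection
    elif role != "superuser" and user != collection["owner"]:
        # Assumption for this sketch: only the owner or a superuser may push.
        raise PermissionError("only the owner or a superuser may push")
    collection["images"].append(image)
    return collection


def can_pull(collection, user=None):
    if not collection["private"]:
        return True
    return user == collection["owner"] or user in collection["collaborators"]
```

Making a collection private is then just setting `private` to true and adding usernames to `collaborators`.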

I'm thinking about the idea of adding a "Team" (or organization, or group, or lab) but not decided.

vsoch commented 6 years ago

See #97 for further discussion. Sorry to lose you @dctrud , the robots will cover for you!

victorsndvg commented 6 years ago

@dctrud , good luck with your new work experiences!

@vsoch , I think your schema fits with the most common needs. For me it's OK!

I need to test it in depth to see if we have any new requests. Do you think it's ready for testing?

Awesome work as always! 👏

vsoch commented 6 years ago

Not yet, I'll let you know (definitely within the next day or so!)

vsoch commented 6 years ago

In case it didn't ping you, please take a look at #97 !

victorsndvg commented 6 years ago

Yes, sorry I saw it yesterday. Tomorrow I will test it! 👏

victorsndvg commented 6 years ago

Hi @vsoch ,

I think we can close this issue.

A single question: after the merge, can I update my current production deployment? Or am I going to break things?

vsoch commented 6 years ago

oup, I don't know, and that makes me nervous. You would want to pull, then shell into the instance and run python manage.py makemigrations, and then python manage.py migrate. I would back up everything beforehand just in case.

victorsndvg commented 6 years ago

FYI, in the end it broke my previous deployment. It seems that the user_teams table was not created during migrations.

It was not a serious problem, as I managed to restore the previous state (in terms of containers provided).

It's working perfectly! 👏

vsoch commented 6 years ago

Ah, sometimes I need to run something like python manage.py makemigrations teams because it would skip it otherwise. I'm so glad to hear the good news otherwise!