openshift / oadp-operator

OADP Operator
Apache License 2.0
76 stars 70 forks source link

Design: OADP non-admin use cases #1043

Open weshayutin opened 1 year ago

weshayutin commented 1 year ago

Context:

The work here should be build a CRD's and controllers that would intercept all backup, restore, bsl creation and check if the user has the correct permissions to do the action.

New CRD's

Controller: https://github.com/migtools/oadp-non-admin/

Scope:

Spec:

Related work: https://issues.redhat.com/browse/OADP-203 https://github.com/weshayutin/oadp-operator/tree/tekton-non-admin/docs/tekton-oadp-nonadmin https://github.com/shawn-hurley/prototype-non-admin

shubham-pampattiwar commented 1 year ago
shawn-hurley commented 1 year ago

I would suggest that a potential to consider here is that a new CR that is specifically for this purpose is used. Something like NamespacedBackup, or something similar, that when a user has access to that CR they are able to backup the Namespace. this will allow you to create a NamespaceRestore that must reference the NamespacedBackup as a local reference.

I think that some things to be very concerned with, is that Velero is single thread AFAIK and I think it needs to be a prequisite to make it so that it can backup multiple namespaces at a time. If this is not fixed, then no amount of good secure implementation here is going to matter IMO because it will be unusable.

shawn-hurley commented 1 year ago

@shubham-pampattiwar Answes as I see them:

We should discuss more about what kind of user info we need ?

if you use the webhook, you can do the SAR requests to the objects that you want to make sure the user has access to. The user info that you get in the webhook is all you need to ask these questions of the RBAC system.

How do we decide the particular user has appropriate access to a resource to be backed up, eg PVC etc ?

I would think that we make it clear that a user who has access to this resource in this namespace gets access to all the resources in the namespace, similar to admin. One thing that we need to make sure of is if a role binding is bound to a cluster role, we need to either downgrade to a role OR we need to assume that something else can restore that. and that we don't back that up and that you can't restore it. From a user/non-admin restore operation.

Should we make use of admission webhooks for backup/Restore CR creations ?

Yeah, I agree; this needs to be there to at least do some permissions if you don't create new CRs and attempt to re-use the existing ones.

Will there be a different BSL per user or a shared one ? How will this scenario work ?

I think that a Namespaced Backup can only use a BSl that is defined in the namespace of the backup, OR we create a ClusterBackupStorageLocation, that an admin can define for everyone.

What if there is a change during the Backup/Restore operation? How do we handle this scenario ?

I need more context, but do you mean like the RBAC system? I think this is an interesting use case, I think that if they are removed they would be unable to use the objects anymore so I don't think there is a escalation concern but we should make sure.

I hope that this helps, I have thought about this in the past, and we are having similar concerns that we had with OLM, where some stuff is just ClusterScoped by nature and tried to make it non-admin.

weshayutin commented 1 year ago

@shubham-pampattiwar Answes as I see them:

We should discuss more about what kind of user info we need ?

if you use the webhook, you can do the SAR requests to the objects that you want to make sure the user has access to. The user info that you get in the webhook is all you need to ask these questions of the RBAC system.

How do we decide the particular user has appropriate access to a resource to be backed up, eg PVC etc ?

I would think that we make it clear that a user who has access to this resource in this namespace gets access to all the resources in the namespace, similar to admin. One thing that we need to make sure of is if a role binding is bound to a cluster role, we need to either downgrade to a role OR we need to assume that something else can restore that. and that we don't back that up and that you can't restore it. From a user/non-admin restore operation.

I agree that access is namespace based and it's 100% access to the namespace or no access.

Should we make use of admission webhooks for backup/Restore CR creations ?

Yeah, I agree; this needs to be there to at least do some permissions if you don't create new CRs and attempt to re-use the existing ones.

I'm not sure if you are referring to dynamic admission control, I think you are. If the admission control is scoped to only check api requests for velero than we might be ok. A large customer already nacked using dynamic admission control via policy, the context here is a little different I think. Needs discussion.

Will there be a different BSL per user or a shared one ? How will this scenario work ?

I think that a Namespaced Backup can only use a BSl that is defined in the namespace of the backup, OR we create a ClusterBackupStorageLocation, that an admin can define for everyone.

Let's talk about this one, I think we want a non-admin to create their own bsl and only have access to bsl's they create. Not 100% sure, needs discussion.

What if there is a change during the Backup/Restore operation? How do we handle this scenario ?

I need more context, but do you mean like the RBAC system? I think this is an interesting use case, I think that if they are removed they would be unable to use the objects anymore so I don't think there is a escalation concern but we should make sure.

I hope that this helps, I have thought about this in the past, and we are having similar concerns that we had with OLM, where some stuff is just ClusterScoped by nature and tried to make it non-admin.

shawn-hurley commented 1 year ago
A backup is running and permission to the namespace is removed, it would be too late to stop the backup.
what other scenarios are you thinking about here?

If that does happen things should be fine, because the user would be unable to use the backup in anyone once finished right? And they wouldn't be able to restore etc. I think this should be fine and there is nothing to do IMO.

If there is another scenario that someone can come up with though would be interested in hearing about it.

shawn-hurley commented 1 year ago

Something to condiser, if a namespace say X is deleted, how do you resotre that namespace from new namespace X or even new namespace A. This probably should be accomplishable right?

shawn-hurley commented 1 year ago

A large customer already nacked using dynamic admission control via policy, the context here is a little different I think. Needs discussion.

I don't understand what this means or what they nacked, maybe we sould have a seperate discussion on this.

weshayutin commented 1 year ago

discussion has moved to https://hackmd.io/qbYw8wq4RZSWEe2KZpzLaw Once we have solid design written out in the hackmd we'll copy it back to this issue.