openshift / origin

Conformance test suite for OpenShift
http://www.openshift.org
Apache License 2.0
8.49k stars 4.7k forks source link

RBAC Migration Followup #25: Deprecate policy commands in favor of RBAC equivalents #18096

Open enj opened 6 years ago

enj commented 6 years ago

Add cobra.Command.Deprecated to the following commands:

  1. oc adm policy {add|remove}-{cluster-}role-{to|from}-{user|group} ---> oc {create|edit} {cluster}role{binding}
  2. oc adm policy reconcile-cluster-role-{bindings} ---> oc auth reconcile
  3. oc policy can-i ---> oc auth can-i
  4. oc adm policy remove-{user|group} ---> oc {delete|edit} rolebinding
  5. Update docs https://docs.openshift.org/latest/admin_guide/manage_rbac.html#managing-role-bindings
  6. Update ansible playbooks to use RBAC commands instead

Deprecated should contain a pointer to the RBAC equivalent command. The CLI completions will have to be regenerated since we do not include completions for deprecated commands.

@liggitt @deads2k any others come to mind?

cc @bparees @openshift/sig-security @juanvallejo @fabianofranz

bparees commented 6 years ago

oc adm policy {add|remove|-{cluster-}role-{to|from}-{user|group} ---> oc create {cluster}role{binding}

I still think this is a bad user experience.

1) I now need to come up w/ unique names every time i want to give a user a new rolebinding (and remember/determine that name when i want to remove it) 2) it explodes the number of etcd objects (one binding object per role assignment per user)

bparees commented 6 years ago

(my comments only apply to (1) obviously. I have no opinion on (2) and (3))

enj commented 6 years ago

I still think this is a bad user experience.

I do agree that it is less convenient, but I find it significantly easier to understand what is actually happening when you run these commands. For example, add-role-to-user, by default, creates or edits a role binding to a cluster role. This is hard to reason about, especially when you run a bunch of these commands together.

I now need to come up w/ unique names every time i want to give a user a new rolebinding (and remember/determine that name when i want to remove it)

Yeah, it forces you to take ownership of the objects you are creating. It is explicit, and also less convenient. You can also edit an existing object, which requires you to understand the objects.

it explodes the number of etcd objects (one binding object per role assignment per user)

Kube seems to survive with this. The whole behavior of these commands was built around etcd2's poor scaling with a large number of objects. Let's hope etcd3 is beyond that.

bparees commented 6 years ago

Kube seems to survive with this.

Kube also survives w/ the secrets problem. They don't run the scale we do as I understand it.

simo5 commented 6 years ago

@bparees I think this is mitigated by the fact that we do not force the creation of objects as we do for secrets. Asmins that end up creating large amount of objects unnnecessarily can be coached and helped to consolidate those, so at least the size problem is manageable where it makes sense. This is to say I do not think size matters should influence this issue.

simo5 commented 6 years ago

@enj on point 1, should we instead point to some other command to eidt groups ? I am not confortable pointing at a create command from a modify command, some people know what they are doing and they deserve to be pointed at how to modify objects, telling them to create new ones is not helpful or appropriate in all cases.

enj commented 6 years ago

Kube also survives w/ the secrets problem. They don't run the scale we do as I understand it.

Sure but secrets are large (because of the embedded cert) and we have controllers auto create (and recreate if deleted) a bunch of them per namespace. RBAC objects are small, and we only auto create a few for the SAs and admin user when the project is created. Any extra objects are purely user driven.

simo5 commented 6 years ago

Point n. 4 doesn't really have equivalents. While the other commands can reasonably transitioned to oc edit rolebindings as generally you are going to edit a specific role/binding, the command in 4 is more complex and removes a user from all namepace roles (even though it doesn't touch cluster rolebindings). I'll leave that one out as I think it falls off the "equivalent" charter for this issue.

simo5 commented 6 years ago

Soo, after I wrote the PR I started looking at fixing the docs, including figuring out how to provide command line replacements of previous commands and ... I find it exceedingly difficult to do so in non-interactive ways.

oc create rolebindings + oc edit rolebndings is kinda fine, except you need to look at an existing rolebinding to figure out what to put in there, but ok, let's assume that is something I can learn.

The problem is finding non-interactive ways to do single things like adding a sencod subject to a rolebinding or removing a subject.

I am not confident deprecating these commands without providing alternatives is a good idea. As usability-wise wat kube provides is laregely insufficient and way too burdensome for casual use.

If we deprecate these commands we should provide similarly useful replacements, not necessarily identical, but command line oriented commands at the very least.

simo5 commented 6 years ago

Just as an example, trying to add a user to a rolebinding is not even a consistent operation, as it depends on what's already in the object. In my attempt, adding the first user seem to work just with this command:

oc patch rolebinding TEST --type=json -p '[{"op": "add", "path":"/subjects/0", "value":{"kind": "User", "name": "USER"}}]'

however adding a second user requires:

oc patch rolebinding TEST --type=json -p '[{"op": "add", "path":"/subjects/0", "value":{"kind": "User", "name": "USER2"}}, {"op": "add", "path":"/userNames/0", "value":"USER2"}]'

luckily the index values for the arrays expressed in the paths are irrelevant.

Later on I tried to add a group with a command analogous to the first one here, but that did not work, and also an analogous of the second did not work because groupNames is set to null, so I had to use this:

oc patch rolebinding TEST --type=json -p '[{"op": "add", "path":"/subjects/0", "value":{"kind": "Group", "name": "GROUP"}}, {"op": "add", "path":"groupNames", "value":["GROUP"]}]'

note how I had to create a whole array, yet I did not need to do that to add a first user apparently, the userNames array was added with the right username value for me ...

This inconsistency and complication is not easy to document, and it is certainly not usable in idempotent scripts ...

oc auth reconcile requires passing in a full json object in a file (or via stdin) which is even more awkward ...

bparees commented 6 years ago

@simo5 +1. Also I see you added update docs to the list, which i was going to suggest, but also our ansible playbooks use these commands so those should be updated as well (and you may find they run into similar problems in terms challenges for non-interactive usage)

liggitt commented 6 years ago

The hidden cost of the magic "add to the first binding you find" behavior is that it breaks when combined with manifest driven bindings. In isolation the magic command is more usable, but it is problematic when combined with the rest of kube

deads2k commented 6 years ago

The hidden cost of the magic "add to the first binding you find" behavior is that it breaks when combined with manifest driven bindings. In isolation the magic command is more usable, but it is problematic when combined with the rest of kube

This. Collapsing onto kube for kube resources to be managed like other resource manifests is better in the end. Consistency of experience (even if it may seem less convenient at first), pays over time.

simo5 commented 6 years ago

@liggitt @deads2k I've put myself in the shoes of an admin that does not breathe Openshift source level details every day for breakfast, and I just screamed "noooooo". Seriously, unless you are a wizard, oc edit/oc reconcile is hardly usable and finnicky. @liggitt I do not like one bit the behavior of picking or creating a random role binding, it is dumb to do that, and it is unfortunate that the current interfaces do it. However throwing everything away is the classic "throwing the baby with the bath water".

I have 2 options here: 1) Add upstream some decent (but not arbitrary) commands to manipulate RBAC, that will make users not want to throw tomatoes in our faces. 2) Add new commands downstream to do (1) if it is something we do not want/can't have in kube.

In more detail, We need a command to add/remove a user/group to a rolebinding (not a role), and a companion command to create a rolebinding that references a specific role.

Also, less important, we need a command to either remove a user/group from all rolebindings in a specific namespace (possibly with a --cleanup flag that will also remove now-empty rolebindings), or at least document a command that will list of rolebinding s you have to touch if you want to remove a user/group from a namespace entirely.

Which way should we go ?

liggitt commented 6 years ago

We could make the --rolebinding-name arg required

simo5 commented 6 years ago

@liggitt we could but it would have to be optional or it'd break all existing scripts, so it is very suboptimal. I think the only way to get better commands is to provide new ones, and preferably upstream.

deads2k commented 6 years ago

@liggitt we could but it would have to be optional or it'd break all existing scripts, so it is very suboptimal. I think the only way to get better commands is to provide new ones, and preferably upstream.

I don't think that "management by scripting" will gain any traction upstream. "Management by resource manifest" is the "normal" approach to the entire project.

I've put myself in the shoes of an admin that does not breathe Openshift source level details every day for breakfast, and I just screamed "noooooo". Seriously, unless you are a wizard, oc edit/oc reconcile is hardly usable and finnicky.

I think that someone managing an openshift or kube cluster can reasonably be expected to roughly understand what they administer. No other part of the cluster is managed by scripts. Having this as an outlier does not make the mental load easier.

liggitt commented 6 years ago

@liggitt we could but it would have to be optional or it'd break all existing scripts, so it is very suboptimal. I think the only way to get better commands is to provide new ones, and preferably upstream.

The function you want is a subset of the existing commands. Why is it better to completely break all users of that subset by removing those commands, rather than restricting use of those commands to the desired subset over a few releases?

smarterclayton commented 6 years ago

Yeah, in general I think there are two use cases:

  1. i'm doing iterative development on top of openshift, and commands that help me accomplish my goal with minimum frustration (i want to give a user a role, or take that away)
  2. i'm doing config managed code, where the RB is created and managed frequently

The assumption with 2 is also that some automated system is applying this (jenkins, a bash script, whatever). The problem with the iterative commands is that while they solve the user problem and are approachable, the fact that they operate on things created by 2 leads to confusion, because by definition the automated system is going to stomp that action anyway.

I don't think making --rolebinding-name required helps users. However, being more principled about which role bindings we add to could very well do so. If the add-role-to-user command only affected role bindings that it itself created (when --rolebinding-name is absent), and remove-role-from-user only worked on role bindings itself created (unless --role-binding-name is specified, or a --confirm is passed), isn't that sufficient to bridge the gap?

deads2k commented 6 years ago

I don't think making --rolebinding-name required helps users. However, being more principled about which role bindings we add to could very well do so. If the add-role-to-user command only affected role bindings that it itself created (when --rolebinding-name is absent), and remove-role-from-user only worked on role bindings itself created (unless --role-binding-name is specified, or a --confirm is passed), isn't that sufficient to bridge the gap?

If you say your only entrypoint is either a resource manifest or oc create rolebinding foo --role, then removal is always a either a resource manifest or a oc delete rolebinding foo away. The middle ground you're reaching for only exists because of the commands you're currently running. Put another way, you only want the remove-from-group because you have the add-to-group. If you didn't have the add, you don't really care about the remove.

oc add-role-to-group --role=the-role --group=the-group --binding-name=the-binding
oc add-role-to-group --role=the-role --group=the-other-group --binding-name=the-other-binding
oc remove-role-from-group --role=the-role --group=the-group --binding-name=the-binding
oc create rolebinding the-binding --role=the-role --group=the-group 
oc create rolebinding the-other-binding --role=the-role --group=the-other-group 
oc delete rolebinding the-binding

They don't look appreciably different.

simo5 commented 6 years ago

So we all know that the problem with the current commands is that they pick "at random" how and where they add/remove users to roles. That is bad, because it is implecit and not visible to admins.

Two solutions have been proposes to salvage existing commands: 1) Add a --rolebinding-name required option so that these commands do not operate on randome rolebndings anymore 2) Make these commands never touch existing rolebindings and only create their own so they are more deterministic and won't break RBAC configuration that has been done via manifests by interfering with it

So both of these suggestions fundamentally rely on altering existing commands in ways that are not backwards compatible. (1) would immediately break any automation that doesn't know --rolebinding-name is required (ie all the automatiion using any of these commands) (2) would break more subtly, given it would (i guess) require adding annotations to know which rolebindings it created and can "reuse/delete" it would behave inconsistently across versions/clients. If I create add a bunch of rules, then upgrade, now I suddenly can't modify them anymore -> broken

The part that is bad about b oth approaches, beyond the fact they break stuff, is that we have no ways to wanr about it either.

A new set of commands could have better semantics. Examples:

oc create rolebinding foobinding --{cluser}role foo
oc add-subject-to-rolebinding foobinding --user joe
oc add-subject-to-rolebinding foobinding --group editors --sa baz
oc remove-subject-from-rolebinding foobinding --user joe
oc delete rolebinding foobinding

And would allow us to give users deprecation warnings for the existing "bad" commands. This new set of commands would not prevent the admin from messing up with rolebindings created by a manifest, but then this would happen intentionally and not by accident, so I think that is just fine, admins need to have at least a vague idea of what is ok to change and what not, and because this makes it explicit and required I think it is a reasonable interface.

@smarterclayton @liggitt @deads2k comments ?

liggitt commented 6 years ago

Add a --rolebinding-name required option so that these commands do not operate on random rolebndings anymore

The commands already have this feature, it is just optional today.

give users deprecation warnings for the existing "bad" commands

If someone is using add-role-to-user --rolebinding-name today, why would we break that, since it's doing what we want? Why not just output warnings in 1.9 that --rolebinding-name will be required in the future?

simo5 commented 6 years ago

On Mon, 2018-01-15 at 10:02 -0800, Jordan Liggitt wrote:

Add a --rolebinding-name required option so that these commands do not operate on random rolebndings anymore

The commands already have this feature, it is just optional today.

Right, I meant just "make it required".

If someone is using add-role-to-user --rolebinding-name today, why would we break that

We wouldn't, deprecation warnings currently do not break anything.

, since it's doing what we want? Why not just output warnings in 1.9 that --rolebinding-name will be required in the future?

Have we done this anywhere else ? If everyone agrees this is ok and sufficient, I guess I can do that.

simo5 commented 6 years ago

What about the policy remove-user|group commands ? Feeling like those may have to go for good ...

smarterclayton commented 6 years ago

The middle ground you're reaching for only exists because of the commands you're currently running. Put another way, you only want the remove-from-group because you have the add-to-group.

But people want add-to-group because they don't actually care about the role binding name. The argument seems to be that we should remove the existing commands. I don't think that's going to happen for several releases, mainly because we have spent 3 years documenting this for ad-hoc administration and the level of breakage is going to be high. I'm kind of on the "don't ever break it" fence, in which case I'm concerned about its continued use more than that "there's an easier way". I do think that the role-binding example you described above is indeed simpler.

deads2k commented 6 years ago

I'm concerned about its continued use more than that "there's an easier way".

That sounds a lot like "deprecate it because we want to encourage something else, but don't remove it for a while". That sounds fine to me.

I don't see why saying "add/remove" is easier than saying "create/delete" with the same information. Deprecating them and leaving them in place for a few years is fine with me. I just don't want to be chasing things like "make it nicer things for me on conflict" and "please add a feature to do X", when I know its not the direction of the project overall.

smarterclayton commented 6 years ago

I just don't to be chasing things like "make it nicer things for me on conflict" and "please add a feature to do X", when I know its not the direction of the project overall.

I think it makes sense to do the resource driven commands (as in your example) vs creating a second set of commands that looks similar to add-* - given that the existing commands have to lay around for a bit.

I was thinking though about how even a deprecated command will be used, and encouraging use of --rolebinding-name to the add-* commands is longer and more onerous than the create rolebinding variant. So making add-* "safe" for use did have some advantage, even if we encourage create/delete.

Stepping back a second (to address Simo's comments) - I'm concerned if we don't have a standard way to say "I don't want this user to have access to this" (by removing their role bindings). The flaw in that command is as we've stated - that config management tools may recreate it anyway. Our current discussion could even be restated as "the command allows you to think you've removed that user, when in reality he'll be right back". A benefit the "marking things created this way" has is that the command would be able to warn you that another tool could be actively restoring that role binding and your assumption is not valid.

Now, we could be overstating how often the config management part happens - I suspect namespaces end up having one of three levels: no management, object but not RBAC management, or full management, and the first and second are the overwhelming cases. But do we honestly believe that most users in the first two levels really want to fiddle with removing users from bindings manually? I suspect I'd be in a murderous rage the second time I had to do it, and the possibility that the user gets it wrong seems like a bad tradeoff. And that's ultimately why we did those commands...

simo5 commented 6 years ago

@smarterclayton @deads2k you give interesting perspectives, but I need a blessed direction now. It is not clear to me what you guys want to see. I can go any number of ways as long as we are all on the same page, as I do not want to do work and then just throw it away.

smarterclayton commented 6 years ago

What's the pressure to have to do something now for add-*? Agree we want to improve them somehow, but that shouldn't block the other things listed in the top section.

simo5 commented 6 years ago

Upon deprecation I need to update the docs on what is the new blessed way. As I showd in the PR comments, trying to use oc patch is unworkable, so I asked if we wanted to do somethign else. Seem your answer is: do not deprecated them and leave them as is. Fine by me if that's what you want. But then, should I deprecate anything at all ?

deads2k commented 6 years ago

What's the pressure to have to do something now for add-*?

There were complaints about either the add or the remove (I don't recall which), not performing retry on conflicts when fighting with other controllers (I think it was controllers). That brought it to the forefront and the realization that we have two sets of commands, one set I wrote in openshift pre-3.0 and one set I wrote in kube with lessons learned from 3.0.

Having two sets of commands leads to confusion and cases where the discrete sets don't play nice together. Both sets are internally consistent, but mixing the two makes your head hurt.

Ultimately, I'm not going to die on the hill for it, but if I had to own it, I wouldn't want to keep responding to issues, feature request, and pulls that take the openshift commands further from the upstream.

simo5 commented 6 years ago

@deads2k I am confused now, what are the 2 sets of commands you are referring to ?

So it seems like there is no consensus on removeing these commands, and most seem to oppose the idea of adding new commands. Most also recognize that the way the add-{cluser}role-to-[user|group] command behave is not ok.

So would it be fair to say that most agree that:

1) changing these commands to not re-use rolebindings on add is desirable (unless -rolebinding-name is provided) while allowing them to keep removing stuff as they do now is acceptable. This means they woun't be deprecated, only "improved" with a backward compatible change in behavior.

2) It still seem we should just deprecate remove-[user|group] and eventually remove them.

@smarterclayton @liggitt ?

simo5 commented 6 years ago

@deads2k @liggitt @smarterclayton ping ?

simo5 commented 6 years ago

Ok folks #18102 is all green and implements the above, PTAL @deads2k @liggitt @smarterclayton

enj commented 6 years ago

Leaving open for future deprecation.

openshift-bot commented 6 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

simo5 commented 6 years ago

I do not see us deprecating the policy commands @enj, should we keep this card around ?