Lock down RBAC permission on Sidecar [was: sidecar as DaemonSet?]

davidvossel commented 6 years ago

It's not immediately obvious to me what the value is of running a sidecar 1-to-1 with each game server. Have you all considered running the sidecar process as a DaemonSet instead?

markmandel commented 6 years ago

This is a great question, we considered running this as a Daemonset, but there were several aspects that pushed us towards running it as the sidecar:

If it is a daemonset, there is an enlarged security attack vector. Essentially the daemonset RPC would need to receive the name of the GameServer that was sent, and trust that was accurate. In this model, a compromised GameServer could attempt to manipulate other game servers in the node/cluster by faking/brute forcing different GameServer names via the RPC. Keeping the sidecar as the endpoint for the RPC means that to do the same thing would mean that the attacked would need to compromise the game server process and the sidecar process (which is quite isolated from the game server, as it's in a separate container). This is a smaller vector, and much harder to do.
Having a sidecar lets us use native Kubernetes health checking, by proxying the health values through to a http endpoint on the sidecar. This saves us a lot or work, as we then only need to generate gRPC clients (which is relatively trivial) for any language we want to have a SDK for rather than implementing a full http server for each language (and also means we don't open a port on the gameserver, which is a potential another security attack vector).
(We expect) that the sidecar will also be an http endpoint for stat collection as well - and again, we just use the gRPC SDK -> sidecar pattern to expose this in a per-gameserver basis very easily.
Finally, if the sidecar dies for whatever reason, it only affects a single gameserver, it doesn't affect all the game servers on a node.

Hopefully that explains things?

davidvossel commented 6 years ago

Yup, valid points!

I'll clarify what I'm concerned about and why I brought this up.

Each sidecar process (while doing all the things you've listed above) is also acting kind of like a k8s controller. This is necessary because you need the ability to update the 'status' field of CRD you're using and do event recording.

The ServiceAccount you give the GameServer pod is locked to get/set permissions on your CRD, however this gives both the sidecar and the thirdparty game server process access to all GameServer objects in the cluster. You can limit the scope by namespace, but I'm not sure you'd ever want the actual game server process to have the ability to modify a GameServer CRD.

It would probably be more efficient and secure to break the sidecar into two parts.

grpc api endpoint, health check endpoint, etc...
event recorder, crd status update, etc...

Item 1 could still exist as the sidecar process you have already, so you get all the great stuff you want with sdk. Item 2 could exist as a single node level controller (deployed as a DaemonSet) similar in function to the kubelet.

markmandel commented 6 years ago

Good conversation! :+1:

So one thing worth bringing up, is that eventually we will have subresources for CRDs (looks like in 1.10) - which will mean we can actually restrict the sidecar's access to just the status fields of the GameServer - which will make things better. Not ideal but better.

Regarding your point about the sidecar and the game server process having access to the all the GameServer objects - this is also totally valid (and a prior concern). In an ideal world, it would be great if we could do per-container RBAC accounts (which would solve all the issues). That being said, if we go through the SDK, they still have access, but the API changes a little. That also being said, if we do a Daemonset, we could limit it to just those GameServers that are valid for the node that it is on, thus limiting the attack vector somewhat.

Splitting also also makes local development a bit harder, as right now all we need is a local version of the sidecar, not two separate processes, as well as scaling (right now we basically use the master as our communication mechanism, so we can lean on multi-master for scaling). So there are pros and cons across the board.

Now that all being said, I'm digging into this a little, and wondering if we can just only give the sidecar container access to the API token. Doing some research, it looks like we can Manually create a service account API token, which could then be applied to the sidecar (but not the gameserver) as a Secret - maybe as an env var, or as a file on the file system.

Looking at the code for an InClusterConfig here - it doesn't look particularly hard to switch out the source of the API token, and use that as the config for the sidecar -- this would give us best of both worlds. The sidecar as it currently stands, but the game server process has no token to access the Kube API. At the very least, this looks like a good area for research and exploration. WDYT? I'm still getting up to speed on RBAC, so there may be flaws in my ideas.

/cc @dzlier-gcp - any thoughts on this?

davidvossel commented 6 years ago

So one thing worth bringing up, is that eventually we will have subresources for CRDs (looks like in 1.10) - which will mean we can actually restrict the sidecar's access to just the status fields of the GameServer

I knew there were discussions about that last year but I didn't realize that had actually gained traction. Very cool, I actually have a use case for this in another project. Thanks for the tip!

which will make things better. Not ideal but better.

yeah, unfortunately it doesn't :/

Now that all being said, I'm digging into this a little, and wondering if we can just only give the sidecar container access to the API token. Doing some research, it looks like we can Manually create a service account API token, which could then be applied to the sidecar (but not the gameserver) as a Secret - maybe as an env var, or as a file on the file system.

Makes sense. You generate the token and attach the secret as an env-var to the sidecar pod only.

A couple of thoughts off the top of my head.

Would this work with ReplicaSets? can we generate a single token that's used by all pods in the set?
This needs discussion on some upstream k8s sig mailing list. We'd want to know why the decision was made to apply service accounts per pod and not per container. It's possible we're overlooking some obvious security flaw here.

interesting thought, worth investigating :+1:

If the DaemonSet approach ends up emerging as the viable option, I have some ideas on how to do it in a secure and efficient way... I do agree avoiding the DaemonSet would be nice though.

alexandrem commented 6 years ago

The daemonset approach for node-level controller might be interesting in terms of separation of concerns, but doesn't bring much more in terms of security.

The problem is that you're still going to have a service account token attached to a running container on the same host where other game servers are running and you only rely upon the provided capabilities isolation of the runtime.

Containers isolation is not yet a complete reliable thing and there are still some vulnerabilities popping up every once in a while that allow escaping those boundaries, e.g https://www.twistlock.com/2017/12/27/escaping-docker-container-using-waitid-cve-2017-5123/

That's for the Linux side. Eventually if we want to have hybrid worker nodes in Windows that serve as dedicated servers too, then we might have even more isolation problems. I'm not 100% sure what's the effectiveness on Windows container isolation, I guess it depends if you chose Hyper-V or the new process level in newer versions, but I'm pretty certain the isolation level is not on par with Linux. Something to validate with sig-windows I guess.

So you end up with a token on system that can be used to manipulate existing game server resources.

To mitigate this, I believe it comes down to:

Use a k8s virtualization runtime (e.g kube-virt, etc.) combined with the proposed daemonset node-level controller.
Have a server side controller (not on worker nodes) that do polling on game servers exposed information and update CRDs.
Introduce an additional service where game servers send update commands with a token to identify themselves. Token could be a simple JWT that embeds what namespace and what CRD object can be written to. Definitely adds some complexity to the architecture though.

Second option always made more sense to me, especially for managing game server healthiness, because it's hard to enforce regular health status pushes the other way around. But I understand this adds some more complexity and also scaling problems in the long term like mentioned by @markmandel from that RBAC thread.

It's not clear to me how the CRD subresources are going to work with RBAC and if it's going to mitigate the current flaw. Anyone?

davidvossel commented 6 years ago

The daemonset approach for node-level controller might be interesting in terms of separation of concerns, but doesn't bring much more in terms of security.

DaemonSet could potentially make this more secure if we add an authentication mechanism that ensures each GameServer pod can only modify its own CRD.

For example, No container in the GameServer pod would have permissions via a ServiceAccount to update any GameServer object. The sidecare sdk would have to talk to the DaemonSet in order for updates to the GameServer status field to occur. The DaemonSet would know which GameServer pods are currently running on the local node and reject any request that can't authenticate as a known locally running GameServer.

Have a server side controller (not on worker nodes) that do polling on game servers exposed information and update CRDs.

I don't think a cluster-wide poller for game server info will scale well. If you reversed this and put a poller on each worker node that was only responsible for polling GameServers on the local node, that might scale. ( here I go pushing a DaemonSet on you all again :D )

Ideally we'd be able to make this all event driven rather than needing to introduce polling.

Introduce an additional service where game servers send update commands with a token to identify themselves. Token could be a simple JWT that embeds what namespace and what CRD object can be written to. Definitely adds some complexity to the architecture though.

This is essentially how I envisioned the auth for the DaemonSet working. In this case, the DaemonSet would be the additional service.

davidvossel commented 6 years ago

Use a k8s virtualization runtime (e.g kube-virt, etc.) combined with the proposed daemonset node-level controller.

Just wanted to point out a couple of things about virt.

the k8s + virt landscape is not mature yet. I'm seeing a lot of movement in this area and wouldn't feel comfortable placing any technology bets yet.
virt introduces additional overhead to each Pod that would be better if we can avoid.
cloud instances + nested virt can gets weird depending on the tech involved.

markmandel commented 6 years ago

@alexandrem regarding RBAC and Sub resources - this doens't mitigate the issue, but it limits it somewhat. Basically, it means that we could only give the RBAC service account for the sidecar permissions to update (or read, etc) the GameServer/Status field, rather than the entire GameServer CRD. This limits the amount of damage a compromised system could inflict, but doesn't remove it entirely.

@davidvossel - good call on checking with SIGs to see if there is an issue. Whomever jumps on that first, please post a link on this thread to a group/slack conversation.

I'm intrigued by the idea of passing a token through to the GameServer - that has some interesting implications. But I would love to see a solution where we don't need an additional complexity with multiple processes - but it may not be possible long term.

alexandrem commented 6 years ago

@davidvossel

I need to stress again that even with additional authN/authZ mecanisms, the daemonset approach unfortunately is still subject to the same attack surface as the sidecar, except for a better runtime isolation. A compromised game server that succeeds to elevate its privilege and gain root access to the host node will have access to the daemonset service account token and therefore be able to access all CRDs.

Auth delegated to the same local node isn't enough if we target least privilege. The auth would need to happen elsewhere, e.g having that controller deployed on some other dedicated nodes.

I agree with you about the virt runtimes, it's definitely not mature as of today. Kata containers seemed like the most promising solution in the longer term at the last kubecon, so maybe something to look at in the future to secure game server instances.

@markmandel

Yes that's probably me just being a bit too lazy because I couldn't find the exact design spec where it mentions the new RBAC integration with the subresource. It makes sense that this is the intended purpose.

davidvossel commented 6 years ago

I need to stress again that even with additional authN/authZ mecanisms, the daemonset approach unfortunately is still subject to the same attack surface as the sidecar, except for a better runtime isolation. A compromised game server that succeeds to elevate its privilege and gain root access to the host node will have access to the daemonset service account token and therefore be able to access all CRDs.

Ha, well yeah. That is true. However, if a gameserver could gain root access, then the kubelet's account creds would be available at that point along with all ServiceAccount tokens assigned to any other pod scheduled on the host. Our CRD service account token would be a part of a larger security breach.

alexandrem commented 6 years ago

Root access to the node is pretty bad by itself, but fortunately there are mitigation features being added progressively to limit the scope of damage, such as this one in 1.10:

https://github.com/kubernetes/features/issues/279

markmandel commented 6 years ago

Asked about creating a token manually (internally), and got this response:

Me: What if we manually create a service account API token and managed passing that around ourselves? Them: I wouldn't recommend doing that for a real design Me: Why not? Them: Because you're taking responsibility for credential management/rotation, and also operating outside of Kubernetes' identity model

So... that's not the answer we wanted. I'll do some extra digging, see if there are any plans for per-container service accounts in the future. Not sure which SIG RBAC falls under.

markmandel commented 6 years ago

...and as I say that, I'm presented a new solution. While we can't do per-container RBAC permissions, we can remove the service account from any container we want: https://stackoverflow.com/questions/38536599/container-disable-service-account

Basically by mounting an emptyDir in it's place - which gets us to the same place. So that is much more promising!

davidvossel commented 6 years ago

Basically by mounting an emptyDir in it's place - which gets us to the same place. So that is much more promising!

lol, that's creative. I suppose a user inside the container wouldn't be able to do something like umount /var/run/secrets/kubernetes.io/serviceaccount to regain visibility into the service account dir.

Looks like a good starting point. This is better than nothing :+1:

davidvossel commented 6 years ago

After giving this some more though. I think the simplest solution is a combination of

emptyDir hack to remove SA
RBAC that limits GameServer access by namespace

The emptyDir hack keeps the game process from having access to the SA... If by some chance there's a way around that, the SA would be limited via an RBAC Role to GameServers in a single Namespace. Arguably if a k8s cluster is being used to support multiple games, each game and environment (dev, staging, prod, etc...) within that game should have its own namespace anyway.

If we wanted, we could abstract away all this RBAC stuff entirely from the end user and have our controller generate the SA+RBAC Roles for us. Users would just interface with the GameServer objects and never have to know about service accounts.

markmandel commented 6 years ago

Amusingly, we are also talking about how we a Pod can only use a service account in the same namespace as it - and how that causes different issues. Sounds like the solution will go hand in hand for this one as well (#154)

Personally I'm not a fan of giving the controller the ability to edit RBAC permissions (that seems really scary). I think this would be better handled an install time as a one-off administrator operation just to be safe.

markmandel commented 6 years ago

Oh, also realised we could also turn off automount and explicitly mount the secret in the container - So several options to reach the same conclusion.

alexandrem commented 6 years ago

@markmandel For the turn off automount, the manual service account assignment is still at the pod level. Since we have the Agones sidecar who needs it, it won't do any good.

The emptyDir shadow mount trick is probably the best option to prevent access to the k8s resources by not injecting the service account token in the first place.

I also believe that ensuring a different namespace per game server is the best option for the current RBAC design. It'll provide proper isolation of gs resources with the api server. Not sure if we can ensure it with an admission controller (mutating?), but at least it should be the recommended usage for those creating the game server resource.

Definitely will go hand-in-hand with #154.

As for the game server node breach aspect, then it would limit the damage to only the neighbor game servers on that host (which is fair enough).

This should satisfy everyone's concerns so far :)

markmandel commented 6 years ago

This all sounds good - but we should be able to mount to serviceaccount secret directly into the sdk sidecar container, but not on any other Pods. Unless I'm mistaken (which I could be) that should still allow the restriction to just the sidecar, without the game server being able to access it?

Benefit being that it isn't as quite a "hacky" approach as the emptyDir workaround, and is more of a supported approach.

That all being said - we can totally try both approaches and see what works better, and this point it is more of an implementation detail :+1:

cyriltovena commented 6 years ago

For your information sidecar role has been scaled down to its own namespace.

cyriltovena commented 5 years ago

should we close this ? we're almost a year after !

markmandel commented 5 years ago

Actually, I still want to do this!! :smile: might be a good starter issue for someone?

There's moving the sidecar to just using /status API , and then reviewing the rbac permissions on the Pod, and then working out how to revoke all the RBAC permissions, specifically for the gameserver sidecar (maybe with the option to turn this off?)

//cc @jkowalski

googleforgames / agones

Lock down RBAC permission on Sidecar [was: sidecar as DaemonSet?] #150