kubernetes / ingress-nginx

Ingress-NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

Split Control Plane from Dataplane (aka split ingress controller and NGINX) #8034

Open · rikatz opened this issue 2 years ago

rikatz commented 2 years ago

Due to recent CVEs, we started to discuss, but never properly recorded, the need to split the ingress-controller process from the NGINX process.

Here is some rationale on this:

While writing the template file is just a matter of a shared volume (an empty volume, maybe) between both containers, the process of starting/stopping/monitoring is going to be a bit challenging.
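For illustration only, here is a minimal sketch, built with the Kubernetes Go API types and printed as YAML, of what such a two-container Pod with a shared empty volume could look like. Container names, images, and the mount path are assumptions, not the project's actual manifests.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	// Shared emptyDir where the controller writes the rendered nginx.conf
	// and from which the NGINX container reads it.
	sharedConf := corev1.VolumeMount{Name: "nginx-conf", MountPath: "/etc/nginx"}

	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "ingress-nginx-split"},
		Spec: corev1.PodSpec{
			Volumes: []corev1.Volume{{
				Name:         "nginx-conf",
				VolumeSource: corev1.VolumeSource{EmptyDir: &corev1.EmptyDirVolumeSource{}},
			}},
			Containers: []corev1.Container{
				{
					// Control plane: watches the API server and renders the template.
					Name:         "controller",
					Image:        "registry.k8s.io/ingress-nginx/controller:example",
					VolumeMounts: []corev1.VolumeMount{sharedConf},
				},
				{
					// Data plane: only serves traffic; no ServiceAccount token needed here.
					Name:         "nginx",
					Image:        "nginx:example",
					VolumeMounts: []corev1.VolumeMount{sharedConf},
				},
			},
		},
	}

	out, _ := yaml.Marshal(pod)
	fmt.Println(string(out))
}
```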

I will use this issue as a side note for implementation attempts, and ideas are welcome!

/kind feature
/priority important-longterm
/triage accepted

rikatz commented 2 years ago

First need: write two container manifests, one containing only the ingress controller and the other containing NGINX. NGINX will need some dumb-init-style supervisor, otherwise reloading/killing the process can make respawning and monitoring challenging.
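As a rough sketch of the dumb-init idea, assuming NGINX is started as `nginx -g 'daemon off;'`: a tiny Go supervisor that forwards termination signals and respawns NGINX if it exits unexpectedly. A real init (dumb-init, tini) would also handle zombie reaping and proper backoff.

```go
package main

import (
	"log"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, syscall.SIGINT)

	for {
		// Run NGINX in the foreground so its lifetime is observable.
		cmd := exec.Command("nginx", "-g", "daemon off;")
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Start(); err != nil {
			log.Fatalf("starting nginx: %v", err)
		}

		done := make(chan error, 1)
		go func() { done <- cmd.Wait() }()

		select {
		case sig := <-stop:
			// Forward the signal and exit instead of respawning.
			_ = cmd.Process.Signal(sig)
			<-done
			return
		case err := <-done:
			log.Printf("nginx exited (%v); respawning", err)
			time.Sleep(time.Second) // crude backoff
		}
	}
}
```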

Later: once everything is split, we need to glue NGINX and the controller back together. The simplest approach would be to share NGINX's process with the ingress controller (but not the reverse!) and check whether, with small changes, the controller can use the NGINX PID file (maybe shared as well?) to do the process operations.
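A sketch of the shared PID file idea, with a hypothetical helper (not part of the current controller): read NGINX's PID from a file on a shared volume and send SIGHUP to trigger a reload. The path is an assumption, and delivering a signal across containers in the same Pod also requires a shared process namespace (`shareProcessNamespace: true`).

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"syscall"
)

// reloadNginx is a hypothetical helper: it signals the NGINX master process
// whose PID is stored in pidFile, causing it to re-read its configuration.
func reloadNginx(pidFile string) error {
	raw, err := os.ReadFile(pidFile)
	if err != nil {
		return fmt.Errorf("reading pid file: %w", err)
	}
	pid, err := strconv.Atoi(strings.TrimSpace(string(raw)))
	if err != nil {
		return fmt.Errorf("parsing pid: %w", err)
	}
	// SIGHUP tells the NGINX master to reload workers with the new config.
	return syscall.Kill(pid, syscall.SIGHUP)
}

func main() {
	if err := reloadNginx("/tmp/nginx/nginx.pid"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```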

rikatz commented 2 years ago

Another thing we need to take care of: the SSLPassthrough process today runs inside the controller process. With the same rationale of "the control plane does not receive any user traffic", we should probably think about moving the SSLPassthrough process to the proxy container. Maybe it can be the same container that runs NGINX and also "controls" NGINX.

tao12345666333 commented 2 years ago

This is valuable.

I'm wondering whether the model can be simplified: use NGINX as a stateless container.

If it has a problem, just restart it.

Use the control plane to write the state to the data plane.

ElvinEfendi commented 2 years ago

@rikatz have you thought of taking a smaller step and extracting only the API interaction logic of the controller first? Here's what I have in mind:

theunrealgeek commented 2 years ago

From what has been discussed, I see the goal is to just split NGINX and the controller process into two containers within the same pod. However, from a scaling perspective, it makes more sense to have a few controller pods and lots of NGINX pods to handle traffic.

On that front I like the gRPC idea proposed above: it creates an interface between the control plane and the data plane, which can be co-located for now but, as the implementation evolves, eventually start to separate out.

rikatz commented 2 years ago

I LOVE the idea of a central gRPC control plane with the data plane subscribing to it. I actually discussed this approach with James and some other folks a while back; this could even be a way to make it easier to implement Gateway API, for example.

I just don't know where to start, but I can try (or are you willing to look into that? 🤔 )

I was just thinking of a way to make it a simple change (like sharing the PID and issuing reloads), but maybe creating a gRPC stream endpoint and moving all the logic below syncIngress is actually better.
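To make the direction of the idea concrete, here is a hypothetical Go sketch of such a contract: the data plane subscribes once and then only receives configuration snapshots pushed by the control plane. None of these names (ConfigStream, Snapshot, applySnapshot) exist in the codebase; they only illustrate the shape a server-streaming gRPC service could take, with an in-memory stand-in instead of a real transport.

```go
package main

import (
	"context"
	"fmt"
)

// Snapshot stands in for whatever the controller computes below syncIngress
// today (backends, server blocks, references to TLS material).
type Snapshot struct {
	Revision int64
	Payload  []byte // e.g. a serialized nginx.conf or a structured config
}

// ConfigStream mirrors what a server-streaming gRPC service could look like:
// the data plane calls Subscribe once, then only receives pushed snapshots.
type ConfigStream interface {
	Subscribe(ctx context.Context, node string) (<-chan Snapshot, error)
}

// applySnapshot is a hypothetical data-plane hook: write the config to disk
// and ask NGINX to reload.
func applySnapshot(s Snapshot) {
	fmt.Printf("applying revision %d (%d bytes)\n", s.Revision, len(s.Payload))
}

func runDataplane(ctx context.Context, cs ConfigStream) error {
	updates, err := cs.Subscribe(ctx, "nginx-pod-1")
	if err != nil {
		return err
	}
	for snap := range updates {
		applySnapshot(snap)
	}
	return nil
}

// fakeStream is an in-memory stand-in so the sketch runs without a network.
type fakeStream struct{}

func (fakeStream) Subscribe(ctx context.Context, node string) (<-chan Snapshot, error) {
	ch := make(chan Snapshot, 1)
	ch <- Snapshot{Revision: 1, Payload: []byte("worker_processes auto;")}
	close(ch)
	return ch, nil
}

func main() {
	if err := runDataplane(context.Background(), fakeStream{}); err != nil {
		fmt.Println("stream error:", err)
	}
}
```

A one-way subscription like this also matches the concern raised later in the thread: the data plane never initiates writes toward the control plane, it only receives configuration.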

rikatz commented 2 years ago

I'm wondering if we should open a branch in ingress-nginx for this and work on that.

BTW there is prior art in kpng (https://github.com/kubernetes-sigs/kpng), which does the same thing for kube-proxy. I discussed this with folks in the past (👋 @jayunit100 @mcluseau) and Jay actually asked whether the whole Gateway API-ish control plane shouldn't be something like a kpng layer-7 controller.

Right now, I think we can make it easier by just splitting our logic the way @ElvinEfendi suggested, with the data plane's main controller subscribing to the gRPC endpoint of the control plane, but who knows what the future holds and what lessons we can learn from it :)

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

haslersn commented 2 years ago

In principle I like @ElvinEfendi's idea. It achieves the most important goal, namely that the data plane has no access to the Kubernetes API or to the control plane's service account token.

Also, there's no reason the data plane needs to be only an nginx process. It can readily have a management process that does some of the things the controller does today, as long as that doesn't require Kubernetes API access.

However, if the control plane provides a gRPC API that the data plane containers/Pods access, then that might still give an adversary a way to corrupt the control plane. I'm not an expert in gRPC, so please prove me wrong, but I can imagine something like the following being possible:

  1. The adversary first finds an RCE vulnerability in nginx in order to take over the nginx process.
  2. From there, the adversary has access to the control plane's gRPC API.
  3. The adversary also finds an RCE vulnerability in the gRPC server in order to take over the controller process.
  4. From there, the adversary has access to the service account token and the Kubernetes API.

Maybe something like this can be prevented by using an architecture where data can only be sent from the control plane to the data plane but not vice versa.

strongjz commented 2 years ago

/lifecycle frozen

rikatz commented 2 years ago

About RCEs, I partially agree, but I think if we end up with an RCE in NGINX (doable, due to all the leaks and template sanitization we have) plus one in gRPC, we have a much bigger problem.

Also, as soon as we split all of this, I want to make sure the control plane image is distroless and only the control-plane binary runs there.

Finally, I was also thinking that we still need some countermeasures and architecture definitions:

I will keep posting updates and discussions here

strongjz commented 1 year ago

https://github.com/kubernetes/ingress-nginx/pull/8955

rikatz commented 1 year ago

I am thinking about this again, based on some recent discussions with @strongjz, the new sidecar container feature (which we shouldn't rely on yet, as a lot of users still run old versions), and some discussions on Twitter:
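As a rough illustration of the native sidecar feature mentioned above (an init container with restartPolicy: Always, available in recent Kubernetes versions), again using the Go API types; names and images are assumptions.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	always := corev1.ContainerRestartPolicyAlways

	spec := corev1.PodSpec{
		InitContainers: []corev1.Container{{
			// restartPolicy: Always turns this init container into a "native
			// sidecar": it starts before, and keeps running alongside, the
			// regular containers.
			Name:          "controller",
			Image:         "registry.k8s.io/ingress-nginx/controller:example",
			RestartPolicy: &always,
		}},
		Containers: []corev1.Container{{
			Name:  "nginx",
			Image: "nginx:example",
		}},
	}

	out, _ := yaml.Marshal(spec)
	fmt.Println(string(out))
}
```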

tao12345666333 commented 1 year ago

I think we can revisit the architecture we expect.

[screenshot: 2023-07-17 13-50-07]

[screenshot: 2023-07-17 13-48-55]

k8s-triage-robot commented 2 months ago

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted