knative / build

A Kubernetes-native Build resource.
Apache License 2.0

Why is Istio's sidecar disabled in build pods? #369

Open adamglt opened 6 years ago

adamglt commented 6 years ago

Running builds without a sidecar negates all Istio features - mTLS, RBAC, Egress control, etc.

Looks like the always-off setting was added here: https://github.com/knative/build/pull/74, but the issue/PR doesn't specify why.

Does the sidecar break builds for some reason we're missing?
If not, shouldn't this be configurable?

see https://github.com/knative/build/issues/369#issuecomment-425496280

@mattmoor @ian-mi

/area API /kind dev

shashwathi commented 6 years ago

@adamglt :

One reason I can think of would be avoiding pod startup delays.

Build pods are not using any Istio features either, so there is no benefit yet. What would be your use case for adding the sidecar?

adamglt commented 6 years ago

My use case is the same as for any Istio-controlled pod - traffic control, observability, and security.
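To make the egress-control part of that concrete, here is a hypothetical sketch (not from the original thread; all names are illustrative). With the mesh's outbound traffic policy set to `REGISTRY_ONLY`, a sidecar-enabled pod can only reach hosts that are explicitly registered, e.g. via a ServiceEntry:

```yaml
# Hypothetical: with meshConfig.outboundTrafficPolicy.mode: REGISTRY_ONLY,
# outbound traffic is blocked unless the destination is in the registry.
# This ServiceEntry whitelists HTTPS access to github.com for the mesh.
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: allow-github
  namespace: builds        # illustrative namespace
spec:
  hosts:
  - github.com
  ports:
  - number: 443
    name: tls
    protocol: TLS
  resolution: DNS
```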

adamglt commented 6 years ago

/assign @mattmoor @ian-mi

adamglt commented 6 years ago

Looking at the code - the reason seems to be that all build steps run as init containers, where istio-proxy is a "real" container.

I guess this is a must to let steps run one-by-one in a specific order.
This seems to mean that running builds with actual Istio integration is pretty much impossible.

Is there any intention to move from init containers to something else? (some pseudo-pause container?)

imjasonh commented 6 years ago

@adamglt That's not something we've considered so far, but we could if using init containers blocks some important functionality.

adamglt commented 6 years ago

@ImJasonH The main functionality we're interested in is running builds for semi-trusted tenants.
Basically the build-environment version of a multi-tenant cluster.

For example, we might whitelist a certain tenant to access service X during builds, but not another tenant. Actual access control can go down to the level of specific builds.
What we currently do is control external calls with Istio's RBAC, which is relatively easy.
NetworkPolicy capabilities are a lot weaker in comparison.
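As a hypothetical sketch of the per-tenant rule described above (names are illustrative, not from the thread): at the time of this discussion Istio's RBAC used ServiceRole/ServiceRoleBinding, while current Istio expresses the same idea as an AuthorizationPolicy keyed on mTLS workload identity rather than IP addresses:

```yaml
# Hypothetical: only build pods running as tenant-a's builder service
# account may call service-x. The identity comes from the pod's mTLS
# certificate, so the rule survives pod rescheduling and IP churn.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: service-x-allow-tenant-a
  namespace: services        # illustrative
spec:
  selector:
    matchLabels:
      app: service-x
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/tenant-a-builds/sa/builder"]
```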

I've just seen this issue discussing something similar: https://github.com/knative/build/issues/32 but I'm a bit surprised there isn't more interest.
What's stopping builds from accessing random endpoints in the cluster / over the internet? Is everyone running "fully trusted" builds?

imjasonh commented 6 years ago

We're definitely interested in making Build suitable in a multi-tenant environment, and it should definitely be possible to limit what a build can reach, and what can reach the build. It's not an area we've spent a lot of time on so far, so your use cases are going to be really helpful in polishing this. #32 doesn't have a lot of answers so far, mostly just questions and ideas.

It would be great not to require Istio at all, and still be able to ensure a secure, trustworthy multi-tenant build environment. I admit I'm not very familiar with the limits of k8s's built-in mechanisms, or with how Istio's are better - can you elaborate on how network policies (and RBAC?) are weaker?

adamglt commented 6 years ago

Sure, it's a big topic, but here are the main points I can think of:

In general, if you give up on the granularity Istio gives you, it's probably possible to restrict build network access using nothing but NetworkPolicy.
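For comparison, a NetworkPolicy-only version of that restriction would be a sketch like the following (hypothetical names). It operates purely at L3/L4 - pod and namespace selectors, IPs, and ports - with no notion of workload identity or HTTP-level rules:

```yaml
# Hypothetical: build pods in this namespace may only open TCP/443
# connections to pods in the "shared-services" namespace; everything
# else is denied. No identity, no L7 matching, no mTLS.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-build-egress
  namespace: tenant-a-builds   # illustrative
spec:
  podSelector: {}              # applies to every pod in the namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          name: shared-services
    ports:
    - protocol: TCP
      port: 443
```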

However, I think that organizations adopting Knative (and Istio in general) will probably drop a lot of L4 and in-app-TLS functionality for Istio's L7+mTLS, so having a completely different set of authentication/authorization methods for everything running under knative-build doesn't make a lot of sense.

Beyond the issue of "should Istio be enabled", there's the issue of "can Istio be enabled".
AFAIK there's no way to force containers to run sequentially in a Pod unless they're init containers, and each init container also has to run to completion before the next init container, or any "real" container, starts.
This obviously clashes with istio-proxy, which is just a regular sidecar injected into the Pod.
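A simplified sketch of the resulting pod shape (illustrative, not the actual generated manifest) shows why the two never overlap in time:

```yaml
# All build steps are init containers and run to completion, in order,
# before any regular container starts -- so istio-proxy would only be
# running after every build step has already finished.
apiVersion: v1
kind: Pod
metadata:
  name: example-build-pod        # illustrative
spec:
  initContainers:
  - name: build-step-git-clone   # step 1: runs alone, no sidecar present
    image: git-init              # illustrative image names
  - name: build-step-compile     # step 2: starts only after step 1 exits 0
    image: builder
  containers:
  - name: istio-proxy            # injected sidecar: starts after all steps
    image: istio/proxyv2
```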

Seems to me like the only real ways to get around this are:

Hopefully I'm missing some other magic solution - maybe some of the Istio people can weigh in. @rshriram maybe?

mattmoor commented 6 years ago

(haven't caught up on this thread, just chiming in with historical context)

/unassign @mattmoor /unassign @ian-mi /assign @tcnghia

@adamglt The history here is that when we first started enabling sidecars it blocked all egress traffic, which is a problem when the first thing you do is a git clone https://github.com/.... Given this limitation of Istio, this is likely the right setting for 99+% of builds.

That said, I don't want serving to need to know about these annotations, let alone build. If Istio had a sensible default mode that let it inject sidecars without being disruptive to the software it's instrumenting, then we wouldn't need this.

If there is such a setting, then I'd love to know about it. I am by no means an expert on Istio, but perhaps you could join our Networking WG (led by @tcnghia, who is!) on Thursday mornings to discuss?

tcnghia commented 6 years ago

It looks like with this change https://github.com/istio/istio/pull/6406 in Istio we may be able to remove the annotations relating to sidecar injection in our Pods. Instead we should leave all injection policy choices to the users to fit into their use case.

That will also mean that we need to document how these can be applied and caveats (like configuring egress exceptions) so that users can make the best choice for themselves.
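For reference, per-pod opt-out of injection is expressed with Istio's standard pod annotation (whether Build sets exactly this is the subject of PR #74 linked above):

```yaml
metadata:
  annotations:
    sidecar.istio.io/inject: "false"   # opt this pod out of sidecar injection
```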

adamglt commented 6 years ago

@mattmoor hey, thanks for jumping in.

About Istio's auto-injection: I think Istio just ignores namespaces without the istio-injection=enabled label - so that should be good enough (but only if people are running builds in build-dedicated namespaces).
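In manifest form, that opt-in looks like this (namespace name is illustrative):

```yaml
# Without the istio-injection=enabled label, the sidecar injection
# webhook skips the namespace entirely, so build pods stay out of
# the mesh by default.
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a-builds
  labels:
    istio-injection: enabled
```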

About the git clones failing, yeah that makes sense. I mean, in our case we'd like them to fail (we'd whitelist specific endpoints), but that's not really a "sane default".

Anyway, as I wrote above, I just realized builds were never really running or using the Istio sidecar, since it's a regular container and all build steps are init containers.
What I believe you've experienced is Istio injecting istio-init - the init container that sets up the iptables rules forwarding all traffic through istio-proxy (the sidecar) - but the sidecar was never there, so every outbound call would probably have failed.

In any case, this is a pretty tricky situation. The whole model of sidecars vs init-containers is clashing pretty hard - so one of [knative, istio, k8s] has to make a pretty big change to make it all fit together.