kubernetes / test-infra

Test infrastructure for the Kubernetes project.

prow: include namespace in build cluster spec #8848

Closed stevekuznetsov closed 5 years ago

stevekuznetsov commented 6 years ago

We should have no problems scheduling to separate namespaces in the same cluster; k8s RBAC and other namespace isolation measures mean that as long as we are not creating cluster-scoped resources for tests (which we are not) we should be able to consider two namespaces on one cluster as separate clusters.

The plank controller currently loads the cluster configuration as a flag:

https://github.com/kubernetes/test-infra/blob/bc41683931b30662b4774d0cc0b18495ddf730f5/prow/cluster/plank_deployment.yaml#L34

That cluster file has this format:

https://github.com/kubernetes/test-infra/blob/bc41683931b30662b4774d0cc0b18495ddf730f5/prow/kube/client.go#L309-L324

However, that does not include the namespace; we just use one namespace for everything, which is in the main Prow configuration:

https://github.com/kubernetes/test-infra/blob/bc41683931b30662b4774d0cc0b18495ddf730f5/prow/cmd/plank/main.go#L140
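For illustration only, an extended build-cluster file could look something like the sketch below. The `namespace` key is the hypothetical addition this issue asks for, and the other field names are only an approximation of the current format:

```yaml
# Hypothetical build-cluster file: two aliases point at the same API
# endpoint but would schedule test pods into different namespaces.
default:
  endpoint: https://1.2.3.4
  clientCertificate: <base64 cert>
  clientKey: <base64 key>
  clusterCaCertificate: <base64 CA cert>
  namespace: test-pods            # assumed new field; would fall back to the global pod namespace
prometheus-benchmarks:
  endpoint: https://1.2.3.4       # same cluster, treated as a separate pool
  clientCertificate: <base64 cert>
  clientKey: <base64 key>
  clusterCaCertificate: <base64 CA cert>
  namespace: prometheus-2-3       # assumed new field
```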

/area prow
/kind feature
/help

k8s-ci-robot commented 6 years ago

@stevekuznetsov: This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to [this](https://github.com/kubernetes/test-infra/issues/8848). Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
stevekuznetsov commented 6 years ago

/cc @fejta @sipian

krasi-georgiev commented 6 years ago

Thanks! @sipian and I will try to see if we can make this happen.

krasi-georgiev commented 6 years ago

If you could give us some pointers, it would be appreciated.

BenTheElder commented 6 years ago

as long as we are not creating cluster-scoped resources for tests (which we are not) we should be able to consider two namespaces on one cluster as separate clusters.

For what purpose, though? We're currently just using clusters as execution / worker pools; the main reason to have separate pools is strong isolation of secrets and compute resources.

As long as we don't restrict the pod-spec definition, test pods are not really isolated... if there are no cluster-scoped resources, then what is the motivation?

sipian commented 6 years ago

For our automated-prometheus-benchmarking-tool, we have many namespaces running different Prometheus versions. We want to create a ProwJob that runs a pod in a particular namespace. The pod will in turn expose an endpoint for Prometheus to scrape metrics from.
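To make that concrete, here is a sketch of what such a job definition might look like; the `namespace` field is hypothetical (it is exactly what this issue proposes), and the image and args are placeholders:

```yaml
periodics:
- name: prometheus-benchmark
  interval: 30m
  cluster: default              # existing way to pick a build cluster
  namespace: prometheus-2-3     # hypothetical field: which namespace to schedule into
  spec:
    containers:
    - image: example.com/prometheus-benchmark/load-generator:latest  # placeholder image
      args:
      - --listen=:8080          # placeholder endpoint for Prometheus to scrape
```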

BenTheElder commented 6 years ago

SGTM

krasi-georgiev commented 6 years ago

in our case both jobs run in parallel and need to be isolated from each other

stevekuznetsov commented 6 years ago

strong isolation of secrets

@BenTheElder with RBAC, namespacing does give us strong isolation

stevekuznetsov commented 6 years ago

(nobody is stopping a cluster admin adding a serviceaccount that can read secrets at the cluster scope to the namespace that the pod is scheduled to, but by default there are no holes)
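As a sketch of that default posture (all names here are illustrative, not anything Prow creates for you), the test pod's service account would only hold namespace-scoped permissions unless an admin explicitly binds more:

```yaml
# Namespace-scoped Role/RoleBinding: the test service account can read
# secrets in its own namespace only; nothing here grants cluster-wide access.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: test-pod-secrets
  namespace: project-a            # illustrative namespace
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: test-pod-secrets
  namespace: project-a
subjects:
- kind: ServiceAccount
  name: test-runner               # illustrative service account used by test pods
  namespace: project-a
roleRef:
  kind: Role
  name: test-pod-secrets
  apiGroup: rbac.authorization.k8s.io
```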

BenTheElder commented 6 years ago

@BenTheElder with RBAC, namespacing does give us strong isolation

Not with Docker as the runtime at least, especially with privileged containers. RBAC / namespacing alone doesn't give strong isolation for ProwJobs (which is a big reason why we implemented multiple-cluster support to begin with).

stevekuznetsov commented 6 years ago

Oh right y'all use privileged containers. Well yeah with those everything is wonky

BenTheElder commented 6 years ago

even without privileged, the container isolation isn't exactly bulletproof


BenTheElder commented 6 years ago

And even if you don't use privileged currently, someone can write a job that does use it, or one that has fun with hostPaths, or... Kubernetes multi-tenancy still needs work.


krasi-georgiev commented 6 years ago

@sipian can you give a bit more detail about our use case, as it seems different from what has been discussed so far?

stevekuznetsov commented 6 years ago

@BenTheElder right, but those jobs would need to be approved. IIRC hostPath requires a specific SCC, etc. I am fairly certain that at this point all of the multi-tenant stuff, maybe except for default policy and admission control, has been upstreamed from OpenShift to Kubernetes, and multi-tenant is pretty bulletproof AFAIK. How is container isolation leaky?

0xmichalis commented 6 years ago

This has nothing to do with container runtime isolation; we need to take advantage of this, combined with cluster RBAC, for other reasons (different teams owning different namespaces, in our case).

BenTheElder commented 6 years ago

and multi-tenant is pretty bulletproof AFAIK. How is container isolation leaky?

This is perhaps not the best place for a detailed discussion of that, but the common container runtimes and current Kubernetes aren't really enough for hard multi-tenancy (soft, sure, but not for running arbitrary code and pod specs). For more, see e.g. 1, 2. Just the sharing-a-kernel part is problematic, because the surface area is large and not everything is namespaced. People are working on this, but Kubernetes today does not quite solve it.

This has nothing to do with container runtime isolation; we need to take advantage of this, combined with cluster RBAC, for other reasons (different teams owning different namespaces, in our case).

OK sure, but the initial multiple cluster support did have to do with this, specifically, and I'd discourage (but not block...) trying to do the same thing within a cluster. It's not suitable for strong isolation.

stevekuznetsov commented 6 years ago

I think using namespaces to isolate, and assuming there are no kernel or Kubernetes CVEs active that would root your cluster, is valid... isolation using Kubernetes namespaces gives you a huge amount of benefit, as it means for two users to see each other's objects they go from needing to add `kubectl get` to their test script to needing to find and exploit a CVE. Kernel CVEs will always exist, but we are not handling secrets so sensitive we need to be constantly worried about it...

BenTheElder commented 6 years ago

assuming there are no kernel or Kubernetes CVEs active that would root your cluster is valid

There were ~453 kernel CVEs last year... (and there were CVEs for kube).

it means for two users to see each other's objects

The initial post suggested that we did not have other resources in-cluster. I'd also suggest that tests really ought not to manage other resources in-cluster.

but we are not handling secrets so sensitive we need to be constantly worried about it ...

That's not always true though, and I'd imagine many prow users would have secrets they'd rather not share between projects. The multiple cluster support actually came about for testing Kubernetes CVE patches...

All I'm saying is, that sounds like a useful feature, but it's not strong enough to isolate projects with any sensitive secrets, and we should be careful about recommending usage of such a feature over multiple clusters.

0xmichalis commented 6 years ago

All I'm saying is, that sounds like a useful feature, but it's not strong enough to isolate projects with any sensitive secrets, and we should be careful about recommending usage of such a feature over multiple clusters.

Assuming the need for a very high level of security, users are not going to store secrets in etcd and likely are not going to use Kubernetes at all for such workloads. As far as Prow is concerned, if you want to avoid arbitrary code execution taking advantage of a privilege escalation bug in the kernel, or really any issue that could compromise security, you should require /ok-to-test on every test run, something that is not the case today in Prow. This feature doesn't really take away any of the existing security.

sipian commented 6 years ago

Our use case involves creating ProwJobs in different namespaces. We won't be facing the privileged-container and security issues, but the points discussed so far are also intriguing.

BenTheElder commented 6 years ago

Assuming the need for a very high level of security, users are not going to store secrets in etcd and likely are not going to use Kubernetes at all for such workloads. As far as Prow is concerned, if you want to avoid arbitrary code execution taking advantage of a privilege escalation bug in the kernel, or really any issue that could compromise security, you should require /ok-to-test on every test run, something that is not the case today in Prow.

This is not quite true when you use separate build clusters, because you can also restrict who is allowed to PR and to /ok-to-test. By separating projects, at worst a member / contributor of project A gets secrets only from project A's cluster, versus with namespaces in the same cluster they could potentially exploit all of the projects on that cluster. With different clusters you've actually restricted the access for tests in that project to that project's resources.

This feature doesn't take away any of the existing security really.

The feature doesn't; misusing it would. I'm saying we should document this appropriately, since once we create this feature users will use it for some form of isolation, and they should think carefully about what level they need. E.g. some users won't even need a cluster separate from the Prow services if they use namespaces and RBAC appropriately. I doubt we'd ever do this ourselves, though.

0xmichalis commented 6 years ago

This is not quite true when you use separate build clusters, because you can also restrict who is allowed to PR and to /ok-to-test. By separating projects, at worst a member / contributor of project A gets secrets only from project A's cluster, versus with namespaces in the same cluster they could potentially exploit all of the projects on that cluster. With different clusters you've actually restricted the access for tests in that project to that project's resources.

Even with separate build clusters a contributor from project A can still open a PR in project B that fixes a legit issue, get an /ok-to-test, and subsequently change his PR with the exploit and push.

0xmichalis commented 6 years ago

Even with separate build clusters a contributor from project A can still open a PR in project B that fixes a legit issue, get an /ok-to-test, and subsequently change his PR with the exploit and push.

What I am getting at is that if all it takes to compromise the same data is one more PR, then the build cluster separation offers a false sense of security. Also, the Prow config is centralized, so admins should do their due diligence when managing a cluster (or multiple clusters) with different kinds of secrets.

BenTheElder commented 6 years ago

Even with separate build clusters a contributor from project A can still open a PR in project B that fixes a legit issue, get an /ok-to-test, and subsequently change his PR with the exploit and push.

I'm aware of this and consider it an open bug worth revisiting at some point. Suffice it to say, in some cases we trust even those who have access to create a PR, so this point doesn't always matter. The /ok-to-test behavior is definitely a footgun currently. We should document this as well.

What I am getting at is that if all it takes to compromise the same data is one more PR, then the build cluster separation offers a false sense of security.

Not with a different set of trusted users. (/ok-to-test misuse aside).

What I am getting at is that if all it takes to compromise the same data is one more PR, then the build cluster separation offers a false sense of security.

Again, only if you can't trust those with /ok-to-test / trusted PR powers.

Also, the Prow config is centralized, so admins should do their due diligence when managing a cluster (or multiple clusters) with different kinds of secrets.

Yes, which is something we should document. Noting this is the ask.

stevekuznetsov commented 6 years ago

If you can trust /ok-to-test and trusted PR authors, you could just operate on goodwill, no?

BenTheElder commented 6 years ago

If you can trust /ok-to-test and trusted PR authors, you could just operate on goodwill, no?

Within that project / cluster, yes, but we also host other / larger communities on that same Prow, on another cluster under the same Prow services (which is an attractive option when we support executing in different locations). We do enforce which jobs run where as well.

edit: and that's still non-ideal. we should take another look at /ok-to-test at some point
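(For context, the per-job routing mentioned above, "we do enforce which jobs run where", is expressed in the job config roughly like the sketch below; the repo, cluster alias, and image are made up:)

```yaml
presubmits:
  example-org/example-repo:
  - name: pull-example-unit-tests
    cluster: project-a-build     # made-up alias pointing at that project's build pool
    always_run: true
    spec:
      containers:
      - image: example.com/example/test-runner:latest   # placeholder image
        command: ["make", "test"]
```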

fejta-bot commented 6 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 6 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot commented 5 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 5 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes/test-infra/issues/8848#issuecomment-449779898). Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.