kubernetes / test-infra

Test infrastructure for the Kubernetes project.
Apache License 2.0

Nested Virtualization Support for prow.k8s.io #13341

Closed moshloop closed 4 years ago

moshloop commented 5 years ago

A number of different projects would benefit from nested virtualization (NV) support:

GCP does support nested virtualization so this is technically feasible.

The real question is whether this is practically feasible. I see 3 potential options:

1) NV becomes a first-class citizen that any prow job can request. I think this would require a new GCP cluster managed by something like kops (unless GKE supports custom licensed images?).
2) NV becomes a second-class citizen on an external CI system like AppVeyor that already supports NV, managed and owned by test-infra.
3) NV is unsupported, and projects would need to manage and own it themselves.

/cc @timothysc @justinsb

timothysc commented 5 years ago

@spiffxp @BenTheElder ?

BenTheElder commented 5 years ago

I don't run the clusters currently so I can't speak to that with any authority, but previously the answer was to use only GKE, because of the time it saves the team running Prow.

This is in part why we have kind. @munnerz also used to run a Prow with nested virtualization for Jetstack and switched away from it because it had a lot of problems (e.g. you can leak resources easily).

Generally test-infra has preferred to keep things ~containerized for the reliable cleanup of resources (provided by Kubernetes of course...).

BenTheElder commented 5 years ago

When a job really does need a VM, test-infra has so far recommended spinning up a GCE or AWS VM from the ProwJob, as the project has separate ~reliable infrastructure to lease projects and nuke everything in them after the job is done (see boskos/).

stevekuznetsov commented 5 years ago

Running nested virt is possible today by using credentials to set it up in cloud infrastructure separate from the service/build clusters that Prow schedules to. I'm not sure this requires any changes to Prow, even if the request is for nested virt on the build cluster itself.

moshloop commented 5 years ago

So this request would be for a new build cluster with nested virtualization. I think spinning up new VMs for each job kind of defeats the purpose of nested virtualization in the first place. Could we not create a new boskos resource for this?

BenTheElder commented 5 years ago

> So this request would be for a new build cluster with nested virtualization

You'll need to get the prow.k8s.io oncall team to agree to this, or provide your own. I would bring this up in #testing-ops on the Kubernetes Slack.

> I think spinning up new VMs for each job kind of defeats the purpose of nested virtualization in the first place.

It of course depends on the use case. It might for yours, but it would still work with what is available today.

> Could we not create a new boskos resource for this?

You'd need to come up with a pattern to manage this. boskos has a "janitor" for GCP projects / AWS regions / ... today that knows to simply list and delete any sub-resource when the project / region / ... is done being used. With nested VMs that's going to be trickier, unless you do something like only one VM to a machine, which is not much better than spinning up a VM remotely.
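
To make the list-and-delete janitor pattern concrete, here is a minimal in-memory sketch in Python. It is not boskos code: `Project`, `janitor_clean`, and all resource names are hypothetical, and no cloud API is called. It assumes the cloud API can list host VMs and disks but not the guests nested inside a host, which is why one host per project would keep cleanup a plain list-and-delete.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Project:
    """Hypothetical model of a leased cloud project."""
    name: str
    disks: List[str] = field(default_factory=list)
    # Host VM name -> nested guests running inside it. In this model the
    # cloud API only knows about the hosts; the guests are invisible to it.
    hosts: Dict[str, List[str]] = field(default_factory=dict)


def janitor_clean(project: Project) -> List[str]:
    """Delete everything the cloud API can list and return what was deleted."""
    deleted = [f"instances/{host}" for host in sorted(project.hosts)]
    deleted += [f"disks/{disk}" for disk in sorted(project.disks)]
    # Deleting a host also removes any nested guests it was running.
    project.hosts.clear()
    project.disks.clear()
    return deleted


project = Project(
    name="nv-project-01",
    disks=["nv-host-boot", "scratch"],
    hosts={"nv-host": ["guest-1", "guest-2"]},
)
deleted = janitor_clean(project)
print(deleted)
```

The nested guests never appear in the deleted list: they are only cleaned up as a side effect of deleting their host. A host shared between jobs would need some other mechanism to reap guests left behind by one job while another is still running.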

moshloop commented 5 years ago

> get the prow.k8s.io oncall team to agree to this

This doesn't need the same level of support as core prow jobs. I would be able to provide oncall with a 1-2 business day response time.

> only one VM to a machine

One Nested Virtualization Host (VM) per job and one host per GCP project would be fine as it would meet the primary goals:

a) Reduce the amount of boilerplate / bootstrapping required to run a job requiring NV
b) Make running NV jobs fast by keeping a pool of pre-warmed VMs
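
As a rough illustration of goal (b), a pool of pre-warmed hosts could be tracked with a small lease/release state machine. The sketch below is an assumption-laden toy, not boskos: the `WarmPool` class, its method names, and the free/busy/dirty buckets are all hypothetical, and `recycle` merely stands in for recreating a host from a clean image.

```python
from collections import deque
from typing import Deque, Dict, Iterable


class WarmPool:
    """Hypothetical pool of pre-warmed nested-virtualization hosts."""

    def __init__(self, hosts: Iterable[str]) -> None:
        self.free: Deque[str] = deque(hosts)  # ready to hand out
        self.busy: Dict[str, str] = {}        # host -> job holding it
        self.dirty: Deque[str] = deque()      # used, awaiting cleanup

    def acquire(self, job: str) -> str:
        """Lease one free host to a job (one host per job)."""
        if not self.free:
            raise RuntimeError("no pre-warmed host available")
        host = self.free.popleft()
        self.busy[host] = job
        return host

    def release(self, host: str) -> None:
        """Mark a host as used; it must be cleaned before reuse."""
        del self.busy[host]
        self.dirty.append(host)

    def recycle(self) -> int:
        """Stand-in for rebuilding dirty hosts; returns how many were reset."""
        count = len(self.dirty)
        while self.dirty:
            self.free.append(self.dirty.popleft())
        return count


pool = WarmPool(["nv-host-a", "nv-host-b"])
host = pool.acquire("job-1")
pool.release(host)
recycled = pool.recycle()
```

Keeping released hosts in a separate dirty bucket means a job never receives a host that still carries another job's nested VMs.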

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 4 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot commented 4 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 4 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes/test-infra/issues/13341#issuecomment-562994547):

>Rotten issues close after 30d of inactivity.
>Reopen the issue with `/reopen`.
>Mark the issue as fresh with `/remove-lifecycle rotten`.
>
>Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta).
>/close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

LorbusChris commented 4 years ago

I'm still interested in this. /reopen

k8s-ci-robot commented 4 years ago

@LorbusChris: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to [this](https://github.com/kubernetes/test-infra/issues/13341#issuecomment-572143300):

>I'm still interested in this.
>/reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

BenTheElder commented 4 years ago

We're not going to do this on the default build nodes (it's not possible in GKE, and we don't want to deal with leaked VMs), but you could spin up a GCE VM with nested virt from a boskos project in your prowjobs and do whatever you want there.
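
For anyone following this route, a minimal sketch of the "spin up a GCE VM with nested virt" step might look like the following. It only builds the `gcloud` command line and does not run it. It assumes a recent `gcloud` release that provides the `--enable-nested-virtualization` flag on `gcloud compute instances create`; older setups enabled nested virtualization through a custom image license instead. The function name, project, zone, and instance name are all hypothetical, and machine type and image selection are left out.

```python
import shlex
from typing import List


def nested_virt_create_cmd(project: str, zone: str, name: str) -> List[str]:
    """Build (but do not run) a gcloud command for a nested-virt host VM."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--project={project}",  # e.g. the project leased from boskos
        f"--zone={zone}",
        # Assumption: the installed gcloud release supports this flag.
        "--enable-nested-virtualization",
    ]


cmd = nested_virt_create_cmd("leased-project", "us-central1-b", "nv-host")
print(" ".join(shlex.quote(part) for part in cmd))
```

Because the VM lives inside the leased project, the existing cleanup of that project after the job would also remove the host and anything nested in it.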

