kubernetes-sigs / cluster-api-provider-azure

Cluster API implementation for Microsoft Azure
https://capz.sigs.k8s.io/
Apache License 2.0

Azure Generation 2 VM support #1003

Closed craiglpeters closed 1 year ago

craiglpeters commented 3 years ago

/kind feature

Describe the solution you'd like: CAPZ should enable me to create clusters with Azure Generation 2 VMs.

Anything else you would like to add: I can't think of a use case where mixed gen 1 and gen 2 VMs are needed.

craiglpeters commented 3 years ago

This may require changes to packer

alexeldeib commented 3 years ago

https://github.com/kubernetes-sigs/image-builder/pull/422

This also requires checking that the VM size supports gen2.

It would be nice if we could attempt to default to gen2 over gen1 when possible, similar to https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/1012#discussion_r513845559, but doing so has all the current caveats about webhook vs. controller context.

CecileRobertMichon commented 3 years ago

/assign @alexeldeib

alexeldeib commented 3 years ago

@CecileRobertMichon what is left to close this? An example/docs maybe once we publish official gen2 images for the next k8s patch version?

nader-ziada commented 3 years ago

> @CecileRobertMichon what is left to close this? An example/docs maybe once we publish official gen2 images for the next k8s patch version?

I can work on this if you don't have time, @alexeldeib.

I think we still have to check whether the machine size supports Gen2 and set that when creating the VM, as you mentioned in the comment above.
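A minimal sketch of the capability check discussed above, not CAPZ code. It assumes SKU data shaped like the output of Azure's Resource SKUs API, where the supported generations appear under a `HyperVGenerations` capability as a comma-separated string such as `V1,V2`. The helper names `supported_generations` and `pick_generation` are hypothetical, and Python is used only for brevity.

```python
from typing import Mapping, Set

# Capability name assumed to carry the supported Hyper-V generations
# in Resource SKUs data (value is a comma-separated string, e.g. "V1,V2").
HYPERV_GENERATIONS = "HyperVGenerations"


def supported_generations(sku: Mapping) -> Set[str]:
    """Return the generations a SKU advertises, e.g. {"V1", "V2"}.

    Returns an empty set when the capability is missing.
    """
    for capability in sku.get("capabilities", []):
        if capability.get("name") == HYPERV_GENERATIONS:
            value = capability.get("value", "")
            return {gen.strip() for gen in value.split(",") if gen.strip()}
    return set()


def pick_generation(sku: Mapping) -> str:
    """Prefer gen2 when the SKU supports it, otherwise fall back to gen1."""
    generations = supported_generations(sku)
    if "V2" in generations:
        return "V2"
    if "V1" in generations:
        return "V1"
    raise ValueError(
        f"SKU {sku.get('name', '<unknown>')} lists no Hyper-V generation"
    )


if __name__ == "__main__":
    # Illustrative sample data only; real values must come from the SKU API.
    samples = [
        {"name": "example-gen1-only",
         "capabilities": [{"name": HYPERV_GENERATIONS, "value": "V1"}]},
        {"name": "example-gen1-and-gen2",
         "capabilities": [{"name": HYPERV_GENERATIONS, "value": "V1,V2"}]},
    ]
    for sku in samples:
        print(sku["name"], pick_generation(sku))
```

Defaulting to gen2 this way only makes sense if a gen2 image is also available for the chosen Kubernetes version, which is the other half of this issue.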

alexeldeib commented 3 years ago

oops, go for it! I finished the image builder changes but totally forgot we didn't already have the capability check.

/unassign

nader-ziada commented 3 years ago

thanks, will submit a PR soon

/assign

nader-ziada commented 3 years ago

All the images used in tests should work fine with Gen2, except for the GPU one: we are using Standard_NV6 there, but could probably switch to Standard_NV12s_v3 (I will create a PR to test that).

https://docs.microsoft.com/en-us/azure/virtual-machines/generation-2

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

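This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed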

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

/lifecycle rotten

k8s-triage-robot commented 2 years ago

/close

k8s-ci-robot commented 2 years ago

@k8s-triage-robot: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/1003#issuecomment-1079961483):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues and PRs according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue or PR with `/reopen`
> - Mark this issue or PR as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

shysank commented 2 years ago

/remove-lifecycle rotten

mboersma commented 2 years ago

We do need to switch to Standard_NV12s_v3 or similar for GPU tests, not only to complete this issue, but because Standard_NV6 goes away in a year:

> Based on feedback we've received from customers we're happy to announce we are extending the retirement date by 1 year to 31 August 2023 for the Azure NV6, NV6_Promo, NV12, NV12_Promo, NV24, NV24_Promo virtual machines to give you more time to plan your migration.

The practical issue is getting any sort of quota for the newer SKU types in the subscription that runs CI. So far we've not had any luck doing that.

nader-ziada commented 2 years ago

/unassign

invidian commented 2 years ago

Perhaps it would make sense to close this issue and create a new one for migrating GPU tests?

k8s-triage-robot commented 1 year ago

/lifecycle stale

k8s-triage-robot commented 1 year ago

/lifecycle rotten

k8s-triage-robot commented 1 year ago

/close not-planned

k8s-ci-robot commented 1 year ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/1003#issuecomment-1344403205):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

CecileRobertMichon commented 1 year ago

/reopen
/remove-lifecycle rotten

@mboersma @willie-yao are we currently publishing gen 2 images or still gen 1? If not, is the only remaining item to switch the GPU SKU?

k8s-ci-robot commented 1 year ago

@CecileRobertMichon: Reopened this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/1003#issuecomment-1344596142):

> /reopen
> /remove-lifecycle rotten
>
> @mboersma @willie-yao are we currently publishing gen 2 images or still gen 1? If not, is the only remaining item to switch the GPU SKU?

mboersma commented 1 year ago

@CecileRobertMichon we are not yet publishing Gen2 images, but IDK specifically what the hitch is (if there is any). We would need to make small changes to the image-builder scripts, since -gen1 is effectively hard-coded right now.

mboersma commented 1 year ago

Re-reading this thread, it would appear that switching GPU SKUs is the only known blocker, besides adjusting the publishing scripts.

mboersma commented 1 year ago

/assign

mboersma commented 1 year ago

/close

This issue as stated is incorrect: gen2 Azure images will work fine in CAPZ, although we currently publish reference images only in gen1 format.

k8s-ci-robot commented 1 year ago

@mboersma: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/1003#issuecomment-1419543484):

> /close
>
> This issue as stated is incorrect: gen2 Azure images will work fine in CAPZ, although we currently publish reference images only in gen1 format.