kubernetes-sigs / cluster-api-provider-kubevirt

Cluster API Provider for KubeVirt
Apache License 2.0

Unable to specify cloudInitNoCloud volumes, currently hardcoded to cloudInitConfigDrive #200

Closed: hh closed this issue 5 months ago

hh commented 1 year ago

What steps did you take and what happened:

Tried to create a kubevirt capi cluster using cloud-init cloudInitNoCloud data source.

What did you expect to happen:

Ability to specify cloud-init cloudInitNoCloud instead of the hard coded cloudInitConfigDrive data source.

Notes:

Both cloudInitConfigDrive and cloudInitNoCloud are supported as VolumeSources by KubeVirt in v1.Volume.

From cloud-init/cloud-init.go#ReadCloudInitVolumeDataSource():

ReadCloudInitVolumeDataSource scans the given VMI for CloudInit volumes and reads their content into a CloudInitData struct. Does not resolve secret refs.

If a cloudInitNoCloud is found in the volumeSources list, NoCloud is configured, and cloudInitConfigDrive is ignored.
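For reference, a plain KubeVirt VMI accepts either source directly. A minimal sketch of a cloudInitNoCloud volume carrying both userData and networkData (names and inline values are illustrative, not from this repo):

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: vmi-nocloud
spec:
  domain:
    resources:
      requests:
        memory: 1Gi
    devices:
      disks:
        - name: cloudinitdisk
          disk:
            bus: virtio
  volumes:
    # Either cloudInitNoCloud or cloudInitConfigDrive is valid here;
    # if both appear, NoCloud wins per ReadCloudInitVolumeDataSource.
    - name: cloudinitdisk
      cloudInitNoCloud:
        userData: |
          #cloud-config
          hostname: vmi-nocloud
        networkData: |
          version: 2
          ethernets:
            eth0:
              dhcp4: true
```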

Our NoCloud Implementation Details state:

Internally, kubevirt passes the cloud-init spec to the config-disk package. That package generates the iso files associated with the nocloud datasource.

When the VMI starts, virt-handler injects the NoCloud iso as a file based disk that the VMI consumes.

From there, the NoCloud datasource process internal to the VMI detects the attached disk and processes the userdata and metadata stored on the disk.

However, in cluster-api-provider-kubevirt we hardcode the volumeSource to be cloudInitConfigDrive when we build the VirtualMachineInstanceTemplateSpec within buildVirtualMachineInstanceTemplate().

pkg/kubevirt/utils.go#buildVirtualMachineInstanceTemplate()

```go
cloudInitVolume := kubevirtv1.Volume{
	Name: cloudInitVolumeName,
	VolumeSource: kubevirtv1.VolumeSource{
		// HARDCODED to cloudInitConfigDrive here; we need to be able
		// to use other volume sources, specifically cloudInitNoCloud.
		CloudInitConfigDrive: &kubevirtv1.CloudInitConfigDriveSource{
			UserDataSecretRef: &corev1.LocalObjectReference{
				Name: *ctx.Machine.Spec.Bootstrap.DataSecretName + "-userdata",
			},
		},
	},
	// ...
}
```

This approach means it is not currently possible to use VM images (like Talos) that expect NoCloud-based ISOs to be attached to the VM for userData and networkData.

Suggestions on the approach are welcome. We attempted to update the KubeVirtVirtualMachineTemplate to use a secret, but it's dynamically named based on the VM. The better approach would be to be able to specify the type of init volume (and likely the sub-attributes of that type), such as the additional network data required by NoCloud.

davidvossel commented 1 year ago

To clarify, you're just requesting to have the option to use NoCloud instead of ConfigDrive?

If that's the case, we can simply allow users to create an empty NoCloud or ConfigDrive volume in the KubeVirtMachine object, and we'll use that volume for the cloud init (our capk controller would fill the rest of the details for the volume) instead of autogenerating the ConfigDrive one like we do today.
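From the user's side, that might look something like the sketch below. The field nesting follows KubevirtMachineTemplate, but the pre-declared-volume behavior described here did not exist at the time of this discussion, so treat it as illustrative only:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: KubevirtMachineTemplate
metadata:
  name: example-md
spec:
  template:
    spec:
      virtualMachineTemplate:
        spec:
          template:
            spec:
              volumes:
                # Empty NoCloud volume: the capk controller would fill in
                # the generated userdata secret ref instead of creating
                # its default ConfigDrive volume.
                - name: cloudinitvolume
                  cloudInitNoCloud: {}
```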

Would that work for you?

hh commented 1 year ago

I think that would work @davidvossel !

davidvossel commented 1 year ago

@hh Hey, I saw you put a note in the community meeting notes. We're on a bi-weekly cadence right now and that was our off week.

Regarding your question about "how can we help": is this code you'd be interested in contributing? If so, we can offer guidance once a PR is posted.

hh commented 1 year ago

I think a pointer or two into which code needs updating and/or specific docs that would help us start a PR would be great.

davidvossel commented 1 year ago

> I think a pointer or two into which code needs updating and/or specific docs that would help us start a PR would be great.

yeah, absolutely!

starting here, https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/blob/6a77a49da11bce48c0c1739f76a77a32c6acb48d/pkg/kubevirt/utils.go#L141

If a volume named "cloudinitvolume" already exists in the KubeVirtMachine's spec, then you can add logic that uses that volume to inject the userdata secret rather than having the capk controller automatically generate the ConfigDrive one.

So, if the "cloudinitvolume" already exists, our controller should just set the userdata field for either the ConfigDrive or NoCloud volume source, whichever is defined in the volume. Otherwise, the controller should create the cloudinitvolume using ConfigDrive by default.

Hopefully that makes sense. Let me know if I can clarify it more.

hh commented 1 year ago

Thanks heaps! Will look into this. /cc @BobyMCbobs

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 year ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/issues/200#issuecomment-1510002331):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
>
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
>
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

bzub commented 7 months ago

The ability to use nocloud instead of configdrive would be very helpful. I see another PR didn't make it here: https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/pull/224

bzub commented 7 months ago

/reopen

k8s-ci-robot commented 7 months ago

@bzub: Reopened this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/issues/200#issuecomment-1843481639):

> /reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

k8s-triage-robot commented 5 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 5 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/issues/200#issuecomment-1902067781):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
>
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
>
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

gadgetmg commented 5 months ago

For anyone who stumbles on this while trying to use the KubeVirt provider with Talos Linux: you can use the openstack image from the Talos releases page in your KubevirtMachineTemplate.

The reason this works is that "config drive" is the cloud-init datasource for OpenStack, which is technically distinct from NoCloud.
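A rough illustration of that workaround follows. The URL and storage size are placeholders (check the Talos release assets for the actual openstack image), and the field nesting is a sketch of a KubevirtMachineTemplate with a CDI DataVolume, not a tested manifest:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: KubevirtMachineTemplate
metadata:
  name: talos-workers
spec:
  template:
    spec:
      virtualMachineTemplate:
        spec:
          dataVolumeTemplates:
            - metadata:
                name: talos-root
              spec:
                storage:
                  resources:
                    requests:
                      storage: 10Gi
                source:
                  http:
                    # Placeholder URL: substitute the "openstack" disk
                    # image from the Talos releases page.
                    url: https://example.com/talos/openstack-amd64.raw.xz
```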