kubernetes-sigs / cluster-api-provider-kubevirt

Cluster API Provider for KubeVirt

Reconciler error status.ready: Required value #252

Closed · maxheyer closed this 1 year ago

maxheyer commented 1 year ago

What steps did you take and what happened: I just followed the quick start guide to set up Cluster API with the KubeVirt infrastructure provider.

After trying to apply the testcluster config, the capk-controller-manager pod gives the following errors:

E0808 18:39:39.001926       1 controller.go:317] controller/kubevirtcluster "msg"="Reconciler error" "error"="KubevirtCluster.infrastructure.cluster.x-k8s.io \"testcluster\" is invalid: status.ready: Required value" "name"="testcluster" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtCluster" 
I0808 18:39:39.549669       1 kubevirtmachine_controller.go:193] controller/kubevirtmachine/default/testcluster-md-0-tfgvh "msg"="Waiting for the control plane to be initialized..." "name"="testcluster-md-0-tfgvh" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtMachine" 
E0808 18:39:39.549712       1 kubevirtmachine_controller.go:352] controller/kubevirtmachine/default/testcluster-md-0-tfgvh "msg"="Workload cluster client is not available" "error"="failed to get kubeconfig for workload cluster: failed to fetch kubeconfig for workload cluster: Secret \"testcluster-kubeconfig\" not found" "name"="testcluster-md-0-tfgvh" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtMachine" 
I0808 18:39:39.549725       1 kubevirtmachine_controller.go:355] controller/kubevirtmachine/default/testcluster-md-0-tfgvh "msg"="Waiting for workload cluster client..." "name"="testcluster-md-0-tfgvh" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtMachine" 
I0808 18:39:39.560760       1 kubevirtmachine_controller.go:193] controller/kubevirtmachine/default/testcluster-md-0-tfgvh "msg"="Waiting for the control plane to be initialized..." "name"="testcluster-md-0-tfgvh" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="KubevirtMachine" 

What did you expect to happen: I expected to see a test cluster running.

Anything else you would like to add: I also tried using the clusterapi-operator to get Cluster API working, but I ran into the same problem.

My clusterctl command to generate the testcluster config:

clusterctl generate cluster testcluster \
  --infrastructure="kubevirt:v0.1.7" \
  --kubernetes-version v1.23.10 \
  --control-plane-machine-count=1 \
  --worker-machine-count=1 \
  > testcluster.yaml

/kind bug

prometherion commented 1 year ago

I noticed the same error, although I was still able to create the cluster, so your missing nodes are definitely not related to this error.

When a KubevirtCluster is reconciled, several updates occur, such as:

  1. adding the finalizer
  2. updating the status upon the infrastructure components creation

Just for the sake of context: the CAPI PatchHelper is used here, which simplifies the diff calculation for resources. tl;dr: it's a helper with really good syntactic sugar for updating partial objects in a simpler way.
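
To make that concrete, here is a minimal sketch of the usual pattern: create the helper, mutate the object in memory (finalizer, status), and let a single deferred Patch write everything back. The receiver type, the infrav1 package, and the finalizer constant are illustrative assumptions, not the exact provider code:

    import (
        "context"

        kerrors "k8s.io/apimachinery/pkg/util/errors"
        ctrl "sigs.k8s.io/controller-runtime"
        "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
        "sigs.k8s.io/cluster-api/util/patch"
    )

    // Hypothetical reconciler excerpt; names are illustrative,
    // not the exact cluster-api-provider-kubevirt code.
    func (r *KubevirtClusterReconciler) reconcileNormal(ctx context.Context, kubevirtCluster *infrav1.KubevirtCluster) (_ ctrl.Result, reterr error) {
        // The helper snapshots the object now so it can compute a diff later.
        patchHelper, err := patch.NewHelper(kubevirtCluster, r.Client)
        if err != nil {
            return ctrl.Result{}, err
        }
        defer func() {
            // One deferred call writes everything back; internally it issues
            // the three patches shown in the snippet below.
            if err := patchHelper.Patch(ctx, kubevirtCluster); err != nil {
                reterr = kerrors.NewAggregate([]error{reterr, err})
            }
        }()

        // 1. Add the finalizer (a metadata update).
        controllerutil.AddFinalizer(kubevirtCluster, infrav1.ClusterFinalizer)

        // 2. Once the infrastructure components are created, flip the status.
        kubevirtCluster.Status.Ready = true
        return ctrl.Result{}, nil
    }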

When performing the first update, the Patch Helper (on CAPI@v1.0.0) performs the following actions:

    // Issue patches and return errors in an aggregate.
    return kerrors.NewAggregate([]error{
        // Patch the conditions first.
        //
        // Given that we pass in metadata.resourceVersion to perform a 3-way-merge conflict resolution,
        // patching conditions first avoids an extra loop if spec or status patch succeeds first
        // given that causes the resourceVersion to mutate.
        h.patchStatusConditions(ctx, obj, options.ForceOverwriteConditions, options.OwnedConditions),

        // Then proceed to patch the rest of the object.
        h.patch(ctx, obj),
        h.patchStatus(ctx, obj),
    })

The patchStatusConditions call is the reason you're getting the error ("KubevirtCluster.infrastructure.cluster.x-k8s.io \"testcluster\" is invalid: status.ready: Required value"), and the root cause is the markers on the KubevirtCluster status.ready field:

https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/blob/b3c2554752b26307004fd1dfefa6173ce64e9c45/api/v1alpha1/kubevirtcluster_types.go#L70-L72
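
For readers not following the link, the field is defined roughly like this (quoted approximately; see the link for the exact source). With neither a +optional marker nor omitempty in the JSON tag, controller-gen marks the field as required in the generated CRD schema:

    // Ready denotes that the infrastructure is ready.
    Ready bool `json:"ready"`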

The field is marked as required and, furthermore, it's missing a default value.

A possible workaround could be marking it as optional, although IMHO the status should always be reported, regardless of the initial conditions.
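
Concretely, either of these marker changes would avoid the validation failure (a sketch, not a tested patch):

    // Option A: mark the field optional.
    // +optional
    Ready bool `json:"ready,omitempty"`

    // Option B: keep it required but give it a default, so a patch that
    // omits it still passes validation.
    // +kubebuilder:default=false
    Ready bool `json:"ready"`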

I'll open an MR to address this issue so we can discuss with the developers how to overcome this annoying, spurious error.