wave-k8s / wave

Kubernetes configuration tracking controller
Apache License 2.0
646 stars 82 forks source link

go1.22 and k8s1.29 support #145

Closed Dentrax closed 2 months ago

Dentrax commented 3 months ago

CARRY: #105 and #107 Ref: #138

This is an experimental PR to add some additional works to @mamccorm's work: https://github.com/wave-k8s/wave/issues/138#issuecomment-2014217942

make test fails with some errors that I couldn't understand:

DaemonSet controller Suite When a DaemonSet is reconciled And it has the required annotation
  Adds a finalizer to the DaemonSet
  /Users/furkan/src/wave/pkg/controller/daemonset/daemonset_controller_test.go:194
E0322 17:13:06.038082   83906 controller.go:329] "msg"="Reconciler error" "error"="error updating instance default/example: no kind is registered for the type core.daemonset in scheme \"pkg/runtime/scheme.go:100\"" "controller"="daemonset-controller" "object"={"name"="example" "namespace"="default"} "namespace"="default" "name"="example" "reconcileID"="8ca16401-15a8-48b7-96f9-a2838acbc8d3"
W0322 17:13:41.053875   83906 reflector.go:462] pkg/mod/k8s.io/client-go@v0.29.0/tools/cache/reflector.go:229: watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
W0322 17:13:41.054024   83906 reflector.go:462] pkg/mod/k8s.io/client-go@v0.29.0/tools/cache/reflector.go:229: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
W0322 17:13:41.054255   83906 reflector.go:462] pkg/mod/k8s.io/client-go@v0.29.0/tools/cache/reflector.go:229: watch of *v1.DaemonSet ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
W0322 17:14:16.232143   83906 reflector.go:462] pkg/mod/k8s.io/client-go@v0.29.0/tools/cache/reflector.go:229: watch of *v1.ConfigMap ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
E0322 17:13:41.216844   83906 controller.go:329] "msg"="Reconciler error" "error"="error updating instance default/example: no kind is registered for the type core.daemonset in scheme \"pkg/runtime/scheme.go:100\"" "controller"="daemonset-controller" "object"={"name"="example" "namespace"="default"} "namespace"="default" "name"="example" "reconcileID"="7f645167-04a3-43ce-9d04-f583049b04e1"

So I'm stumped on this. Waiting your feedback on this.

/cc @starkers @toelke

msvticket commented 2 months ago

Thanks for trying to get wave up to date with k8s!

E0322 17:13:41.216844 83906 controller.go:329] "msg"="Reconciler error" "error"="error updating instance default/example: no kind is registered for the type core.daemonset in scheme \"pkg/runtime/scheme.go:100\"" "controller"="daemonset-controller" "object"={"name"="example" "namespace"="default"} "namespace"="default" "name"="example" "reconcileID"="7f645167-04a3-43ce-9d04-f583049b04e1"

This error I have figured out:

The call to update the k8s resource is made here https://github.com/Dentrax/wave/blob/go1.22-k8s1.29/pkg/core/handler.go#L110. The problem is that the variable copy isn't of a recognized type, since it's of the type core.daemonset, which is internal to wave. It has a "real" DaemonSet as a member though.

So one way of getting rid of that error is with this patch:

Index: pkg/core/handler.go
===================================================================
diff --git a/pkg/core/handler.go b/pkg/core/handler.go
--- a/pkg/core/handler.go   (revision 2fd44cfd8e8ceaf09d91bd9d8230c22707cc6b11)
+++ b/pkg/core/handler.go   (date 1713436150304)
@@ -107,7 +107,7 @@
    if !reflect.DeepEqual(instance, copy) {
        log.V(0).Info("Updating instance hash", "namespace", instance.GetNamespace(), "name", instance.GetName(), "hash", hash)
        h.recorder.Eventf(copy, corev1.EventTypeNormal, "ConfigChanged", "Configuration hash updated to %s", hash)
-       err := h.Update(context.TODO(), copy)
+       err := h.Update(context.TODO(), copy.GetObject())
        if err != nil {
            return reconcile.Result{}, fmt.Errorf("error updating instance %s/%s: %v", instance.GetNamespace(), instance.GetName(), err)
        }
Index: pkg/core/types.go
===================================================================
diff --git a/pkg/core/types.go b/pkg/core/types.go
--- a/pkg/core/types.go (revision 2fd44cfd8e8ceaf09d91bd9d8230c22707cc6b11)
+++ b/pkg/core/types.go (date 1713439124160)
@@ -45,6 +45,7 @@
 type podController interface {
    client.Object
    metav1.Object
+   GetObject() client.Object
    GetPodTemplate() *corev1.PodTemplateSpec
    SetPodTemplate(*corev1.PodTemplateSpec)
    DeepCopyPodController() podController
@@ -55,6 +56,10 @@
    *appsv1.Deployment
 }

+func (d *deployment) GetObject() client.Object {
+   return d.Deployment
+}
+
 func (d *deployment) GetPodTemplate() *corev1.PodTemplateSpec {
    return &d.Spec.Template
 }
@@ -72,6 +77,10 @@
    *appsv1.StatefulSet
 }

+func (s *statefulset) GetObject() client.Object {
+   return s.StatefulSet
+}
+
 func (s *statefulset) GetPodTemplate() *corev1.PodTemplateSpec {
    return &s.Spec.Template
 }
@@ -89,6 +98,10 @@
    *appsv1.DaemonSet
 }

+func (d *daemonset) GetObject() client.Object {
+   return d.DaemonSet
+}
+
 func (d *daemonset) GetPodTemplate() *corev1.PodTemplateSpec {
    return &d.Spec.Template
 }

Unfortunately that doesn't make any more tests succeed...

toelke commented 2 months ago

Cool. With that change, the e2e tests complete (see #147).

The errors from make test are weird, and the errors the GitHub action gives are even weirder. I'll keep looking.

toelke commented 2 months ago

Now the tests are running, but into timeout issues. I will pick this up tomorrow (if nobody else has any bright ideas :-D ).

toelke commented 2 months ago

OK, this seems to be (at least) an API mismatch between kubebuilder v1 and v2; migrating 1->2 or even 2->3 seems to be best done by doing a complete re-creation of the project; I am inclined to try that.

Dentrax commented 2 months ago

Thank you for your efforts here @toelke!

I just checkout your branch and tried to increase timeouts to 60s but tests are still failing, resulting lots of context deadline exceeded.

toelke commented 2 months ago

Yeah, looking at the logs, it appears that the controller started for the tests does not, in fact, listen for changes to Deployments. Luckily, the "real" controller deployed for the e2e tests does...

jabdoa2 commented 2 months ago

I will create a PR against your branch later which should fix a few things:

  1. Simplify Makefile setup (more like modern kubebuilder projects with everything in $PWD/bin/ instead of system wide installation)
  2. Use env-test instead of deprecated controller-tests
  3. Fix test harness
  4. Fix/Workaround reflection issues in apimachinery. That is the part causing the schema errors. Its a bit complicated. Basically, wave uses custom objects for kubernetes api objects and tries to save those. That incorrectly makes apimachinery believe that the object is in core instead of apps. This does not seem to be a test issue but a general issue.

In later PRs we should also update some other dependencies but one step after another.

jabdoa2 commented 2 months ago

Here is the PR: https://github.com/Dentrax/wave/pull/1

jabdoa2 commented 2 months ago

All ready from my side. Let me know if you or @toelke want any other change. This should enable us to build a release and close a lot of issues caused by the old kubernetes client api version (topology-spread, startup-probes etc). We are very eager to test and deploy this.

Dentrax commented 2 months ago

Thank you so much for your great work here! @jabdoa2

@toelke waiting a maintainer approval to run pipeline.

jabdoa2 commented 2 months ago

I can create another PR for the linter timeout later. Guess my local github action runner has just been much faster.

Dentrax commented 2 months ago

Fixed CI issue by increasing golangci-lint timeout + used v1.57. cc @jabdoa2

jabdoa2 commented 2 months ago

Judging from the CI Run on the PR against your repo I might have messed up the indent in this line: https://github.com/Dentrax/wave/pull/1/files#diff-faff1af3d8ff408964a57b2e475f69a6b7c7b71c9978cccc8f471798caac2c88R20. Maybe my local CI runner just ignored that setting?

See: https://github.com/Dentrax/wave/actions/runs/8823091364

toelke commented 2 months ago

Something else seems to be wrong... https://github.com/wave-k8s/wave/actions/runs/8837527207/job/24266584757?pr=145

jabdoa2 commented 2 months ago

Thanks for fixing the indent.

Something else seems to be wrong... https://github.com/wave-k8s/wave/actions/runs/8837527207/job/24266584757?pr=145

Looks like one of the tests is flaky. They historically have been flaky so this probably not new. I will take care to make it more robust nonetheless. At least all other tests passed :-).

jabdoa2 commented 2 months ago

This should address the particular case: https://github.com/Dentrax/wave/pull/2. Unfortunately, there are a few more but I have actually seen this case quite a few times so it should go first. One step after the other.

Tests also passed: https://github.com/Dentrax/wave/actions/runs/8840886549/job/24277018560?pr=2

Dentrax commented 2 months ago

Thank you! It's green again. Awaiting maintainer approval to run the CI.

toelke commented 2 months ago

image

:partying_face:

Dentrax commented 2 months ago

Thank you so much folks for great assistance here! And merging this PR! 🚀

mamccorm commented 2 months ago

Great work all, amazing to see the collab