kubernetes-sigs / controller-runtime

Repo for the controller-runtime subproject of kubebuilder (sig-apimachinery)
Apache License 2.0
2.4k stars 1.11k forks source link

Data race during manager shutdown #2760

Open ahmetb opened 3 months ago

ahmetb commented 3 months ago

I'd like to report a data race caught by go test -race during my end-to-end test suite (which uses envtest, but does not use Ginkgo) which ended up failing my test.

The data race has occurred during the shutdown sequence. It seems during this time, one of the reconcilers was doing an API call, which ended up getting cancelled due to ctx cancellation I do as part of the tear down (to shut down envtest and the Runnables).

I am not at all sure if it is related to controller-runtime, as the concurrent reader/writer goroutine stacks are pointing to net/http:

But here it goes.

2024-04-02T18:22:08Z    INFO    Tests finished  {"duration": "2.049504249s", "exitCode": 0}
2024-04-02T18:22:08Z    INFO    Shutting down controllers
2024-04-02T18:22:08Z    INFO    Shutting down envtest
2024-04-02T18:22:08Z    INFO    Stopping and waiting for non leader election runnables
2024-04-02T18:22:08Z    INFO    Stopping and waiting for leader election runnables
2024-04-02T18:22:08Z    INFO    Shutdown signal received, waiting for all workers to finish {"controller": "kubernetescontrolplane", "controllerGroup": "example.com", "controllerKind": "KubernetesControlPlane"}
2024-04-02T18:22:08Z    INFO    Shutdown signal received, waiting for all workers to finish {"controller": "etcdcluster", "controllerGroup": "example.com", "controllerKind": "EtcdCluster"}
2024-04-02T18:22:08Z    INFO    All workers finished    {"controller": "kubernetescontrolplane", "controllerGroup": "example.com", "controllerKind": "KubernetesControlPlane"}
==================
WARNING: DATA RACE
Read at 0x00c0001361c0 by goroutine 269:
  net/http.rewindBody()
      /export/home/tester/.go/1.22.0/src/net/http/transport.go:683 +0xcb
  net/http.(*Transport).roundTrip()
      /export/home/tester/.go/1.22.0/src/net/http/transport.go:557 +0x6db
  net/http.(*Transport).RoundTrip()
      /export/home/tester/.go/1.22.0/src/net/http/roundtrip.go:17 +0x33
  k8s.io/client-go/transport.(*userAgentRoundTripper).RoundTrip()
      /export/home/tester/go-athens/pkg/mod/k8s.io/client-go@v0.29.1/transport/round_trippers.go:168 +0x4f6
  net/http.send()
      /export/home/tester/.go/1.22.0/src/net/http/client.go:259 +0x9a6
  net/http.(*Client).send()
      /export/home/tester/.go/1.22.0/src/net/http/client.go:180 +0x14e
  net/http.(*Client).do()
      /export/home/tester/.go/1.22.0/src/net/http/client.go:724 +0x11c8
  net/http.(*Client).Do()
      /export/home/tester/.go/1.22.0/src/net/http/client.go:590 +0x6a6
  k8s.io/client-go/rest.(*Request).request()
      /export/home/tester/go-athens/pkg/mod/k8s.io/client-go@v0.29.1/rest/request.go:1023 +0x3ac
  k8s.io/client-go/rest.(*Request).Do()
      /export/home/tester/go-athens/pkg/mod/k8s.io/client-go@v0.29.1/rest/request.go:1063 +0x119
  sigs.k8s.io/controller-runtime/pkg/client.(*typedClient).PatchSubResource()
      /export/home/tester/go-athens/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/client/typed_client.go:277 +0x924
  sigs.k8s.io/controller-runtime/pkg/client.(*subResourceClient).Patch()
      /export/home/tester/go-athens/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/client/client.go:595 +0x484
  golnkd.in/k8s-compute-broker-controller/control-plane-manager/internal/controller.(*EtcdClusterReconciler).Reconcile()
      /export/content/data/multiproduct-post-commit/i001/workspace/k8s-compute-broker-controller_a60f2169246cc0c4f7d68a8124dca5ad12006477/control-plane-manager/internal/controller/etcdcluster_controller.go:81 +0x78d
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile()
      /export/home/tester/go-athens/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:119 +0x1a1
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler()
      /export/home/tester/go-athens/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:316 +0x59a
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem()
      /export/home/tester/go-athens/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:266 +0x399
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
      /export/home/tester/go-athens/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:227 +0xb2

Previous write at 0x00c0001361c0 by goroutine 304:
  net/http.(*readTrackingBody).Read()
      /export/home/tester/.go/1.22.0/src/net/http/transport.go:656 +0x44
  golang.org/x/net/http2.(*clientStream).writeRequestBody()
      /export/home/tester/go-athens/pkg/mod/golang.org/x/net@v0.19.0/http2/transport.go:1732 +0x5d9
  golang.org/x/net/http2.(*clientStream).writeRequest()
      /export/home/tester/go-athens/pkg/mod/golang.org/x/net@v0.19.0/http2/transport.go:1440 +0xf32
  golang.org/x/net/http2.(*clientStream).doRequest()
      /export/home/tester/go-athens/pkg/mod/golang.org/x/net@v0.19.0/http2/transport.go:1326 +0x33
  golang.org/x/net/http2.(*ClientConn).RoundTrip.gowrap1()
      /export/home/tester/go-athens/pkg/mod/golang.org/x/net@v0.19.0/http2/transport.go:1232 +0x44

Goroutine 269 (running) created at:
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2()
      /export/home/tester/go-athens/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:223 +0x80b
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start()
      /export/home/tester/go-athens/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:234 +0x2b9
  sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1()
      /export/home/tester/go-athens/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/manager/runnable_group.go:223 +0x1c6
  sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.gowrap1()
      /export/home/tester/go-athens/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/manager/runnable_group.go:226 +0x41

Goroutine 304 (running) created at:
  golang.org/x/net/http2.(*ClientConn).RoundTrip()
      /export/home/tester/go-athens/pkg/mod/golang.org/x/net@v0.19.0/http2/transport.go:1232 +0x7d7
  golang.org/x/net/http2.(*Transport).RoundTripOpt()
      /export/home/tester/go-athens/pkg/mod/golang.org/x/net@v0.19.0/http2/transport.go:564 +0x644
  golang.org/x/net/http2.(*Transport).RoundTrip()
      /export/home/tester/go-athens/pkg/mod/golang.org/x/net@v0.19.0/http2/transport.go:513 +0x37
  golang.org/x/net/http2.noDialH2RoundTripper.RoundTrip()
      /export/home/tester/go-athens/pkg/mod/golang.org/x/net@v0.19.0/http2/transport.go:3136 +0x26
  net/http.(*Transport).roundTrip()
      /export/home/tester/.go/1.22.0/src/net/http/transport.go:553 +0x641
  net/http.(*Transport).RoundTrip()
      /export/home/tester/.go/1.22.0/src/net/http/roundtrip.go:17 +0x33
  k8s.io/client-go/transport.(*userAgentRoundTripper).RoundTrip()
      /export/home/tester/go-athens/pkg/mod/k8s.io/client-go@v0.29.1/transport/round_trippers.go:168 +0x4f6
  net/http.send()
      /export/home/tester/.go/1.22.0/src/net/http/client.go:259 +0x9a6
  net/http.(*Client).send()
      /export/home/tester/.go/1.22.0/src/net/http/client.go:180 +0x14e
  net/http.(*Client).do()
      /export/home/tester/.go/1.22.0/src/net/http/client.go:724 +0x11c8
  net/http.(*Client).Do()
      /export/home/tester/.go/1.22.0/src/net/http/client.go:590 +0x6a6
  k8s.io/client-go/rest.(*Request).request()
      /export/home/tester/go-athens/pkg/mod/k8s.io/client-go@v0.29.1/rest/request.go:1023 +0x3ac
  k8s.io/client-go/rest.(*Request).Do()
      /export/home/tester/go-athens/pkg/mod/k8s.io/client-go@v0.29.1/rest/request.go:1063 +0x119
  sigs.k8s.io/controller-runtime/pkg/client.(*typedClient).PatchSubResource()
      /export/home/tester/go-athens/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/client/typed_client.go:277 +0x924
  sigs.k8s.io/controller-runtime/pkg/client.(*subResourceClient).Patch()
      /export/home/tester/go-athens/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/client/client.go:595 +0x484
  golnkd.in/k8s-compute-broker-controller/control-plane-manager/internal/controller.(*EtcdClusterReconciler).Reconcile()
      /export/content/data/multiproduct-post-commit/i001/workspace/k8s-compute-broker-controller_a60f2169246cc0c4f7d68a8124dca5ad12006477/control-plane-manager/internal/controller/etcdcluster_controller.go:81 +0x78d
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile()
      /export/home/tester/go-athens/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:119 +0x1a1
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler()
      /export/home/tester/go-athens/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:316 +0x59a
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem()
      /export/home/tester/go-athens/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:266 +0x399
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
      /export/home/tester/go-athens/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:227 +0xb2
==================
2024-04-02T18:22:08Z    DEBUG   Status updated  {"controller": "etcdcluster", "controllerGroup": "example.com", "controllerKind": "EtcdCluster", "EtcdCluster": {"name":"test"}, "namespace": "", "name": "test", "reconcileID": "c175233f-a6c2-46c4-82af-49e68f48bdbb"}
2024-04-02T18:22:08Z    ERROR   Reconciler error    {"controller": "etcdcluster", "controllerGroup": "example.com", "controllerKind": "EtcdCluster", "EtcdCluster": {"name":"test"}, "namespace": "", "name": "test", "reconcileID": "c175233f-a6c2-46c4-82af-49e68f48bdbb", "error": "failed to patch EtcdCluster status: Patch \"https://127.0.0.1:40187/apis/example.com/v1alpha1/etcdclusters/test/status\": context canceled", "errorCauses": [{"error": "failed to patch EtcdCluster status: Patch \"https://127.0.0.1:40187/apis/example.com/v1alpha1/etcdclusters/test/status\": context canceled"}]}
2024-04-02T18:22:08Z    INFO    All workers finished    {"controller": "etcdcluster", "controllerGroup": "example.com", "controllerKind": "EtcdCluster"}
2024-04-02T18:22:08Z    INFO    Stopping and waiting for caches
2024-04-02T18:22:08Z    INFO    Stopping and waiting for webhooks
2024-04-02T18:22:08Z    INFO    Stopping and waiting for HTTP servers
2024-04-02T18:22:08Z    INFO    controller-runtime.metrics  Shutting down metrics server with timeout of 1 minute
2024-04-02T18:22:08Z    INFO    Wait completed, proceeding to shutdown the manager
2024-04-02T18:22:08Z    INFO    Controllers stopped
2024-04-02T18:22:09Z    INFO    Teardown finished   {"duration": "1.107332537s"}
Found 1 data race(s)
vincepri commented 3 months ago

Maybe related to https://github.com/kubernetes-sigs/controller-runtime/pull/2757

Are you able to update to latest v0.17 commit?

vincepri commented 3 months ago

Or https://github.com/kubernetes-sigs/controller-runtime/releases/tag/v0.17.3

k8s-triage-robot commented 2 weeks ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

sbueringer commented 1 week ago

@ahmetb ^^