Recently, we have noticed quite a few restarts in our prow-controller-manager instance due to a concurrent map write error. Excerpt from logs:
{"component":"prow-controller-manager","file":"sigs.k8s.io/prow/pkg/kube/config.go:47","func":"sigs.k8s.io/prow/pkg/kube.kubeConfigs","level":"info","msg":"Parsed kubeconfig context: build11","severity":"info","time":"2024-10-03T19:02:00Z"} {"component":"prow-controller-manager","file":"sigs.k8s.io/prow/pkg/kube/config.go:127","func":"sigs.k8s.io/prow/pkg/kube.LoadClusterConfigs","level":"info","msg":"Loading kubeconfig from: \"/etc/build-farm-credentials/sa.prow-controller-manager.vsphere02.config\"","severity":"info","time":"2024-10-03T19:02:00Z"} {"component":"prow-controller-manager","file":"sigs.k8s.io/prow/pkg/kube/config.go:47","func":"sigs.k8s.io/prow/pkg/kube.kubeConfigs","level":"info","msg":"Parsed kubeconfig context: vsphere02","severity":"info","time":"2024-10-03T19:02:00Z"} fatal error: concurrent map writes goroutine 1379 [running]: sigs.k8s.io/controller-runtime/pkg/cache.defaultOpts(_, {0xc08dc48e70, 0xc000205c70, {0x2a81690, 0xc013c1aaf0}, 0x0, 0x0, 0xc08de9dce0, {0x0, 0x0}, ...}) sigs.k8s.io/controller-runtime@v0.17.6/pkg/cache/cache.go:488 +0xa65 sigs.k8s.io/controller-runtime/pkg/cache.New(0xc03d7dab48, {0xc08dc48e70, 0xc000205c70, {0x2a81690, 0xc013c1aaf0}, 0x0, 0x0, 0xc08de9dce0, {0x0, 0x0}, ...}) sigs.k8s.io/controller-runtime@v0.17.6/pkg/cache/cache.go:307 +0x78 sigs.k8s.io/controller-runtime/pkg/cluster.New(0xc03d7da908, {0xc08dc56d58, 0x1, 0x0?}) sigs.k8s.io/controller-runtime@v0.17.6/pkg/cluster/cluster.go:201 +0x322 sigs.k8s.io/prow/pkg/flagutil.(*KubernetesOptions).BuildClusters.func1({_, _}, {{0xc07158e7c0, 0x34}, {0x0, 0x0}, {{0x0, 0x0}, {0x0, 0x0}, ...}, ...}) sigs.k8s.io/prow/pkg/flagutil/kubernetes_cluster_clients.go:389 +0x1a9 created by sigs.k8s.io/prow/pkg/flagutil.(*KubernetesOptions).BuildClusters in goroutine 1 sigs.k8s.io/prow/pkg/flagutil/kubernetes_cluster_clients.go:383 +0x2ef goroutine 1 [semacquire]: sync.runtime_Semacquire(0xc08de9df50?) runtime/sema.go:62 +0x25 sync.(*WaitGroup).Wait(0xc07bd07558?) sync/waitgroup.go:116 +0x48 sigs.k8s.io/prow/pkg/flagutil.(*KubernetesOptions).BuildClusters(0xc0008b60e8, 0x0?, {0xc073ad7080, 0x6, 0x6}, 0x26b3410, {0xc00086bb9e, 0x2}, {0x0, 0x0, ...}) sigs.k8s.io/prow/pkg/flagutil/kubernetes_cluster_clients.go:421 +0x52a main.main() sigs.k8s.io/prow/cmd/prow-controller-manager/main.go:170 +0x926 goroutine 5 [select]: go.opencensus.io/stats/view.(*worker).start(0xc000072e00) go.opencensus.io@v0.24.0/stats/view/worker.go:292 +0x9f created by go.opencensus.io/stats/view.init.0 in goroutine 1 go.opencensus.io@v0.24.0/stats/view/worker.go:34 +0x8d goroutine 7 [chan receive]: knative.dev/pkg/metrics.(*metricsWorker).start(...) knative.dev/pkg@v0.0.0-20240416145024-0f34a8815650/metrics/metrics_worker.go:99 created by knative.dev/pkg/metrics.init.0 in goroutine 1 knative.dev/pkg@v0.0.0-20240416145024-0f34a8815650/metrics/exporter.go:39 +0xad goroutine 90 [chan receive]: sigs.k8s.io/prow/pkg/interrupts.handleInterrupt() sigs.k8s.io/prow/pkg/interrupts/interrupts.go:62 +0x85 created by sigs.k8s.io/prow/pkg/interrupts.init.0 in goroutine 1 sigs.k8s.io/prow/pkg/interrupts/interrupts.go:41 +0xb2
This is clearly a race condition, and it doesn't happen every time we restart the pod. The trace suggests that the per-cluster goroutines spawned by `BuildClusters` call `cluster.New` concurrently and end up writing to a shared map inside controller-runtime's cache option defaulting (`cache.defaultOpts`).
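For anyone unfamiliar with this class of crash, here is a minimal, illustrative sketch (not Prow code; the map and goroutines are made up) of the failure mode the Go runtime is reporting. Unsynchronized writes to the same map from multiple goroutines trip the runtime's concurrent-map-write detector and abort the whole process, which matches the restarts we see:

```go
package main

import "sync"

func main() {
	// A shared map with no mutex guarding it, analogous to shared options
	// being mutated by several cluster-building goroutines at once.
	shared := map[string]string{}

	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				// Concurrent write: the runtime detects this and exits with
				// "fatal error: concurrent map writes" (it is not a recoverable panic).
				shared["key"] = "value"
			}
		}()
	}
	wg.Wait()
}
```

Because this is a fatal runtime error rather than a panic, there is no way to recover from it in-process; the fix has to be either serializing the writes (e.g. a mutex) or giving each goroutine its own copy of the data it mutates.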
/kind bug
/assign