wave-k8s / wave

Kubernetes configuration tracking controller
Apache License 2.0

Add lock to watcher hash map to prevent concurrent access panics #161

Closed jabdoa2 closed 1 month ago

jabdoa2 commented 1 month ago

I rolled out Wave 0.7 to a cluster with about 1k deployments. The migration went smoothly. However, the controller crashed due to concurrent access to the new watcher hash map. Apparently, watchers run as separate goroutines and access the map concurrently:

fatal error: concurrent map read and map write

goroutine 199 [running]:
github.com/wave-k8s/wave/pkg/core.(*enqueueRequestForWatcher).queueOwnerReconcileRequest(0xc00049c3e8, {0x1b49930?, 0xc00f2c5b80?}, {0x1b3ff10, 0xc000498160})

I fixed that by adding mutexes. Obviously, access to this hashmap can be optimized further.

Even with this crash, it performs well. Restarts are fast, and the other controller takes over within a few seconds. Even though it crashes occasionally, the overall CPU load is lower than before:

[Screenshot: CPU load graph]

Same appears to be the case for memory usage:

[Screenshot: memory usage graph]

The upgrade happened around 21:00, which explains the small CPU spike at that time.