FWIW, I'm seeing the same thing on a 3.11 cluster.
The growth seems to come from the metrics code.
Just after start:
Showing nodes accounting for 31415.49kB, 100% of 31415.49kB total
flat flat% sum% cum cum%
10754.25kB 34.23% 34.23% 12290.33kB 39.12% encoding/json.(*decodeState).objectInterface
4620.32kB 14.71% 48.94% 4620.32kB 14.71% k8s.io/apimachinery/pkg/api/meta.(*DefaultRESTMapper).AddSpecific
2640.83kB 8.41% 57.35% 2640.83kB 8.41% k8s.io/apimachinery/pkg/watch.(*Broadcaster).Watch.func1
2590.11kB 8.24% 65.59% 2590.11kB 8.24% github.com/beorn7/perks/quantile.newStream
1536.08kB 4.89% 70.48% 1536.08kB 4.89% encoding/json.unquote
1030.01kB 3.28% 73.76% 7186.39kB 22.88% k8s.io/client-go/restmapper.NewDiscoveryRESTMapper
A few minutes later:
Showing nodes accounting for 188.79MB, 96.66% of 195.32MB total
Dropped 75 nodes (cum <= 0.98MB)
flat flat% sum% cum cum%
116.85MB 59.82% 59.82% 116.85MB 59.82% github.com/beorn7/perks/quantile.newStream
27.34MB 14.00% 73.82% 27.34MB 14.00% k8s.io/apimachinery/pkg/watch.(*Broadcaster).Watch.func1
13MB 6.66% 80.48% 15.50MB 7.94% encoding/json.(*decodeState).objectInterface
12.55MB 6.42% 86.90% 129.90MB 66.50% github.com/prometheus/client_golang/prometheus.newSummary
8.53MB 4.37% 91.27% 8.53MB 4.37% k8s.io/apimachinery/pkg/api/meta.(*DefaultRESTMapper).AddSpecific
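For anyone wanting to pull the same numbers: the blocks above are plain go tool pprof heap output. A minimal way to expose the profiling endpoints from the operator binary, assuming nothing is wired up already (the file name and port 6060 are arbitrary choices, not part of the operator), is the stock net/http/pprof handler:

```go
// pprof.go: illustrative drop-in file for the operator's main package
// (it assumes main() lives in another file of the same package).
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func init() {
	go func() {
		// Heap profiles can then be fetched with:
		//   go tool pprof http://localhost:6060/debug/pprof/heap
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
}
```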
Upgraded the dependencies to Operator SDK 0.15, pinning k8s to 1.16.2. After fixing a few compile errors I was able to start the operator locally:
{"level":"info","ts":1579777403.3433406,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":"0.0.0.0:8383"}
{"level":"info","ts":1579777403.9105785,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"info","ts":1579777406.6541378,"logger":"controller_namespaceconfig","msg":"adding managed object for","object": ... }
{"level":"info","ts":1579777407.2253845,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":1579777407.225979,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"info","ts":1579777407.2702403,"logger":"controller_namespaceconfig","msg":"adding managed object for","object":
{"level":"info","ts":1579777407.8414943,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"error","ts":1579777407.8416977,"logger":"controller-runtime.metrics","msg":"metrics server failed to listen. You may want to disable the metrics server or use another port if it is due to conflicts","error":"error listening on :8080: listen tcp :8080: bind: address already in use" ...}
After disabling the creation of a manager for each resource (https://github.com/redhat-cop/namespace-configuration-operator/blob/master/pkg/controller/namespaceconfig/namespaceconfig_controller.go#L518, introduced with #11), the memory usage seems stable. Each extra manager presumably brings its own metrics registration and watch broadcaster, which would explain the quantile.newStream and Broadcaster.Watch growth above (a sketch of the single-manager pattern follows the profile below):
Showing nodes accounting for 2820.76kB, 100% of 2820.76kB total
flat flat% sum% cum cum%
768.26kB 27.24% 27.24% 768.26kB 27.24% go.uber.org/zap/zapcore.newCounters
516.01kB 18.29% 45.53% 516.01kB 18.29% sync.(*Map).dirtyLocked
512.28kB 18.16% 63.69% 512.28kB 18.16% k8s.io/apimachinery/pkg/api/meta.(*DefaultRESTMapper).AddSpecific
512.19kB 18.16% 81.85% 512.19kB 18.16% runtime.malg
512.02kB 18.15% 100% 1024.21kB 36.31% runtime.systemstack
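For reference, a minimal sketch of the single-manager pattern that avoids both the port conflict and the per-manager allocations, assuming controller-runtime v0.4.x: one manager is created at startup and every controller is registered against it, instead of constructing a fresh manager (with its own cache, metrics registration and watch broadcaster) per reconciled resource. The addAllControllers helper below is a placeholder for the SDK-generated controller registration, not the operator's actual code:

```go
package main

import (
	"os"

	"sigs.k8s.io/controller-runtime/pkg/client/config"
	logf "sigs.k8s.io/controller-runtime/pkg/log"
	"sigs.k8s.io/controller-runtime/pkg/manager"
	"sigs.k8s.io/controller-runtime/pkg/manager/signals"
)

func main() {
	cfg, err := config.GetConfig()
	if err != nil {
		logf.Log.Error(err, "unable to load kubeconfig")
		os.Exit(1)
	}

	// One manager for the whole process: one cache, one metrics endpoint,
	// one set of registered Prometheus collectors.
	mgr, err := manager.New(cfg, manager.Options{MetricsBindAddress: ":8383"})
	if err != nil {
		logf.Log.Error(err, "unable to create manager")
		os.Exit(1)
	}

	// Every controller is added to this single manager; no new manager is
	// constructed per reconciled resource.
	if err := addAllControllers(mgr); err != nil {
		logf.Log.Error(err, "unable to register controllers")
		os.Exit(1)
	}

	if err := mgr.Start(signals.SetupSignalHandler()); err != nil {
		logf.Log.Error(err, "manager exited non-zero")
		os.Exit(1)
	}
}

// addAllControllers stands in for the scaffolded controller.AddToManager
// calls; it is a placeholder, not part of namespace-configuration-operator.
func addAllControllers(mgr manager.Manager) error { return nil }
```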
The namespace-configuration-operator seems to keep using more and more memory. After a node ran into OOM issues, the kernel log showed a namespace-configuration-operator process being OOM-killed, releasing 150 GB of memory.
As the chart currently does not support setting resource limits/requests for the operator process, I used a k8s LimitRange resource to limit the operator to 2 GB of memory. The following graph shows the operator using more and more memory over time until it reaches 2 GB and gets terminated. The CPU resource limit is 2, and this is what docker stats reports (it is burning CPU cycles: 204.75%):