hazelcast / management-center-docker

This repository contains Docker image for Hazelcast Management Center.

HMC 3.12.12 keeps dying with OOM Java Heap Space after some time when Hazelcast works with large number of topics #44

Open spliakos opened 4 years ago

spliakos commented 4 years ago

kctl describe pod infra-hmc-57f77bd495-czr54

Name:           infra-hmc-57f77bd495-czr54
Namespace:      default
Priority:       0
Node:
Start Time:     Wed, 14 Oct 2020 09:21:35 +0200
Labels:         appName=hmc
                pod-template-hash=57f77bd495
                version=3.12.12
Annotations:    kubectl.kubernetes.io/restartedAt: 2020-10-12T16:00:29+02:00
                kubernetes.io/psp: restricted
                seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:         Running
IP:
IPs:
  IP:
Controlled By:  ReplicaSet/infra-hmc-57f77bd495
Containers:
  hmc:
    Container ID:   docker://0a3a41bde2236a357c305485fbdf30811d395917b388b8fd2235c2171574cef7
    Image:          hazelcast/management-center:3.12.12
    Image ID:       docker-pullable://hazelcast/management-center@sha256:bebce8775ec86718a7a4adef330254b63fd8c94d3becbeca34038b9b17341712
    Ports:          8080/TCP, 8081/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Wed, 14 Oct 2020 13:31:52 +0200
    Last State:     Terminated
      Reason:       Error
      Exit Code:    3
      Started:      Wed, 14 Oct 2020 13:14:17 +0200
      Finished:     Wed, 14 Oct 2020 13:31:50 +0200
    Ready:          True
    Restart Count:  14
    Requests:
      memory:  4Gi
    Environment:
      JAVA_OPTS:          -Dhazelcast.mc.healthCheck.enable=true -Dhazelcast.mc.allowMultipleLogin=true -XX:+ExitOnOutOfMemoryError
      MC_ADMIN_USER:
      MC_ADMIN_PASSWORD:
      CONTAINER_SUPPORT:  false
      MIN_HEAP_SIZE:      1024m
      MAX_HEAP_SIZE:      4096m
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4glt8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-4glt8:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4glt8
    Optional:    false
QoS Class:       Burstable
Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason   Age                   From     Message
  Normal  Pulling  15m (x15 over 4h25m)  kubelet  Pulling image "hazelcast/management-center:3.12.12"
  Normal  Pulled   15m (x15 over 4h25m)  kubelet  Successfully pulled image "hazelcast/management-center:3.12.12"
  Normal  Created  15m (x15 over 4h25m)  kubelet  Created container hmc
  Normal  Started  15m (x15 over 4h25m)  kubelet  Started container hmc

Logs:

kctl logs -f infra-hmc-57f77bd495-czr54

########################################

JAVA_OPTS=-Dhazelcast.mancenter.home=/data -Djava.net.preferIPv4Stack=true -Dhazelcast.mc.healthCheck.enable=true -Dhazelcast.mc.allowMultipleLogin=true -XX:+ExitOnOutOfMemoryError -Xms1024m -Xmx4096m

MC_CLASSPATH=/opt/hazelcast/mancenter/hazelcast-mancenter-3.12.12.war

starting now....

########################################

3 Hazelcast nodes, 8 maps, ~85000 topics.

In other environments, where we have a smaller number of topics, HMC seems to work without crashing. But once the number of topics reaches 10k+, we see the same situation over and over.

erosb commented 4 years ago

Hello,

I suggest adjusting the hazelcast.mc.cache.max.size system property to a value lower than the default 768. It limits the number of timestamped cluster states stored in memory. I can't advise on the exact setting, because it depends on cluster size and we have never stress-tested it with a high number of topics, but we have some reference data for a lot of maps as a starting point. Topic states are expected to take much less space than map stats though.
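
As a rough sketch, the property could be passed through the JAVA_OPTS environment variable, which the image already appends to the JVM command line (see the startup log above). The value 256 and the MC_CACHE_MAX_SIZE helper variable are purely illustrative, not a recommendation:

```shell
# Illustrative value only; the default mentioned above is 768.
MC_CACHE_MAX_SIZE=256

# Options already present in the pod spec, plus the cache size property.
JAVA_OPTS="-Dhazelcast.mc.healthCheck.enable=true -Dhazelcast.mc.allowMultipleLogin=true -XX:+ExitOnOutOfMemoryError"
JAVA_OPTS="$JAVA_OPTS -Dhazelcast.mc.cache.max.size=$MC_CACHE_MAX_SIZE"
export JAVA_OPTS

echo "$JAVA_OPTS"
```

In the Kubernetes deployment, the same string would go into the container's JAVA_OPTS environment entry.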

emre-aydin commented 4 years ago

@spliakos it might make sense to disable statistics for some of your topics so they don't flood Management Center with all their metrics. Note that you can also use regular expressions to apply the same config to more than one topic, or even change the default config while applying specialized config to the ones you care about.
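
A minimal sketch of what that could look like in the members' hazelcast.xml, written here as a shell heredoc. It assumes the 3.x `<topic>` element with its `<statistics-enabled>` setting; also, as far as I know, Hazelcast matches config names with `*` wildcards rather than full regular expressions. The file name and the `important.*` pattern are made up for illustration:

```shell
# Writes an illustrative fragment; merge it into the real hazelcast.xml by hand.
cat > topic-stats-fragment.xml <<'EOF'
<!-- Default for all topics: do not collect statistics. -->
<topic name="default">
    <statistics-enabled>false</statistics-enabled>
</topic>
<!-- Hypothetical pattern: keep statistics for topics matching important.* -->
<topic name="important.*">
    <statistics-enabled>true</statistics-enabled>
</topic>
EOF
```

With a default like this, only the topics matched by a more specific entry would keep reporting statistics to Management Center.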

spliakos commented 4 years ago

Hey @erosb, I thought about changing the cache size, but according to HC: "It is not recommended to change the cache size unless the cluster has a large number of maps which may cause Management Center to run out of heap memory. Setting too low a value for hazelcast.mc.cache.max.size can be detrimental to the level of detail shown within Management Center, especially when it comes to graphs." We only have 8 maps, but a lot of topics. I will try it anyway and see how it goes.

@emre-aydin: This actually makes sense, we will try this and update :)