rabbitmq / cluster-operator

RabbitMQ Cluster Kubernetes Operator
https://www.rabbitmq.com/kubernetes/operator/operator-overview.html
Mozilla Public License 2.0

RabbitMQ memory budget isn't set properly in pod? #585

Closed nightkr closed 3 years ago

nightkr commented 3 years ago

Describe the bug

RabbitMQ seems to tune itself based on the host's total amount of memory, not the pod's requests or limits.

This means that a cluster based on the same config will keep OOMKilling itself if there is a large resource disparity between hosts.

To Reproduce

Deploy the following manifest on single-node K8s clusters with both 16GiB (let's call it box 1) and 64GiB RAM (box 2):

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: my-test-cluster
spec:
  replicas: 3
  resources:
    requests:
      cpu: 100m
      memory: 250Mi
    limits:
      cpu: 1
      memory: 300Mi

Expected behavior

The Rabbit clusters will behave identically in the two environments.

Instead, the cluster on box 1 will work absolutely fine, while the cluster on box 2 will get stuck getting killed by the OOMKiller.

Increasing the memory limit to 4GiB (and deleting the pod, since the StatefulSet will be stuck trying to upgrade) ensures that it works on box 2 too.
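For reference, the workaround described amounts to patching the limits in the manifest above (illustrative fragment; only the memory limit changes from the original spec):

```yaml
spec:
  resources:
    requests:
      cpu: 100m
      memory: 250Mi
    limits:
      cpu: 1
      memory: 4Gi   # raised from 300Mi to stop the OOMKills on box 2
```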

Version and environment information

ChunyiLyu commented 3 years ago

@teozkr thanks for using the operator.

I cannot reproduce the reported issue. I took the manifest snippet you provided and created two clusters in two different k8s clusters whose worker nodes have different total CPU and memory.

In both clusters, total_memory_available_override_value is the same:

rabbitmq@my-test-cluster-server-2:/$ rabbitmqctl environment | grep memory
      {memory_monitor_interval,2500},
      {total_memory_available_override_value,251658240},
      {vm_memory_calculation_strategy,rss},
      {vm_memory_high_watermark,0.4},
      {vm_memory_high_watermark_paging_ratio,0.5},
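As a sanity check on the value above (my arithmetic, not something stated in the thread): 251658240 bytes is exactly 240 MiB, which is 80% of the 300Mi container limit — consistent with the override being derived from the pod's memory limit rather than the host's total memory:

```shell
# 251658240 bytes expressed in MiB
echo $((251658240 / 1024 / 1024))          # 240

# 80% of the 300Mi (300 * 1024 * 1024 bytes) container limit
echo $((300 * 1024 * 1024 * 80 / 100))     # 251658240
```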

The exact memory used in my created clusters is not the same (no queues/messages, both empty). One has 0.0103 GB of unused allocated memory; the other has only 0.0078 GB left. Both clusters are on the verge of hitting the limit, so I am not surprised by the different behavior you observed. Also, pod evictions depend on the available resources in each k8s cluster, not just the resource consumption of the given pods.

300Mi of memory is not enough for anything. In both of my clusters, without creating any queues or messages, the clusters started out exceeding the high memory watermark, which blocked all connections that were publishing messages. Our default memory limit/request is 2Gi, and we recommend setting the same value for the memory request and limit.
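The recommended baseline would look like this in the RabbitmqCluster spec (illustrative fragment, following the field paths used in the manifest earlier in the thread; equal request and limit avoids the pod being scheduled with less memory than RabbitMQ will try to use):

```yaml
spec:
  resources:
    requests:
      memory: 2Gi   # recommended default; matches the limit
    limits:
      memory: 2Gi
```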

nightkr commented 3 years ago

@ChunyiLyu

Yeah, total_memory_available_override_value seems to be getting set correctly.

> 300Mi memory is not enough for anything. In both of my clusters without creating any queue or messages, clusters started with exceeding the high memory water mark, which blocked all connections that were publishing messages. Our default for memory limit/request is 2Gi and we recommend setting the same value for memory request and limit.

This would make more sense to me if the problem were at least consistent. Our QC environment (box 1) seems to work fine with 300Mi (although it has negligible load), while I had to increase my local environment (box 2)'s budget to 6Gi to avoid getting OOMKilled on startup.

ChunyiLyu commented 3 years ago

@teozkr Were you able to publish any messages to your QC environment (box 1) with 300Mi of memory? I would be surprised if you were, since both of my clusters hit the memory watermark problem from cluster start and no messages could be published at all. I would say your QC cluster was never working "fine" if you can't publish any messages.

With 300Mi of memory, the cluster is on the verge of using all allocated memory. In my two clusters, the difference in used memory was around 2 MB (one has 0.0103 GB of unused allocated memory; the other has only 0.0078 GB left). I think that's a reasonable difference between two clusters.

nightkr commented 3 years ago

@ChunyiLyu

> Were you able to publish any message to your QC environment (box 1) with 300Mi memory?

Yes, I am able to publish messages to the QC cluster, and they are received fine. Curiously, they are all above the memory watermark (sitting around 99-105MiB, with a high watermark of 96MiB), so I agree that it arguably "shouldn't" work.
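The watermark-relative memory figures quoted above can be inspected on a running node; a rough sketch, assuming shell access to the pod (exact output format varies by RabbitMQ version):

```shell
# Show the configured watermark and current memory figures for the node
rabbitmqctl status | grep -i -A 2 memory

# Per-category breakdown of the node's memory usage, in MB
rabbitmq-diagnostics memory_breakdown --unit mb
```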