opensearch-project / opensearch-k8s-operator

OpenSearch Kubernetes Operator

Hitting `max virtual memory areas` error when starting small cluster #241

Closed · robcxyz closed this issue 1 year ago

robcxyz commented 2 years ago

With the following cluster.yaml

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: opensearch-cluster
  namespace: opensearch
spec:
  general:
    serviceName: opensearch-cluster
    version: 1.3.1
  dashboards:
    enable: true
    version: 1.3.1
    replicas: 1
    resources:
      requests:
         memory: "512Mi"
         cpu: "200m"
      limits:
         memory: "512Mi"
         cpu: "200m"
  nodePools:
    - component: masters
      replicas: 3
#      jvm: "-Xmx1024M -Xms1024M"
      diskSize: "10Gi"
      resources:
         requests:
            memory: "2Gi"
            cpu: 1
         limits:
            memory: "2Gi"
            cpu: 1
      roles:
        - "data"
        - "master"

(omitting nodeSelector, tolerations, persistence)

I am getting the following error.

[2022-08-08T05:35:03,534][INFO ][o.o.b.BootstrapChecks    ] [opensearch-cluster-masters-0] bound or publishing to a non-loopback address, enforcing bootstrap checks
ERROR: [1] bootstrap checks failed
[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
ERROR: OpenSearch did not exit normally - check the logs at /usr/share/opensearch/logs/opensearch-cluster.log
[2022-08-08T05:35:03,543][INFO ][o.o.n.Node               ] [opensearch-cluster-masters-0] stopping ...

I've seen many other configurations with small resource limits like this one, but no reports of this error. I've tried with and without the jvm option, with the jvm values increased to 2048M, and with increased requests/limits.

Running on k8s v1.23.5.

Happy to provide any additional information. Thank you for your work on this.

swoehrl-mw commented 2 years ago

Hi @robcxyz. Each OpenSearch pod is deployed with an init container called init-sysctl that sets vm.max_map_count to a higher value to avoid precisely this error. Can you check the logs of this init container to see if there was a problem setting the option? It sounds like that did not happen correctly.
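If it helps, something along these lines should list the pod's init containers and show their logs (a sketch only; adjust the namespace and pod name to your cluster, and the container name init-sysctl is the one I'd expect, it may differ in your deployment):

# list the init containers of the master pod
kubectl -n opensearch get pod opensearch-cluster-masters-0 -o jsonpath='{.spec.initContainers[*].name}'

# show the logs of the sysctl init container
kubectl -n opensearch logs opensearch-cluster-masters-0 -c init-sysctl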

As a side note: the jvm option should not have an impact on this, as vm.max_map_count is a setting of the Linux kernel, while the JVM options only affect the single OpenSearch process.

robcxyz commented 2 years ago

@swoehrl-mw - Thank you for your help.

I'm not getting any logs out of the init container, which exited with code 0. This is the output of describing the pod:

Name:         opensearch-cluster-masters-0
Namespace:    opensearch
Priority:     0
Node:         es-cpu2-v1-b15eb099934e/10.7.96.15
Start Time:   Mon, 08 Aug 2022 12:16:02 -0600
Labels:       controller-revision-hash=opensearch-cluster-masters-777596c979
              opensearch.role=master
              opster.io/opensearch-cluster=opensearch-cluster
              opster.io/opensearch-nodepool=masters
              statefulset.kubernetes.io/pod-name=opensearch-cluster-masters-0
Annotations:  cni.projectcalico.org/podIP: 
              cni.projectcalico.org/podIPs: 
              opster.io/config: 
Status:       Running
IP:           10.244.207.204
IPs:
  IP:           10.244.207.204
Controlled By:  StatefulSet/opensearch-cluster-masters
Init Containers:
  init:
    Container ID:  containerd://4f5d86dbca7a699081fee94bf1c1c1610e87574e2cb6f6e8b793a74ee12c93ca
    Image:         public.ecr.aws/opsterio/busybox:latest
    Image ID:      public.ecr.aws/opsterio/busybox@sha256:de8fef98aa3842dc75f877384520156222f13bf1f0f86ad288b6e037aa816160
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
    Args:
      chown -R 1000:1000 /usr/share/opensearch/data
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 08 Aug 2022 12:16:08 -0600
      Finished:     Mon, 08 Aug 2022 12:16:08 -0600
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /usr/share/opensearch/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cxghg (ro)
Containers:
  opensearch:
    Container ID:   containerd://e3648f157c2d840a8d98cad5e45c5e42cda579df1547f5d77ae4dba01cd9ae12
    Image:          docker.io/opensearchproject/opensearch:1.3.1
    Image ID:       docker.io/opensearchproject/opensearch@sha256:53f826bb56a1e2a396b9f1ff6023e3f738422652581cec2ce33c05a6f07b9570
    Ports:          9200/TCP, 9300/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 08 Aug 2022 12:25:07 -0600
      Finished:     Mon, 08 Aug 2022 12:25:39 -0600
    Ready:          False
    Restart Count:  6
    Limits:
      cpu:     1
      memory:  3Gi
    Requests:
      cpu:      1
      memory:   3Gi
    Liveness:   tcp-socket :9200 delay=10s timeout=5s period=20s #success=1 #failure=10
    Readiness:  exec [/bin/bash -c curl -k -u "${OPENSEARCH_USER}:${OPENSEARCH_PASSWORD}" --silent --fail https://localhost:9200] delay=30s timeout=1s period=30s #success=1 #failure=3
    Startup:    tcp-socket :9200 delay=10s timeout=5s period=20s #success=1 #failure=10
    Environment:
      cluster.initial_master_nodes:  opensearch-cluster-bootstrap-0
      discovery.seed_hosts:          opensearch-cluster-discovery
      cluster.name:                  opensearch-cluster
      network.bind_host:             0.0.0.0
      network.publish_host:          opensearch-cluster-masters-0 (v1:metadata.name)
      OPENSEARCH_JAVA_OPTS:          -Xmx2048M -Xms2048M -Dopensearch.transport.cname_in_publish_address=true
      node.roles:                    data,master
      http.port:                     9200
      OPENSEARCH_USER:               admin
      OPENSEARCH_PASSWORD:           <set to the key 'password' in secret 'opensearch-cluster-admin-password'>  Optional: false
    Mounts:
      /usr/share/opensearch/data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cxghg (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-opensearch-cluster-masters-0
    ReadOnly:   false
  kube-api-access-cxghg:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              vke.vultr.com/node-pool=es-cpu2-v1
Tolerations:                 app=opensearch:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                     From               Message
  ----     ------     ----                    ----               -------
  Normal   Scheduled  9m53s                   default-scheduler  Successfully assigned opensearch/opensearch-cluster-masters-0 to es-cpu2-v1-b15eb099934e
  Normal   Pulling    9m48s                   kubelet            Pulling image "public.ecr.aws/opsterio/busybox:latest"
  Normal   Pulled     9m47s                   kubelet            Successfully pulled image "public.ecr.aws/opsterio/busybox:latest" in 562.800485ms
  Normal   Created    9m47s                   kubelet            Created container init
  Normal   Started    9m47s                   kubelet            Started container init
  Normal   Created    8m19s (x3 over 9m47s)   kubelet            Created container opensearch
  Normal   Started    8m19s (x3 over 9m47s)   kubelet            Started container opensearch
  Warning  Unhealthy  7m53s (x4 over 9m33s)   kubelet            Startup probe failed: dial tcp 10.244.207.204:9200: connect: connection refused
  Normal   Pulled     7m20s (x4 over 9m47s)   kubelet            Container image "docker.io/opensearchproject/opensearch:1.3.1" already present on machine
  Warning  BackOff    4m45s (x18 over 8m37s)  kubelet            Back-off restarting failed container

Is there supposed to be more inside the init container's command other than sh -c? Please let me know if there is anything else I can provide to help debug. Also, I am doing this on very small nodes, 2 CPU / 4 GB RAM. Could these be too small?

swoehrl-mw commented 2 years ago

Hi @robcxyz. I forgot that the init container I mentioned is not run by default. Can you please add the following to your cluster spec and try again?

...
spec:
  general:
    setVMMaxMapCount: true

This should add an extra init container to each pod that sets the vm.max_map_count.
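Once the pods have restarted, you can verify that the setting took effect with something like the following (the pod name depends on your cluster; 262144 is the value the bootstrap check expects):

kubectl -n opensearch exec opensearch-cluster-masters-0 -- cat /proc/sys/vm/max_map_count
# should print 262144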

Is there supposed to be more inside the init containers command other than sh -c?

That is all correct. There are two separate init containers: the one you have seen, which takes care of the volume permissions (in the args field you can see a chown command, so in the end the container runs something like sh -c 'chown ...'), and the sysctl one, which only runs when you set the option mentioned above.
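Roughly speaking, the extra init container looks something like this (just a sketch for illustration; the exact name, image and command the operator generates may differ):

initContainers:
  - name: init-sysctl
    image: public.ecr.aws/opsterio/busybox:latest
    # changing a kernel setting from inside a container requires privileges
    securityContext:
      privileged: true
    command: ["sysctl", "-w", "vm.max_map_count=262144"]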

robcxyz commented 2 years ago

Thanks @swoehrl-mw - while the cluster still isn't deploying, this particular issue has been resolved. I will file separate issues for what I am facing now.

I was planning on submitting a PR anyway once I successfully deployed the cluster; should this be included in the docs?

dbason commented 2 years ago

@swoehrl-mw do you think we should invert the behaviour to run the vm.max_map_count initContainer by default, and allow users to explicitly disable it?

swoehrl-mw commented 2 years ago

I was planning on submitting a PR anyway once I successfully deployed the cluster; should this be included in the docs?

Yes, good idea, this should be part of the userguide. If you feel up to it, a PR is very welcome.

@swoehrl-mw do you think we should invert the behaviour to run the vm.max_map_count initContainer by default, and allow users to explicitly disable it?

@dbason I think the current behaviour is ok; it should just be made clear in the docs. What we could think about: making the default (enabled/disabled) configurable for the operator deployment itself in the Helm values. I'm thinking about scenarios where a cluster-admin deploys the operator and then teams/users use it to deploy the actual OpenSearch clusters. In that case the cluster-admin should be able to assess whether vm.max_map_count is needed for a specific cluster, but the users probably do not know.
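As a purely hypothetical illustration of that idea (the value name is made up and does not exist in the chart today), the operator's Helm values could expose a cluster-wide default along these lines:

# hypothetical operator helm value, not an existing option
clusterDefaults:
  setVMMaxMapCount: true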

swoehrl-mw commented 1 year ago

The problem was addressed. Nothing to fix. Closing as completed.

piellick commented 1 year ago

Hi, it looks like this still isn't covered in any documentation, right?

swoehrl-mw commented 1 year ago

Hi @piellick.

Hi, it looks like this still isn't covered in any documentation, right?

Yes, correct. I just checked, and this is not part of the userguide; it must have slipped through the cracks. I've opened #474 so we can track it.

piellick commented 1 year ago

Great, thanks!