Closed: robcxyz closed this issue 1 year ago.
Hi @robcxyz. Each OpenSearch pod is deployed with an init container called init-sysctl that sets vm.max_map_count to a higher value precisely to avoid this error message. Can you check the logs for this init container to see if there was a problem setting the option? It sounds like that did not happen correctly.
As a side note: the jvm option should not have an impact on this, as max_map_count is a setting of the Linux kernel, while the jvm options only affect the single OpenSearch process.
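For reference, such a sysctl init container typically looks roughly like the sketch below. This is illustrative only; the name, image, and exact value may differ from what the operator actually generates.

```yaml
# Sketch of a typical sysctl init container -- illustrative only, the
# operator-generated container may use a different name, image, and value.
initContainers:
  - name: init-sysctl
    image: public.ecr.aws/opsterio/busybox:latest
    command: ["sysctl", "-w", "vm.max_map_count=262144"]
    securityContext:
      privileged: true  # changing a kernel parameter requires a privileged container
```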
@swoehrl-mw - Thank you for your help.
I'm not getting any logs out of the init container, which exited 0. This is the output of describing the pod.
Name: opensearch-cluster-masters-0
Namespace: opensearch
Priority: 0
Node: es-cpu2-v1-b15eb099934e/10.7.96.15
Start Time: Mon, 08 Aug 2022 12:16:02 -0600
Labels: controller-revision-hash=opensearch-cluster-masters-777596c979
opensearch.role=master
opster.io/opensearch-cluster=opensearch-cluster
opster.io/opensearch-nodepool=masters
statefulset.kubernetes.io/pod-name=opensearch-cluster-masters-0
Annotations: cni.projectcalico.org/podIP:
cni.projectcalico.org/podIPs:
opster.io/config:
Status: Running
IP: 10.244.207.204
IPs:
IP: 10.244.207.204
Controlled By: StatefulSet/opensearch-cluster-masters
Init Containers:
init:
Container ID: containerd://4f5d86dbca7a699081fee94bf1c1c1610e87574e2cb6f6e8b793a74ee12c93ca
Image: public.ecr.aws/opsterio/busybox:latest
Image ID: public.ecr.aws/opsterio/busybox@sha256:de8fef98aa3842dc75f877384520156222f13bf1f0f86ad288b6e037aa816160
Port: <none>
Host Port: <none>
Command:
sh
-c
Args:
chown -R 1000:1000 /usr/share/opensearch/data
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 08 Aug 2022 12:16:08 -0600
Finished: Mon, 08 Aug 2022 12:16:08 -0600
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/usr/share/opensearch/data from data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cxghg (ro)
Containers:
opensearch:
Container ID: containerd://e3648f157c2d840a8d98cad5e45c5e42cda579df1547f5d77ae4dba01cd9ae12
Image: docker.io/opensearchproject/opensearch:1.3.1
Image ID: docker.io/opensearchproject/opensearch@sha256:53f826bb56a1e2a396b9f1ff6023e3f738422652581cec2ce33c05a6f07b9570
Ports: 9200/TCP, 9300/TCP
Host Ports: 0/TCP, 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 08 Aug 2022 12:25:07 -0600
Finished: Mon, 08 Aug 2022 12:25:39 -0600
Ready: False
Restart Count: 6
Limits:
cpu: 1
memory: 3Gi
Requests:
cpu: 1
memory: 3Gi
Liveness: tcp-socket :9200 delay=10s timeout=5s period=20s #success=1 #failure=10
Readiness: exec [/bin/bash -c curl -k -u "${OPENSEARCH_USER}:${OPENSEARCH_PASSWORD}" --silent --fail https://localhost:9200] delay=30s timeout=1s period=30s #success=1 #failure=3
Startup: tcp-socket :9200 delay=10s timeout=5s period=20s #success=1 #failure=10
Environment:
cluster.initial_master_nodes: opensearch-cluster-bootstrap-0
discovery.seed_hosts: opensearch-cluster-discovery
cluster.name: opensearch-cluster
network.bind_host: 0.0.0.0
network.publish_host: opensearch-cluster-masters-0 (v1:metadata.name)
OPENSEARCH_JAVA_OPTS: -Xmx2048M -Xms2048M -Dopensearch.transport.cname_in_publish_address=true
node.roles: data,master
http.port: 9200
OPENSEARCH_USER: admin
OPENSEARCH_PASSWORD: <set to the key 'password' in secret 'opensearch-cluster-admin-password'> Optional: false
Mounts:
/usr/share/opensearch/data from data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cxghg (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-opensearch-cluster-masters-0
ReadOnly: false
kube-api-access-cxghg:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: vke.vultr.com/node-pool=es-cpu2-v1
Tolerations: app=opensearch:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 9m53s default-scheduler Successfully assigned opensearch/opensearch-cluster-masters-0 to es-cpu2-v1-b15eb099934e
Normal Pulling 9m48s kubelet Pulling image "public.ecr.aws/opsterio/busybox:latest"
Normal Pulled 9m47s kubelet Successfully pulled image "public.ecr.aws/opsterio/busybox:latest" in 562.800485ms
Normal Created 9m47s kubelet Created container init
Normal Started 9m47s kubelet Started container init
Normal Created 8m19s (x3 over 9m47s) kubelet Created container opensearch
Normal Started 8m19s (x3 over 9m47s) kubelet Started container opensearch
Warning Unhealthy 7m53s (x4 over 9m33s) kubelet Startup probe failed: dial tcp 10.244.207.204:9200: connect: connection refused
Normal Pulled 7m20s (x4 over 9m47s) kubelet Container image "docker.io/opensearchproject/opensearch:1.3.1" already present on machine
Warning BackOff 4m45s (x18 over 8m37s) kubelet Back-off restarting failed container
Is there supposed to be more inside the init container's command other than sh -c? Please let me know if there is anything else I can provide to debug. Also, I am doing this on very small nodes (2 CPU / 4 GB RAM). Could these be too small?
Hi @robcxyz. I forgot that the init container I mentioned is not run by default. Can you please add the following to your cluster spec and try again?
...
spec:
general:
setVMMaxMapCount: true
This should add an extra init container to each pod that sets the vm.max_map_count.
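If you want to double-check that the setting took effect, one option is to look at the logs of that init container and at the value the kernel reports inside the pod, for example (container names taken from the discussion and the describe output above):

```sh
# Logs of the sysctl init container (name init-sysctl as mentioned above)
kubectl logs -n opensearch opensearch-cluster-masters-0 -c init-sysctl

# Kernel value as seen from inside the running opensearch container
kubectl exec -n opensearch opensearch-cluster-masters-0 -c opensearch -- cat /proc/sys/vm/max_map_count
```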
Is there supposed to be more inside the init container's command other than sh -c?
That is all correct. There are two separate init containers: the one you have seen, which takes care of the volume permissions (in the args field you can see a chown command, so in the end the container runs something like sh -c 'chown ...'), and the one that runs sysctl, which is only added when you set the option mentioned above.
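For clarity, Kubernetes simply concatenates command and args, so according to the describe output above the permissions init container effectively runs (paraphrased):

```yaml
# Paraphrase of the permissions init container from the describe output above
command: ["sh", "-c"]
args: ["chown -R 1000:1000 /usr/share/opensearch/data"]
# which executes: sh -c "chown -R 1000:1000 /usr/share/opensearch/data"
```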
Thanks @swoehrl-mw - While the cluster isn't deploying still, this issue has been resolved. Will file separate issues for what I am facing now.
I was planning on submitting a PR anyway after I successfully deployed the cluster; should this be included in the docs?
@swoehrl-mw do you think we should invert the behaviour to run the vm.max_map_count initContainer by default, and allow users to explicitly disable it?
I was planning on submitting a PR anyway after I successfully deployed the cluster; should this be included in the docs?
Yes, good idea, this should be part of the userguide. If you feel up to it, a PR is very welcome.
@swoehrl-mw do you think we should invert the behaviour to run the vm.max_map_count initContainer by default, and allow users to explicitly disable it?
@dbason I think the current behaviour is OK; it just needs to be made clear in the docs. What we could think about: make the default (enabled/disabled) configurable for the operator deployment itself in the Helm values. I'm thinking of scenarios where a cluster admin deploys the operator and teams/users then use it to deploy the actual OpenSearch clusters. In that case the cluster admin should be able to assess whether vm.max_map_count needs to be set for a specific cluster, but the users probably do not know.
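A hypothetical shape for such an operator-level default in the chart values could look like the snippet below; the key name is purely illustrative and does not exist in the chart today.

```yaml
# Hypothetical Helm values for the operator chart -- illustrative only
clusterDefaults:
  setVMMaxMapCount: true  # default applied when a cluster spec does not set general.setVMMaxMapCount
```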
Problem was addressed. Nothing to fix. Closing as completed.
Hi, it looks like this still is not covered in any documentation, right?
Hi @piellick.
Hi, it looks like this still is not covered in any documentation, right?
Yes, correct. I just checked and this is not part of the user guide; it must have slipped through the cracks. I've opened #474 so we can track it.
Great, thanks!
With the following cluster.yaml (omitting nodeSelector, tolerations, persistence) I am getting the following error. I have seen many other configurations with small resource limits like this one, but no reported errors like this. I tried with and without the jvm option, and also tried increasing the values to 2048 and increasing the requests/limits. Running on k8s v1.23.5. Happy to provide any additional information. Thank you for your work on this.