I get a similar situation on GCE (Google Compute Engine); see my bundle below.
I'm super frustrated about this.
I get this problem even if I use nodes with 7.5 GB of RAM (constraints: instance-type=n1-standard-2 root-disk=50G).
ubuntu@eriklonroth:~$ juju status
Model  Controller         Cloud/Region         Version  SLA          Timestamp
elk    google-controller  google/europe-west1  2.4.7    unsupported  13:25:49Z

App            Version  Status  Scale  Charm          Store       Rev  OS      Notes
elasticsearch           error       2  elasticsearch  jujucharms   25  ubuntu
filebeat       5.6.13   active      1  filebeat       jujucharms   19  ubuntu
kibana                  active      1  kibana         jujucharms   19  ubuntu  exposed
logstash                active      1  logstash       jujucharms    3  ubuntu
openjdk                 active      1  openjdk        jujucharms    5  ubuntu
pyapp-snapped           active      1  pyapp-snapped  jujucharms    0  ubuntu

Unit               Workload  Agent  Machine  Public address  Ports            Message
elasticsearch/0*   error     idle   0        35.205.109.24   9200/tcp         hook failed: "peer-relation-changed"
elasticsearch/1    error     idle   1        35.187.4.206    9200/tcp         hook failed: "peer-relation-changed"
kibana/0*          active    idle   2        35.195.139.44   80/tcp,9200/tcp  ready
logstash/0*        active    idle   3        35.240.69.128                    logstash installed
  openjdk/0*       active    idle            35.240.69.128                    OpenJDK 8 (jre) installed
pyapp-snapped/0*   active    idle   4        35.205.91.155                    pyapp AVAILABLE
  filebeat/0*      active    idle            35.205.91.155                    Filebeat ready.

Machine  State    DNS            Inst id        Series  AZ              Message
0        started  35.205.109.24  juju-7bf07d-0  xenial  europe-west1-b  RUNNING
1        started  35.187.4.206   juju-7bf07d-1  xenial  europe-west1-c  RUNNING
2        started  35.195.139.44  juju-7bf07d-2  xenial  europe-west1-d  RUNNING
3        started  35.240.69.128  juju-7bf07d-3  xenial  europe-west1-c  RUNNING
4        started  35.205.91.155  juju-7bf07d-4  bionic  europe-west1-b  RUNNING
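Both elasticsearch units are stuck in a failed peer-relation-changed hook. For anyone reproducing this, the hook's output can be inspected and the hook retried with stock Juju 2.x commands (standard CLI, using the unit names from the status above):

ubuntu@eriklonroth:~$ juju debug-log --include unit-elasticsearch-0
ubuntu@eriklonroth:~$ juju resolved elasticsearch/0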
When I log into the node and manually re-run the failing playbook:
ubuntu@juju-7bf07d-0:~$ cd /var/lib/juju/agents/unit-elasticsearch-0/charm/
ubuntu@juju-7bf07d-0:/var/lib/juju/agents/unit-elasticsearch-0/charm$ sudo ansible-playbook -c local playbook.yaml --tags peer-relation-changed
PLAY ***************************************************************************
TASK [setup] *******************************************************************
ok: [localhost]
TASK [include] *****************************************************************
included: /var/lib/juju/agents/unit-elasticsearch-0/charm/tasks/install-elasticsearch.yml for localhost
TASK [include] *****************************************************************
included: /var/lib/juju/agents/unit-elasticsearch-0/charm/tasks/peer-relations.yml for localhost
TASK [Wait until the local service is available] *******************************
ok: [localhost]
TASK [Record current cluster health] *******************************************
ok: [localhost]
TASK [Restart if not part of cluster] ******************************************
changed: [localhost]
TASK [Wait until the local service is available after restart] *****************
ok: [localhost]
TASK [Pause to ensure that after restart unit has time to join.] ***************
Pausing for 30 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
ok: [localhost]
TASK [Record cluster health after restart] *************************************
ok: [localhost]
TASK [Fail if unit is still not part of cluster] *******************************
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "msg": "Unit failed to join cluster after peer-relation-changed"}
PLAY RECAP *********************************************************************
localhost : ok=9 changed=1 unreachable=0 failed=1
On one of the elasticsearch units:
ubuntu@juju-7bf07d-0:/var/lib/juju/agents/unit-elasticsearch-0/charm$ curl http://localhost:9200/_cluster/health
{"cluster_name":"elasticsearch","status":"green","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":0,"active_shards":0"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":100.0}ubuntu@juju-7bf07d-0:/var/lib/juju/agents/unit-elasticsearch-0/charm$
ubuntu@eriklonroth:~$ cat elk.yaml
series: bionic
applications:
  filebeat:
    charm: 'cs:filebeat-19'
    series: bionic
    annotations:
      gui-x: '716.5058288574219'
      gui-y: '152.76995849609375'
  pyapp-snapped:
    charm: 'cs:~erik-lonroth/pyapp-snapped-0'
    num_units: 1
    series: bionic
    annotations:
      gui-x: '508.94989013671875'
      gui-y: '121.77426147460938'
    to:
      - '4'
  logstash:
    charm: 'cs:logstash-3'
    num_units: 1
    constraints: mem=2048
    series: xenial
    annotations:
      gui-x: '946.5189819335938'
      gui-y: '524.4435424804688'
    to:
      - '3'
  elasticsearch:
    charm: 'cs:elasticsearch-25'
    num_units: 2
    series: xenial
    annotations:
      gui-x: '1197.71142578125'
      gui-y: '528.3180236816406'
    to:
      - '0'
      - '1'
  kibana:
    charm: 'cs:kibana-19'
    num_units: 1
    expose: true
    series: xenial
    annotations:
      gui-x: '1461.3348388671875'
      gui-y: '524.4436340332031'
    to:
      - '2'
  openjdk:
    charm: 'cs:openjdk-5'
    series: xenial
    annotations:
      gui-x: '837.3892211914062'
      gui-y: '757.6482543945312'
relations:
  - - 'openjdk:java'
    - 'logstash:java'
  - - 'kibana:rest'
    - 'elasticsearch:client'
  - - 'logstash:elasticsearch'
    - 'elasticsearch:client'
  - - 'filebeat:beats-host'
    - 'pyapp-snapped:juju-info'
  - - 'filebeat:logstash'
    - 'logstash:beat'
machines:
  '0':
    series: xenial
    constraints: instance-type=n1-standard-2 root-disk=50G
  '1':
    series: xenial
    constraints: instance-type=n1-standard-2 root-disk=50G
    # constraints: arch=amd64 cpu-cores=2 cpu-power=200 mem=4096 root-disk=8192
  '2':
    series: xenial
    constraints: arch=amd64 cpu-cores=2 cpu-power=200 mem=1024 root-disk=8192
  '3':
    series: xenial
    constraints: arch=amd64 cpu-cores=2 cpu-power=200 mem=2048 root-disk=8192
  '4':
    series: bionic
    constraints: arch=amd64 cpu-cores=2 cpu-power=200 mem=1024 root-disk=8192
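(For completeness: a bundle file like this is deployed with plain juju deploy — standard Juju 2.x usage, nothing bundle-specific beyond the file path.)

ubuntu@eriklonroth:~$ juju deploy ./elk.yaml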
Getting the same issue with 8 GB of RAM set on the elasticsearch nodes.
I deployed to vSphere with the Juju vsphere controller; the default memory size of the created guest VMs is 1 GB of RAM. Because OpenJDK cannot allocate the memory it asks for, the elasticsearch service never starts, and Ansible loops indefinitely waiting to reach elasticsearch.
Process list on the unit machine:
13916 ? Ss 0:00 bash /var/lib/juju/init/jujud-machine-0/exec-start.sh
13923 ? Sl 0:00 \_ /var/lib/juju/tools/machine-0/jujud machine --data-dir /var/lib/juju --machine-id 0 --debug
14029 ? Sl 0:00 lxd-bridge-proxy --addr=[fe80::1%lxdbr0]:13128
14056 ? Ssl 0:08 /usr/bin/lxd --group lxd --logfile=/var/log/lxd/lxd.log
14094 ? Ss 0:00 bash /var/lib/juju/init/jujud-unit-elasticsearch-0/exec-start.sh
14099 ? Sl 0:00 \_ /var/lib/juju/tools/unit-elasticsearch-0/jujud unit --data-dir /var/lib/juju --unit-name elasticsearch/0 --debug
23599 ? S 0:00 \_ python3 /var/lib/juju/agents/unit-elasticsearch-0/charm/hooks/peer-relation-joined
23711 ? S 1:16 \_ /usr/bin/python /usr/bin/ansible-playbook -c local playbook.yaml --tags peer-relation-joined
23713 ? Sl 1:18 \_ /usr/bin/python /usr/bin/ansible-playbook -c local playbook.yaml --tags peer-relation-joined
23777 ? S 0:00 \_ /usr/bin/python /usr/bin/ansible-playbook -c local playbook.yaml --tags peer-relation-joined
23783 ? S 0:00 \_ /bin/sh -c LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 /usr/bin/python /.ansible/tmp/ansible-tmp-1520222173.08-237085152647854/wait_for; rm -rf "/.ansible/tmp/ansi
23784 ? S 0:00 \_ /usr/bin/python /.ansible/tmp/ansible-tmp-1520222173.08-237085152647854/wait_for
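The two wait_for processes at the bottom are Ansible's wait_for module blocking on the service; in the charm's playbook that presumably corresponds to a task along these lines (a hypothetical reconstruction for illustration, not the charm's actual source):

# hypothetical sketch of the "Wait until the local service is available" task
- name: Wait until the local service is available
  wait_for:
    host: localhost
    port: 9200
    timeout: 600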
elasticsearch service status on the unit machine:
root@juju-773e04-0:~/j# systemctl status elasticsearch.service
● elasticsearch.service - Elasticsearch
Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; disabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sun 2018-03-04 21:09:46 MST; 5s ago
Docs: http://www.elastic.co
Process: 24062 ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet -Edefault.path.logs=${LOG_DIR} -Edefault.path.data=${DATA_DIR} -Edefault.path.conf=${CONF_DIR} (code=exited, status=
Process: 24059 ExecStartPre=/usr/share/elasticsearch/bin/elasticsearch-systemd-pre-exec (code=exited, status=0/SUCCESS)
Main PID: 24062 (code=exited, status=1/FAILURE)
Mar 04 21:09:45 juju-773e04-0 systemd[1]: Started Elasticsearch.
Mar 04 21:09:46 juju-773e04-0 elasticsearch[24062]: OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x000000070a660000, 3046768640, 0) failed; error='Cannot allocate memory' (errno=12)
Mar 04 21:09:46 juju-773e04-0 elasticsearch[24062]: #
Mar 04 21:09:46 juju-773e04-0 elasticsearch[24062]: # There is insufficient memory for the Java Runtime Environment to continue.
Mar 04 21:09:46 juju-773e04-0 elasticsearch[24062]: # Native memory allocation (mmap) failed to map 3046768640 bytes for committing reserved memory.
Mar 04 21:09:46 juju-773e04-0 elasticsearch[24062]: # An error report file with more information is saved as:
Mar 04 21:09:46 juju-773e04-0 elasticsearch[24062]: # /tmp/hs_err_pid24062.log
Mar 04 21:09:46 juju-773e04-0 systemd[1]: elasticsearch.service: Main process exited, code=exited, status=1/FAILURE
Mar 04 21:09:46 juju-773e04-0 systemd[1]: elasticsearch.service: Unit entered failed state.
Mar 04 21:09:46 juju-773e04-0 systemd[1]: elasticsearch.service: Failed with result 'exit-code'.
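The JVM is trying to commit about 3 GB (3046768640 bytes) of heap on a 1 GB guest. Besides giving the VM more memory (see below), one workaround is to shrink the Elasticsearch heap; on a stock 5.x package install that is set in /etc/elasticsearch/jvm.options (the path and values here are assumptions, adjust to wherever the charm puts its config):

# /etc/elasticsearch/jvm.options -- keep the heap below the guest's available RAM
-Xms512m
-Xmx512m

followed by sudo systemctl restart elasticsearch.service.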
Resolution: I edited the deployment bundle.yaml and bumped the memory to 6 GB for the elasticsearch nodes (added constraints: "mem=6G" to the elasticsearch service definition), and Java is happy now.
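In bundle terms the fix looks roughly like this (a sketch of only the changed stanza, based on the description above; note that application constraints only apply to machines Juju provisions for that application, so with explicit to: placement you would bump the machine constraints instead):

  elasticsearch:
    charm: 'cs:elasticsearch-25'
    num_units: 2
    constraints: "mem=6G"
    series: xenial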