developius closed this issue 7 years ago.
Sorry for the delay on this. I'll first make sure it still works for me as expected; perhaps I broke something Swarm-related along the way.
@developius ...it took a few minutes, but mine did start up properly. So, let's compare notes. My Docker version is:
$ docker version
Client:
Version: 17.05.0-ce
API version: 1.29
Go version: go1.7.5
Git commit: 89658be
Built: Thu May 4 22:15:36 2017
OS/Arch: linux/amd64
Server:
Version: 17.05.0-ce
API version: 1.29 (minimum version 1.12)
Go version: go1.7.5
Git commit: 89658be
Built: Thu May 4 22:15:36 2017
OS/Arch: linux/amd64
Experimental: false
on Ubuntu 17.04 with kernel version 4.10.0-24-generic
Beyond that, the container logs from the master instance would be useful for comparison. Here are mine:
Finding IPs. found! 10.0.0.3,172.25.0.3
Starting Elasticsearch with the options -E path.conf=/conf -E path.data=/data -E path.logs=/data -E transport.tcp.port=9300 -E http.port=9200 -E network.host=10.0.0.3,172.25.0.3 -E node.master=true -E node.data=false -E node.ingest=false -E discovery.zen.ping.unicast.hosts=master -E discovery.zen.minimum_master_nodes=1
Running as non-root...
[2017-06-29T03:57:09,128][INFO ][o.e.n.Node ] [] initializing ...
[2017-06-29T03:57:09,926][INFO ][o.e.e.NodeEnvironment ] [vQkhQ0w] using [1] data paths, mounts [[/data (/dev/mapper/ubuntu--vg-root)]], net usable_space [137.5gb], net total_space [226gb], spins? [possibly], types [ext4]
[2017-06-29T03:57:09,926][INFO ][o.e.e.NodeEnvironment ] [vQkhQ0w] heap size [981.5mb], compressed ordinary object pointers [true]
[2017-06-29T03:57:09,932][INFO ][o.e.n.Node ] node name [vQkhQ0w] derived from node ID [vQkhQ0waReqTczLIpq7pwA]; set [node.name] to override
[2017-06-29T03:57:09,933][INFO ][o.e.n.Node ] version[5.4.2], pid[22], build[929b078/2017-06-15T02:29:28.122Z], OS[Linux/4.10.0-24-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_121/25.121-b13]
[2017-06-29T03:57:09,933][INFO ][o.e.n.Node ] JVM arguments [-Xms1g, -Xmx1g, -Des.path.home=/usr/share/elasticsearch-5.4.2]
[2017-06-29T03:57:25,600][INFO ][o.e.p.PluginsService ] [vQkhQ0w] loaded module [aggs-matrix-stats]
[2017-06-29T03:57:25,601][INFO ][o.e.p.PluginsService ] [vQkhQ0w] loaded module [ingest-common]
[2017-06-29T03:57:25,601][INFO ][o.e.p.PluginsService ] [vQkhQ0w] loaded module [lang-expression]
[2017-06-29T03:57:25,602][INFO ][o.e.p.PluginsService ] [vQkhQ0w] loaded module [lang-groovy]
[2017-06-29T03:57:25,602][INFO ][o.e.p.PluginsService ] [vQkhQ0w] loaded module [lang-mustache]
[2017-06-29T03:57:25,603][INFO ][o.e.p.PluginsService ] [vQkhQ0w] loaded module [lang-painless]
[2017-06-29T03:57:25,603][INFO ][o.e.p.PluginsService ] [vQkhQ0w] loaded module [percolator]
[2017-06-29T03:57:25,603][INFO ][o.e.p.PluginsService ] [vQkhQ0w] loaded module [reindex]
[2017-06-29T03:57:25,604][INFO ][o.e.p.PluginsService ] [vQkhQ0w] loaded module [transport-netty3]
[2017-06-29T03:57:25,604][INFO ][o.e.p.PluginsService ] [vQkhQ0w] loaded module [transport-netty4]
[2017-06-29T03:57:25,607][INFO ][o.e.p.PluginsService ] [vQkhQ0w] no plugins loaded
[2017-06-29T03:57:55,981][INFO ][o.e.d.DiscoveryModule ] [vQkhQ0w] using discovery type [zen]
[2017-06-29T03:57:59,489][INFO ][o.e.n.Node ] initialized
[2017-06-29T03:57:59,489][INFO ][o.e.n.Node ] [vQkhQ0w] starting ...
[2017-06-29T03:57:59,759][INFO ][i.n.u.i.PlatformDependent] Your platform does not provide complete low-level API for accessing direct buffers reliably. Unless explicitly requested, heap buffer will always be preferred to avoid potential system instability.
[2017-06-29T03:58:00,211][INFO ][o.e.t.TransportService ] [vQkhQ0w] publish_address {10.0.0.3:9300}, bound_addresses {10.0.0.3:9300}, {172.25.0.3:9300}
[2017-06-29T03:58:00,242][INFO ][o.e.b.BootstrapChecks ] [vQkhQ0w] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-06-29T03:58:05,441][WARN ][o.e.d.z.UnicastZenPing ] [vQkhQ0w] timed out after [5s] resolving host [master]
[2017-06-29T03:58:08,479][INFO ][o.e.c.s.ClusterService ] [vQkhQ0w] new_master {vQkhQ0w}{vQkhQ0waReqTczLIpq7pwA}{YNkTlQTJQRG-qaJMZGaqsQ}{10.0.0.3}{10.0.0.3:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2017-06-29T03:58:08,510][INFO ][o.e.h.n.Netty4HttpServerTransport] [vQkhQ0w] publish_address {10.0.0.3:9200}, bound_addresses {10.0.0.3:9200}, {172.25.0.3:9200}
[2017-06-29T03:58:08,515][INFO ][o.e.n.Node ] [vQkhQ0w] started
[2017-06-29T03:58:08,578][INFO ][o.e.g.GatewayService ] [vQkhQ0w] recovered [0] indices into cluster_state
[2017-06-29T03:59:36,068][WARN ][o.e.m.j.JvmGcMonitorService] [vQkhQ0w] [gc][young][78][3] duration [11.6s], collections [1]/[11.7s], total [11.6s]/[17.7s], memory [210mb]->[205.3mb]/[981.5mb], all_pools {[young] [184.1mb]->[17.7mb]/[256mb]}{[survivor] [0b]->[42.4mb]/[42.5mb]}{[old] [25.9mb]->[146mb]/[683mb]}
[2017-06-29T03:59:36,070][WARN ][o.e.m.j.JvmGcMonitorService] [vQkhQ0w] [gc][78] overhead, spent [11.6s] collecting in the last [11.7s]
[2017-06-29T03:59:36,833][INFO ][o.e.c.s.ClusterService ] [vQkhQ0w] added {{fckZqQH}{fckZqQHqRL6vj8S5mAjiNQ}{F2NgTOUdQYmMjqATQNHYYg}{10.0.0.5}{10.0.0.5:9300},}, reason: zen-disco-node-join[{fckZqQH}{fckZqQHqRL6vj8S5mAjiNQ}{F2NgTOUdQYmMjqATQNHYYg}{10.0.0.5}{10.0.0.5:9300}]
[2017-06-29T03:59:49,349][INFO ][o.e.c.s.ClusterService ] [vQkhQ0w] added {{lXP_bfK}{lXP_bfK8SE22FmgR88ylXQ}{ROycw9TQSJG-jyiledCXew}{10.0.0.6}{10.0.0.6:9300},}, reason: zen-disco-node-join[{lXP_bfK}{lXP_bfK8SE22FmgR88ylXQ}{ROycw9TQSJG-jyiledCXew}{10.0.0.6}{10.0.0.6:9300}]
[2017-06-29T04:00:00,538][INFO ][o.e.c.s.ClusterService ] [vQkhQ0w] added {{Pv0920n}{Pv0920npSyyCQncjalSK8w}{IbxA-dtsQkubJAE9nI4OAg}{10.0.0.10}{10.0.0.10:9300},{k9JC0S4}{k9JC0S4CTVq-h90N5PACTg}{BP2OQr1KTK2acUSMIIV7bQ}{10.0.0.8}{10.0.0.8:9300},}, reason: zen-disco-node-join[{k9JC0S4}{k9JC0S4CTVq-h90N5PACTg}{BP2OQr1KTK2acUSMIIV7bQ}{10.0.0.8}{10.0.0.8:9300}, {Pv0920n}{Pv0920npSyyCQncjalSK8w}{IbxA-dtsQkubJAE9nI4OAg}{10.0.0.10}{10.0.0.10:9300}]
[2017-06-29T04:00:36,654][INFO ][o.e.c.s.ClusterService ] [vQkhQ0w] removed {{Pv0920n}{Pv0920npSyyCQncjalSK8w}{IbxA-dtsQkubJAE9nI4OAg}{10.0.0.10}{10.0.0.10:9300},{fckZqQH}{fckZqQHqRL6vj8S5mAjiNQ}{F2NgTOUdQYmMjqATQNHYYg}{10.0.0.5}{10.0.0.5:9300},{lXP_bfK}{lXP_bfK8SE22FmgR88ylXQ}{ROycw9TQSJG-jyiledCXew}{10.0.0.6}{10.0.0.6:9300},{k9JC0S4}{k9JC0S4CTVq-h90N5PACTg}{BP2OQr1KTK2acUSMIIV7bQ}{10.0.0.8}{10.0.0.8:9300},}, reason: zen-disco-node-failed({lXP_bfK}{lXP_bfK8SE22FmgR88ylXQ}{ROycw9TQSJG-jyiledCXew}{10.0.0.6}{10.0.0.6:9300}), reason(transport disconnected)[{lXP_bfK}{lXP_bfK8SE22FmgR88ylXQ}{ROycw9TQSJG-jyiledCXew}{10.0.0.6}{10.0.0.6:9300} transport disconnected], zen-disco-node-failed({Pv0920n}{Pv0920npSyyCQncjalSK8w}{IbxA-dtsQkubJAE9nI4OAg}{10.0.0.10}{10.0.0.10:9300}), reason(transport disconnected)[{Pv0920n}{Pv0920npSyyCQncjalSK8w}{IbxA-dtsQkubJAE9nI4OAg}{10.0.0.10}{10.0.0.10:9300} transport disconnected], zen-disco-node-failed({k9JC0S4}{k9JC0S4CTVq-h90N5PACTg}{BP2OQr1KTK2acUSMIIV7bQ}{10.0.0.8}{10.0.0.8:9300}), reason(transport disconnected)[{k9JC0S4}{k9JC0S4CTVq-h90N5PACTg}{BP2OQr1KTK2acUSMIIV7bQ}{10.0.0.8}{10.0.0.8:9300} transport disconnected], zen-disco-node-failed({fckZqQH}{fckZqQHqRL6vj8S5mAjiNQ}{F2NgTOUdQYmMjqATQNHYYg}{10.0.0.5}{10.0.0.5:9300}), reason(transport disconnected)[{fckZqQH}{fckZqQHqRL6vj8S5mAjiNQ}{F2NgTOUdQYmMjqATQNHYYg}{10.0.0.5}{10.0.0.5:9300} transport disconnected]
[2017-06-29T04:01:26,129][INFO ][o.e.c.s.ClusterService ] [vQkhQ0w] added {{L5__TRc}{L5__TRcPTQyK0ZlCIjB3Rg}{hHCYeLOURViYdQlRX8FwYw}{10.0.0.6}{10.0.0.6:9300},}, reason: zen-disco-node-join[{L5__TRc}{L5__TRcPTQyK0ZlCIjB3Rg}{hHCYeLOURViYdQlRX8FwYw}{10.0.0.6}{10.0.0.6:9300}]
[2017-06-29T04:01:27,422][INFO ][o.e.c.s.ClusterService ] [vQkhQ0w] added {{6pZRj4J}{6pZRj4JIS_OtQWY4j4CWuA}{N5VsjXqHS7-kkvONKXSwOg}{10.0.0.8}{10.0.0.8:9300},}, reason: zen-disco-node-join[{6pZRj4J}{6pZRj4JIS_OtQWY4j4CWuA}{N5VsjXqHS7-kkvONKXSwOg}{10.0.0.8}{10.0.0.8:9300}]
[2017-06-29T04:01:34,245][INFO ][o.e.c.s.ClusterService ] [vQkhQ0w] added {{jdNUbcq}{jdNUbcqZS-elAC8sDSjKmw}{vXsUJ8zNQrSvPCEXwf7JNA}{10.0.0.5}{10.0.0.5:9300},}, reason: zen-disco-node-join[{jdNUbcq}{jdNUbcqZS-elAC8sDSjKmw}{vXsUJ8zNQrSvPCEXwf7JNA}{10.0.0.5}{10.0.0.5:9300}]
[2017-06-29T04:01:43,623][INFO ][o.e.c.s.ClusterService ] [vQkhQ0w] added {{LknhmNI}{LknhmNIESxysd8wjHt9VWg}{YZyGgQH2QxO9uDB4eR2U1w}{10.0.0.10}{10.0.0.10:9300},}, reason: zen-disco-node-join[{LknhmNI}{LknhmNIESxysd8wjHt9VWg}{YZyGgQH2QxO9uDB4eR2U1w}{10.0.0.10}{10.0.0.10:9300}]
[2017-06-29T04:01:59,413][INFO ][o.e.c.m.MetaDataCreateIndexService] [vQkhQ0w] [.kibana] creating index, cause [api], templates [], shards [1]/[1], mappings [server, config]
[2017-06-29T04:02:01,023][INFO ][o.e.c.r.a.AllocationService] [vQkhQ0w] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.kibana][0]] ...]).
...I just noticed I did have a few false starts of task containers:
$ docker stack ps es
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
0lfcam10uwyw es_gateway.1 itzg/elasticsearch:latest zenbook Running Running 7 minutes ago
maddv824lpyz es_ingest.1 itzg/elasticsearch:latest zenbook Running Running 7 minutes ago
s8qridgtxuko es_data.1 itzg/elasticsearch:latest zenbook Running Running 7 minutes ago
iatuj8c8j2mv es_ingest.1 itzg/elasticsearch:latest zenbook Shutdown Failed 8 minutes ago "task: non-zero exit (137): do…"
9m3smgb01mv7 es_gateway.1 itzg/elasticsearch:latest zenbook Shutdown Failed 8 minutes ago "task: non-zero exit (137): do…"
9rh7u3rwan11 es_data.1 itzg/elasticsearch:latest zenbook Shutdown Failed 8 minutes ago "task: non-zero exit (137): do…"
7po4yx56mym8 es_kibana.1 kibana:latest zenbook Running Running 12 minutes ago
23fvsrj3lkm9 es_ingest.1 itzg/elasticsearch:latest zenbook Shutdown Failed 10 minutes ago "task: non-zero exit (137): do…"
weiiqw4yh1kv es_gateway.1 itzg/elasticsearch:latest zenbook Shutdown Failed 10 minutes ago "task: non-zero exit (137): do…"
w3rvxnuwebja es_data.1 itzg/elasticsearch:latest zenbook Shutdown Failed 10 minutes ago "task: non-zero exit (137): do…"
pyj218h001s0 es_master.1 itzg/elasticsearch:latest zenbook Running Running 10 minutes ago
zrr2oqbutvnc es_data.2 itzg/elasticsearch:latest zenbook Running Running 7 minutes ago
vnee1y4u07rx \_ es_data.2 itzg/elasticsearch:latest zenbook Shutdown Failed 8 minutes ago "task: non-zero exit (137): do…"
pybwyz713haq \_ es_data.2 itzg/elasticsearch:latest zenbook Shutdown Failed 10 minutes ago "task: non-zero exit (137): do…"
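All of the failed tasks above exited with status 137. By the usual shell convention, a status above 128 means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9, SIGKILL. Here is a tiny Python helper that applies that rule; describe_exit_code is a hypothetical name for illustration, not part of Docker:

```python
import signal


def describe_exit_code(code):
    """Interpret a container exit status using the shell's 128+N convention."""
    if code > 128:
        try:
            sig = signal.Signals(code - 128)
        except ValueError:
            # Above 128 but not a signal number this platform knows about.
            return f"exited with status {code} (no matching signal)"
        return f"killed by {sig.name} (signal {sig.value})"
    return f"exited with status {code}"


print(describe_exit_code(137))
```

SIGKILL can come from several places, for example the kernel OOM killer or Docker forcibly stopping a container after its stop timeout, so the exit code alone doesn't say which one it was. Given the 1g heap per node in these logs, memory pressure is one plausible suspect, not a confirmed cause.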
For experimenting, you could instead try this minimal composition that I just pushed. It doesn't make much use of Swarm, but it eliminates a lot of moving parts.
Thanks for getting back to me 🙌
Here are the details:
root@docker-1:~# docker version
Client:
Version: 17.03.1-ce
API version: 1.27
Go version: go1.7.5
Git commit: c6d412e
Built: Mon Mar 27 17:14:09 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.1-ce
API version: 1.27 (minimum version 1.12)
Go version: go1.7.5
Git commit: c6d412e
Built: Mon Mar 27 17:14:09 2017
OS/Arch: linux/amd64
Experimental: false
root@docker-1:~#
root@docker-1:~# uname -a
Linux docker-1 4.9.20-std-1 #1 SMP Tue Apr 4 12:56:17 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
root@docker-1:~#
root@docker-1:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.1 LTS
Release: 16.04
Codename: xenial
root@docker-1:~#
I just tried that config you posted; output below :/
root@docker-1:~# docker stack ps es
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
20l649fh70hm es_kibana.1 kibana:latest docker-2 Running Running 2 minutes ago
5g2l47givix3 es_master.1 itzg/elasticsearch:latest docker-1 Running Running 2 minutes ago
Finding IPs. found! 10.255.0.6,172.18.0.3,10.0.0.3
Starting Elasticsearch with the options -E path.conf=/conf -E path.data=/data -E path.logs=/data -E transport.tcp.port=9300 -E http.port=9200 -E network.host=10.255.0.6,172.18.0.3,10.0.0.3 -E discovery.zen.ping.unicast.hosts=master -E discovery.zen.minimum_master_nodes=1
Running as non-root...
[2017-06-29T18:15:02,398][INFO ][o.e.n.Node ] [] initializing ...
[2017-06-29T18:15:02,746][INFO ][o.e.e.NodeEnvironment ] [MidDlKN] using [1] data paths, mounts [[/data (/dev/vda)]], net usable_space [40.2gb], net total_space [45.7gb], spins? [possibly], types [ext4]
[2017-06-29T18:15:02,749][INFO ][o.e.e.NodeEnvironment ] [MidDlKN] heap size [981.5mb], compressed ordinary object pointers [true]
[2017-06-29T18:15:02,756][INFO ][o.e.n.Node ] node name [MidDlKN] derived from node ID [MidDlKN6QXOnHaPvHUFrdA]; set [node.name] to override
[2017-06-29T18:15:02,757][INFO ][o.e.n.Node ] version[5.4.2], pid[20], build[929b078/2017-06-15T02:29:28.122Z], OS[Linux/4.9.20-std-1/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_121/25.121-b13]
[2017-06-29T18:15:02,759][INFO ][o.e.n.Node ] JVM arguments [-Xms1g, -Xmx1g, -Des.path.home=/usr/share/elasticsearch-5.4.2]
[2017-06-29T18:15:08,005][INFO ][o.e.p.PluginsService ] [MidDlKN] loaded module [aggs-matrix-stats]
[2017-06-29T18:15:08,005][INFO ][o.e.p.PluginsService ] [MidDlKN] loaded module [ingest-common]
[2017-06-29T18:15:08,009][INFO ][o.e.p.PluginsService ] [MidDlKN] loaded module [lang-expression]
[2017-06-29T18:15:08,010][INFO ][o.e.p.PluginsService ] [MidDlKN] loaded module [lang-groovy]
[2017-06-29T18:15:08,013][INFO ][o.e.p.PluginsService ] [MidDlKN] loaded module [lang-mustache]
[2017-06-29T18:15:08,014][INFO ][o.e.p.PluginsService ] [MidDlKN] loaded module [lang-painless]
[2017-06-29T18:15:08,015][INFO ][o.e.p.PluginsService ] [MidDlKN] loaded module [percolator]
[2017-06-29T18:15:08,016][INFO ][o.e.p.PluginsService ] [MidDlKN] loaded module [reindex]
[2017-06-29T18:15:08,017][INFO ][o.e.p.PluginsService ] [MidDlKN] loaded module [transport-netty3]
[2017-06-29T18:15:08,017][INFO ][o.e.p.PluginsService ] [MidDlKN] loaded module [transport-netty4]
[2017-06-29T18:15:08,022][INFO ][o.e.p.PluginsService ] [MidDlKN] no plugins loaded
[2017-06-29T18:15:13,732][INFO ][o.e.d.DiscoveryModule ] [MidDlKN] using discovery type [zen]
[2017-06-29T18:15:15,977][INFO ][o.e.n.Node ] initialized
[2017-06-29T18:15:15,979][INFO ][o.e.n.Node ] [MidDlKN] starting ...
[2017-06-29T18:15:16,118][INFO ][i.n.u.i.PlatformDependent] Your platform does not provide complete low-level API for accessing direct buffers reliably. Unless explicitly requested, heap buffer will always be preferred to avoid potential system instability.
[2017-06-29T18:15:16,602][INFO ][o.e.t.TransportService ] [MidDlKN] publish_address {10.0.0.3:9300}, bound_addresses {172.18.0.3:9300}, {10.0.0.3:9300}, {10.255.0.6:9300}
[2017-06-29T18:15:16,635][INFO ][o.e.b.BootstrapChecks ] [MidDlKN] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-06-29T18:15:16,777][WARN ][o.e.d.z.UnicastZenPing ] [MidDlKN] failed to resolve host [master]
java.net.UnknownHostException: master: Name does not resolve
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) ~[?:1.8.0_121]
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928) ~[?:1.8.0_121]
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323) ~[?:1.8.0_121]
at java.net.InetAddress.getAllByName0(InetAddress.java:1276) ~[?:1.8.0_121]
at java.net.InetAddress.getAllByName(InetAddress.java:1192) ~[?:1.8.0_121]
at java.net.InetAddress.getAllByName(InetAddress.java:1126) ~[?:1.8.0_121]
at org.elasticsearch.transport.TcpTransport.parse(TcpTransport.java:922) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.transport.TcpTransport.addressesFromString(TcpTransport.java:877) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.transport.TransportService.addressesFromString(TransportService.java:674) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.discovery.zen.UnicastZenPing.lambda$null$0(UnicastZenPing.java:213) ~[elasticsearch-5.4.2.jar:5.4.2]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_121]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.4.2.jar:5.4.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
[2017-06-29T18:15:19,889][INFO ][o.e.c.s.ClusterService ] [MidDlKN] new_master {MidDlKN}{MidDlKN6QXOnHaPvHUFrdA}{qBCxNqNmTw-tXfTNUZcMfA}{10.0.0.3}{10.0.0.3:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2017-06-29T18:15:20,045][INFO ][o.e.h.n.Netty4HttpServerTransport] [MidDlKN] publish_address {10.0.0.3:9200}, bound_addresses {172.18.0.3:9200}, {10.0.0.3:9200}, {10.255.0.6:9200}
[2017-06-29T18:15:20,059][INFO ][o.e.n.Node ] [MidDlKN] started
[2017-06-29T18:15:20,137][INFO ][o.e.g.GatewayService ] [MidDlKN] recovered [0] indices into cluster_state
Just upgraded to Docker 17.06.0-ce and I'm getting the same problem. The root of the issue seems to be "failed to resolve host [master]". Oddly, I can exec into a master container and successfully ping master. The Kibana container is complaining that it can't reach master either (confirmed via exec ping).
Hmm, your container is getting assigned a third IP address in the 10.255.x.x range, but that might just be a coincidence. I need to get my multi-node cluster up and running again to confirm there isn't a subtle but important difference there.
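For anyone comparing their own output, here is a rough Python sketch that sorts the addresses from the image's "Finding IPs. found!" line by network. The subnets are assumptions based on common Docker defaults of this era (Swarm's ingress network on 10.255.0.0/16, docker_gwbridge somewhere in 172.16.0.0/12, the stack's default overlay on 10.0.0.0/24), not values read from either cluster; check your own with docker network inspect.

```python
import ipaddress

# ASSUMED subnets (typical Docker 17.x defaults); verify on your hosts with
# "docker network inspect ingress", "docker network inspect docker_gwbridge", etc.
ASSUMED_NETWORKS = {
    "ingress": ipaddress.ip_network("10.255.0.0/16"),
    "docker_gwbridge": ipaddress.ip_network("172.16.0.0/12"),
    "stack overlay": ipaddress.ip_network("10.0.0.0/24"),
}


def classify_ips(found, networks=ASSUMED_NETWORKS):
    """Map each address in a comma-separated list to the first assumed network containing it."""
    result = {}
    for raw in found.split(","):
        addr = ipaddress.ip_address(raw.strip())
        result[str(addr)] = next(
            (name for name, net in networks.items() if addr in net), "unknown"
        )
    return result


# Addresses copied from the two "Finding IPs. found!" log lines in this thread.
for line in ("10.0.0.3,172.25.0.3", "10.255.0.6,172.18.0.3,10.0.0.3"):
    print(line, "->", classify_ips(line))
```

Under those assumptions, the master on docker-1 has an extra ingress address (10.255.0.6), which a task typically gets when its service publishes a port through the routing mesh, while itzg's master (10.0.0.3,172.25.0.3) has none.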
@developius, sorry it took longer than I wanted to get my 3-node swarm going again. Well... the good news is that I see the same "master: Name does not resolve" as you. Perhaps an additional private overlay network is needed within the es stack/composition. I'll poke around.
Awesome to hear that it's not just me, thanks!
...even though I see that error, the Kibana service did start and find the master ES node successfully. I'm also adding an ES data node per Swarm node using this compose file:
https://gist.github.com/itzg/a185d87e4e1a888b9bdd45b7aa55ce19#file-docker-compose-yml
Now my only challenge is squeezing these into 1GB VMs :)
To trim down memory usage (especially for memory-constrained test/demo scenarios), I pushed an update to the image that adds a NON_DATA node type. With that, this is the stack that is now working for me:
https://github.com/itzg/dockerfiles/blob/master/elasticsearch/docker-compose-3x1GB.yml
Showing:
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
82oj1p357rha es_data.o3d9426rs2kpe9scco2mg3psm itzg/elasticsearch:latest rack2 Running Running about a minute ago
mfv8qbg22ozk es_data.n0afhmdvj1b5jyhuca4gt51ne itzg/elasticsearch:latest rack1 Running Running about a minute ago
53z77f00ryti es_data.lvfj3wx74d635ypixj3dtgw7g itzg/elasticsearch:latest rack3 Running Running about a minute ago
x8gkfworxvee es_kibana.1 kibana:latest rack1 Running Running 2 minutes ago
5kxafikf23n4 es_master.1 itzg/elasticsearch:latest rack3 Running Running about a minute ago
I just tried your config on two VMs, and I'm getting this error in the data containers:
not enough master nodes discovered during pinging (found [[]], but needed [-1]), pinging again
Kibana is not starting up either, with this error:
Unable to revive connection: http://master:9200
And finally, the master container:
failed to resolve host [master]
java.net.UnknownHostException: master: Name does not resolve
Strange; overlay network name resolution is behaving differently for you. As a sanity test, do these cross-pinging services resolve each other's names correctly for you?
version: '3'
services:
  first:
    image: alpine:3.5
    command: sh -c "sleep 5 ; ping second"
  second:
    image: alpine:3.5
    command: sh -c "sleep 5 ; ping first"
  master:
    image: alpine:3.5
    command: sh -c "sleep 5 ; ping gateway"
  gateway:
    image: alpine:3.5
    command: sh -c "sleep 5 ; ping master"
Yep, that works, although I'm intermittently getting this error (no idea what to do about it). I think it has something to do with the problem (that node is the second one in the swarm).
$ docker service logs ping_gateway
error from daemon in stream: Error grabbing logs: rpc error: code = 2 desc = warning: incomplete log stream. some logs could not be retrieved for the following reasons: node z8noyw1ircju77fxmxn8tliue is not available
Thanks for checking. Hmm, it must be something about the way Elasticsearch does hostname resolution via Java. I'll do some more thinking.
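One way to separate Docker's DNS from the JVM is to repeat the lookup outside Java. Below is a minimal Python sketch (resolve_unicast_hosts and resolve_check.py are made-up names for illustration) that resolves each unicast host through the OS resolver's getaddrinfo, roughly the lookup that discovery.zen.ping.unicast.hosts=master triggers, minus the JVM's own DNS caching. It assumes a Python interpreter is available somewhere on the same overlay network; the itzg/elasticsearch image may not ship one.

```python
import socket
import sys


def resolve_unicast_hosts(hosts, port=9300):
    """Resolve each host with the OS resolver (getaddrinfo), IPv4 only.

    Returns {host: sorted list of addresses}; a failed lookup maps to [].
    """
    results = {}
    for host in hosts:
        try:
            infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
            # info[4] is the (address, port) sockaddr tuple for AF_INET.
            results[host] = sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            # The equivalent of Elasticsearch's "failed to resolve host [master]".
            print(f"failed to resolve host [{host}]: {exc}", file=sys.stderr)
            results[host] = []
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python resolve_check.py master gateway
    for host, addrs in resolve_unicast_hosts(sys.argv[1:]).items():
        print(host, addrs or "UNRESOLVED")
```

If this resolves master from inside the stack while Elasticsearch still logs "failed to resolve host [master]", that would point at the JVM side (for instance its lookup caching); if it fails here too, the problem lies in Swarm's name resolution itself.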
I came across this thread the other day when looking into another issue with one of my services, and it looks like this is the culprit! scaleway/image-ubuntu#78
Basically, some kernel modules are missing from Ubuntu on Scaleway VPSes (which is what I'm using), and that causes problems with Swarm networking. After I changed the bootscript to use the rancher kernel, everything started working.
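For anyone else on a provider kernel, a quick check is whether the modules Swarm networking leans on are present at all. This Python sketch assumes vxlan (used by the overlay driver) and ip_vs (used for service VIP load balancing) as the candidates; that list is an illustrative guess, not the confirmed set from scaleway/image-ubuntu#78. It treats a module as present if it appears in /proc/modules (loaded) or in /lib/modules/<release>/modules.builtin (compiled in).

```python
import os
import platform

# ASSUMED candidates, not the confirmed list from the Scaleway issue.
CANDIDATE_MODULES = ["vxlan", "ip_vs"]


def normalize(name):
    # The kernel reports module names with underscores, not dashes.
    return name.replace("-", "_")


def loaded_modules(proc_modules_text):
    """Module names from /proc/modules (first column of each line)."""
    return {normalize(line.split()[0]) for line in proc_modules_text.splitlines() if line.strip()}


def builtin_modules(modules_builtin_text):
    """Module names from modules.builtin, whose lines look like kernel/.../name.ko."""
    names = set()
    for line in modules_builtin_text.splitlines():
        base = os.path.basename(line.strip())
        if base.endswith(".ko"):
            names.add(normalize(base[:-3]))
    return names


def missing_modules(candidates, loaded, builtin):
    """Candidates that are neither loaded nor built into the running kernel."""
    present = loaded | builtin
    return [m for m in candidates if normalize(m) not in present]


def read_text(path):
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""  # e.g. provider kernels with no /lib/modules tree


if __name__ == "__main__":
    release = platform.release()  # same value as "uname -r"
    loaded = loaded_modules(read_text("/proc/modules"))
    builtin = builtin_modules(read_text(f"/lib/modules/{release}/modules.builtin"))
    print("possibly missing:", missing_modules(CANDIDATE_MODULES, loaded, builtin) or "none")
```

A module reported as possibly missing may still be loadable with modprobe, and some provider kernels ship no /lib/modules tree at all, so treat the output as a hint rather than a verdict.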
My apologies for the false alarm!
Excellent. Glad to hear there was a logical reason for it.
Hi,
I'm trying to get Elasticsearch running on my own two-node Docker Swarm and am running into a problem. I've followed your guide at https://hub.docker.com/r/itzg/elasticsearch/ using the sample docker-compose.yml and this command:
docker stack deploy -c docker-compose.yml es
When inspecting the running tasks, I get this in the error logs:
I'm also finding that the Kibana web UI is not loading at docker-host-ip:5601. Is this related? I've never used ES before, but I know my way around Swarm (I think!). Could you please give me a hand? Thanks!