docker-compose network issue

sykhan88 commented 4 years ago

When running docker-compose.yml, I got the following error:

es | [2020-08-16T20:27:04,978][INFO ][o.e.g.GatewayService ] [Ox0cSfC] recovered [0] indices into cluster_state
web_1 | Unable to connect to ES. Retrying in 5 secs...
web_1 | Unable to connect to ES. Retrying in 5 secs...
web_1 | Unable to connect to ES. Retrying in 5 secs...
web_1 | Out of retries. Bailing out...
foodtrucks_web_1 exited with code 1.

This is the same error we hit earlier in the tutorial, which we resolved by creating a new bridge network and running both the es and foodtrucks web containers on it. However, it looks like we face the same issue here, even though docker-compose creates a new network by default.
As a result, I wasn't able to launch the app via docker-compose.

richardlin047 commented 4 years ago

I also faced the same issue. When I ran docker-compose up -d and docker network inspect foodtrucks_default, only the es container was shown in the network and not the web container.

It seems this issue was also encountered in https://github.com/prakhar1989/docker-curriculum/issues/80 and no fix was presented. A similar issue was encountered in https://github.com/prakhar1989/docker-curriculum/issues/99, but the fix doesn't apply to this situation. I increased the sleep time for retrying the connection to es and that didn't fix it.

richardlin047 commented 4 years ago

It turned out that the time.sleep on line 32 of app.py just needed to be increased enough. 10 seconds did not work, but 15 did.

I've included logs of a failing run and a working run. In the failing run, web exits before es logs "started", so the web container just needs to wait longer before giving up.

docker-compose log snippet of when it doesn't work:

$ docker-compose up
...
es | [2020-08-28T06:38:52,241][INFO ][o.e.p.PluginsService ] [dNlKsoY] loaded module [x-pack-upgrade]
es | [2020-08-28T06:38:52,243][INFO ][o.e.p.PluginsService ] [dNlKsoY] loaded module [x-pack-watcher]
es | [2020-08-28T06:38:52,247][INFO ][o.e.p.PluginsService ] [dNlKsoY] loaded plugin [ingest-geoip]
es | [2020-08-28T06:38:52,250][INFO ][o.e.p.PluginsService ] [dNlKsoY] loaded plugin [ingest-user-agent]
es | [2020-08-28T06:39:02,327][INFO ][o.e.x.s.a.s.FileRolesStore] [dNlKsoY] parsed [0] roles from file [/usr/share/elasticsearch/config/roles.yml]
es | [2020-08-28T06:39:05,295][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [controller/78] [Main.cc@109] controller (64 bit): Version 6.3.2 (Build 903094f295d249) Copyright (c) 2018 Elasticsearch BV
es | [2020-08-28T06:39:07,505][INFO ][o.e.d.DiscoveryModule ] [dNlKsoY] using discovery type [single-node]
es | [2020-08-28T06:39:10,283][INFO ][o.e.n.Node ] [dNlKsoY] initialized
es | [2020-08-28T06:39:10,284][INFO ][o.e.n.Node ] [dNlKsoY] starting ...
web_1 | Unable to connect to ES. Retrying in 5 secs...
web_1 | Unable to connect to ES. Retrying in 5 secs...
web_1 | Unable to connect to ES. Retrying in 5 secs...
web_1 | Out of retries. Bailing out...
es | [2020-08-28T06:39:10,920][INFO ][o.e.t.TransportService ] [dNlKsoY] publish_address {172.20.0.2:9300}, bound_addresses {0.0.0.0:9300}
foodtrucks_web_1 exited with code 1
es | [2020-08-28T06:39:11,493][INFO ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [dNlKsoY] publish_address {172.20.0.2:9200}, bound_addresses {0.0.0.0:9200}
es | [2020-08-28T06:39:11,497][INFO ][o.e.n.Node ] [dNlKsoY] started
es | [2020-08-28T06:39:13,295][WARN ][o.e.x.s.a.s.m.NativeRoleMappingStore] [dNlKsoY] Failed to clear cache for realms [[]]
es | [2020-08-28T06:39:13,361][INFO ][o.e.l.LicenseService ] [dNlKsoY] license [0f9c4d78-79b6-4865-a29e-83b6dfa481df] mode [basic] - valid

docker-compose log snippet of when it does work:

$ docker-compose up
...
es | [2020-08-28T06:49:30,024][INFO ][o.e.p.PluginsService ] [dNlKsoY] loaded module [x-pack-watcher]
es | [2020-08-28T06:49:30,028][INFO ][o.e.p.PluginsService ] [dNlKsoY] loaded plugin [ingest-geoip]
es | [2020-08-28T06:49:30,028][INFO ][o.e.p.PluginsService ] [dNlKsoY] loaded plugin [ingest-user-agent]
es | [2020-08-28T06:49:39,370][INFO ][o.e.x.s.a.s.FileRolesStore] [dNlKsoY] parsed [0] roles from file [/usr/share/elasticsearch/config/roles.yml]
es | [2020-08-28T06:49:41,659][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [controller/78] [Main.cc@109] controller (64 bit): Version 6.3.2 (Build 903094f295d249) Copyright (c) 2018 Elasticsearch BV
es | [2020-08-28T06:49:43,568][INFO ][o.e.d.DiscoveryModule ] [dNlKsoY] using discovery type [single-node]
es | [2020-08-28T06:49:46,176][INFO ][o.e.n.Node ] [dNlKsoY] initialized
es | [2020-08-28T06:49:46,177][INFO ][o.e.n.Node ] [dNlKsoY] starting ...
es | [2020-08-28T06:49:46,741][INFO ][o.e.t.TransportService ] [dNlKsoY] publish_address {172.20.0.2:9300}, bound_addresses {0.0.0.0:9300}
es | [2020-08-28T06:49:46,940][INFO ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [dNlKsoY] publish_address {172.20.0.2:9200}, bound_addresses {0.0.0.0:9200}
es | [2020-08-28T06:49:46,941][INFO ][o.e.n.Node ] [dNlKsoY] started
es | [2020-08-28T06:49:48,522][WARN ][o.e.x.s.a.s.m.NativeRoleMappingStore] [dNlKsoY] Failed to clear cache for realms [[]]
es | [2020-08-28T06:49:48,648][INFO ][o.e.l.LicenseService ] [dNlKsoY] license [0f9c4d78-79b6-4865-a29e-83b6dfa481df] mode [basic] - valid
es | [2020-08-28T06:49:48,860][INFO ][o.e.g.GatewayService ] [dNlKsoY] recovered [1] indices into cluster_state
es | [2020-08-28T06:49:50,011][INFO ][o.e.c.r.a.AllocationService] [dNlKsoY] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[sfdata][1]] ...]).
web_1 | Serving Flask app "app" (lazy loading)
web_1 | Environment: production
web_1 | WARNING: This is a development server. Do not use it in a production deployment.
web_1 | Use a production WSGI server instead.
web_1 | Debug mode: off
web_1 | Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
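
In the failing log above, web ran out of retries about a second before es logged "started", so any fixed sleep value is a guess about how long es takes on a given machine. One option is to retry against a total time budget instead of a fixed retry count. The following is only a rough sketch of that idea and not the code in app.py: the name wait_until_ready and its parameters are hypothetical, and check stands for whatever call the app already makes to test the Elasticsearch connection.

```python
import time


def wait_until_ready(check, timeout=60.0, interval=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or `timeout` seconds have passed.

    Hypothetical helper: `check` is any zero-argument callable that returns
    True once the dependency is usable. Returns the number of attempts made.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            if check():
                return attempts
        except Exception:
            # Assumption: any error raised by check() means "not ready yet".
            pass
        if clock() >= deadline:
            raise TimeoutError(f"not ready after {attempts} attempts")
        print(f"Unable to connect. Retrying in {interval} secs...")
        sleep(interval)
```

With the defaults this keeps trying for about a minute, which would have covered the startup time seen in both runs above. The right budget is still machine-dependent.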

thund3rb1rd78 commented 2 years ago

Yes, increasing time.sleep worked for me. A better solution would be to implement a wait-for-it script, though I'm not versed enough in Elasticsearch to suggest anything more than that. See https://docs.docker.com/compose/startup-order/
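
For illustration, the core of what a wait-for-it style script does (poll a TCP host and port until it accepts connections) can be sketched in Python. This is a minimal sketch under assumptions, not the wait-for-it script itself: the function name wait_for_port is hypothetical, and an open port only shows that something is listening, not that Elasticsearch is ready to answer queries.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once host:port accepts a TCP connection, or False if
    `timeout` seconds pass first. Hypothetical helper for illustration."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (refused, unreachable,
            # timed out, name not resolved) while the service is down.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

In this setup the call might look like wait_for_port("es", 9200), assuming the web container can resolve the Elasticsearch container by the name es and that it listens on port 9200 as in the logs above.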

dmhd1 commented 2 years ago

I experienced the same issue. For me, increasing time.sleep did not work.

What did work was using a newer image provided by Elastic. For example, docker.elastic.co/elasticsearch/elasticsearch:7.15.0 works for me.

prakhar1989 / docker-curriculum

docker-compose network issue #268