molgenis / docker

Dockerfiles and docker-compose for MOLGENIS
GNU Lesser General Public License v3.0

In docker-compose, molgenis 10.x fails to connect to Elasticsearch #103

Open mobilegenome opened 1 year ago

mobilegenome commented 1 year ago

We're trying to test the new Molgenis version from the provided docker-compose file in this repository but can't get it to work.

All containers seem to start up nominally, but then the molgenis container keeps throwing the following error:

molgenis_1          | 2023-02-08_10:48:54.686 [main] ERROR o.m.d.e.client.ClientFactory - Failed to connect to Elasticsearch cluster on http://host.docker.internal:9200. Retry count = 2
molgenis_1          | 2023-02-08_10:48:58.689 [main] ERROR o.m.d.e.client.ClientFactory - Failed to connect to Elasticsearch cluster on http://host.docker.internal:9200. Retry count = 3
molgenis_1          | 2023-02-08_10:49:06.695 [main] ERROR o.m.d.e.client.ClientFactory - Failed to connect to Elasticsearch cluster on http://host.docker.internal:9200. Retry count = 4
molgenis_1          | 2023-02-08_10:49:22.701 [main] ERROR o.m.d.e.client.ClientFactory - Failed to connect to Elasticsearch cluster on http://host.docker.internal:9200. Retry count = 5

At this point, further initialisation of molgenis stalls, and the frontend is not reachable in the browser.

We start the containers with docker-compose up and have tried this on different machines using the latest docker-compose version, 2.15. We could not find any hints or documentation on which configuration needs to be adjusted to let molgenis connect to Elasticsearch. Do you have any advice?

tommydeboer commented 1 year ago

Hi @mobilegenome, which version are you trying to start? We have tried locally (on a mac) with 10.1 and it starts without issues.

mobilegenome commented 1 year ago

I also tried 10.1. What other properties of my system could affect the behaviour? I'm running Ubuntu 18.04.

It does not work on CentOS 7.9 in our OpenStack cloud either. There we observe the following log from Elasticsearch:

101-elasticsearch-1  | {"type": "server", "timestamp": "2023-02-08T15:19:51,424Z", "level": "ERROR", "component": "o.e.i.g.GeoIpDownloader", "cluster.name": "molgenis", "node.name": "f006cc49c503", "message": "exception during geoip databases update", "cluster.uuid": "pgrUxA6_SO-qPrd08SjVgw", "node.id": "XNz9eemkQcS9v6FsrE4fDw" ,
101-elasticsearch-1  | "stacktrace": ["java.net.SocketTimeoutException: Connect timed out",
101-elasticsearch-1  | "at sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:546) ~[?:?]",
101-elasticsearch-1  | "at sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597) ~[?:?]",
101-elasticsearch-1  | "at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:333) ~[?:?]",
101-elasticsearch-1  | "at java.net.Socket.connect(Socket.java:645) ~[?:?]",
101-elasticsearch-1  | "at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:300) ~[?:?]",
101-elasticsearch-1  | "at sun.net.NetworkClient.doConnect(NetworkClient.java:177) ~[?:?]",
101-elasticsearch-1  | "at sun.net.www.http.HttpClient.openServer(HttpClient.java:497) ~[?:?]",
101-elasticsearch-1  | "at sun.net.www.http.HttpClient.openServer(HttpClient.java:600) ~[?:?]",
101-elasticsearch-1  | "at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:265) ~[?:?]",
101-elasticsearch-1  | "at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:379) ~[?:?]",
101-elasticsearch-1  | "at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:189) ~[?:?]",
101-elasticsearch-1  | "at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1232) ~[?:?]",
101-elasticsearch-1  | "at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1120) ~[?:?]",
101-elasticsearch-1  | "at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:175) ~[?:?]",
101-elasticsearch-1  | "at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1653) ~[?:?]",
101-elasticsearch-1  | "at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1577) ~[?:?]",
101-elasticsearch-1  | "at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:527) ~[?:?]",
101-elasticsearch-1  | "at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:308) ~[?:?]",
101-elasticsearch-1  | "at org.elasticsearch.ingest.geoip.HttpClient.lambda$get$0(HttpClient.java:55) ~[ingest-geoip-7.15.0.jar:7.15.0]",
101-elasticsearch-1  | "at java.security.AccessController.doPrivileged(AccessController.java:554) ~[?:?]",
101-elasticsearch-1  | "at org.elasticsearch.ingest.geoip.HttpClient.doPrivileged(HttpClient.java:97) ~[ingest-geoip-7.15.0.jar:7.15.0]",
101-elasticsearch-1  | "at org.elasticsearch.ingest.geoip.HttpClient.get(HttpClient.java:49) ~[ingest-geoip-7.15.0.jar:7.15.0]",
101-elasticsearch-1  | "at org.elasticsearch.ingest.geoip.HttpClient.getBytes(HttpClient.java:40) ~[ingest-geoip-7.15.0.jar:7.15.0]",
101-elasticsearch-1  | "at org.elasticsearch.ingest.geoip.GeoIpDownloader.fetchDatabasesOverview(GeoIpDownloader.java:115) ~[ingest-geoip-7.15.0.jar:7.15.0]",
101-elasticsearch-1  | "at org.elasticsearch.ingest.geoip.GeoIpDownloader.updateDatabases(GeoIpDownloader.java:103) ~[ingest-geoip-7.15.0.jar:7.15.0]",
101-elasticsearch-1  | "at org.elasticsearch.ingest.geoip.GeoIpDownloader.runDownloader(GeoIpDownloader.java:235) [ingest-geoip-7.15.0.jar:7.15.0]",
101-elasticsearch-1  | "at org.elasticsearch.ingest.geoip.GeoIpDownloaderTaskExecutor.nodeOperation(GeoIpDownloaderTaskExecutor.java:94) [ingest-geoip-7.15.0.jar:7.15.0]",
101-elasticsearch-1  | "at org.elasticsearch.ingest.geoip.GeoIpDownloaderTaskExecutor.nodeOperation(GeoIpDownloaderTaskExecutor.java:43) [ingest-geoip-7.15.0.jar:7.15.0]",
101-elasticsearch-1  | "at org.elasticsearch.persistent.NodePersistentTasksExecutor$1.doRun(NodePersistentTasksExecutor.java:40) [elasticsearch-7.15.0.jar:7.15.0]",
101-elasticsearch-1  | "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737) [elasticsearch-7.15.0.jar:7.15.0]",
101-elasticsearch-1  | "at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) [elasticsearch-7.15.0.jar:7.15.0]",
101-elasticsearch-1  | "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]",
101-elasticsearch-1  | "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]",
101-elasticsearch-1  | "at java.lang.Thread.run(Thread.java:831) [?:?]"] }

erikschaberg commented 1 year ago

On our Macs we run: Docker version 20.10.21.

mobilegenome commented 1 year ago

Here it's Docker version 20.10.22, build 3a2c30b, which should not make a difference.

I've a feeling it's elasticsearch's GeoIpDownloader not connecting and causing ES to crash. Could that be?

erikschaberg commented 1 year ago

And on Ubuntu 22.04 the Docker version is also recent: 20.10.17. Please use a modern Docker version (and a modern Ubuntu version).

CentOS 7 ships a very old Docker: docker -v reports Docker version 1.13.1, build 7d71120/1.13.1 on CentOS Linux release 7.9.2009 (Core).

tommydeboer commented 1 year ago

@mobilegenome Could you try adding - ingest.geoip.downloader.enabled=false in the docker-compose file under elasticsearch.environment?
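
For reference, a minimal sketch of where that line would go. It assumes the service is named elasticsearch and that environment is written in list form, as in the other snippets in this thread; the rest of the service definition is omitted and stays unchanged.

```yaml
# Fragment of docker-compose.yml (sketch; other keys of the service omitted).
services:
  elasticsearch:
    environment:
      # Elasticsearch node setting: disable the periodic GeoIP database download,
      # which needs outbound internet access.
      - ingest.geoip.downloader.enabled=false
```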

mobilegenome commented 1 year ago

Thanks @tommydeboer! With this setting, elasticsearch does not crash anymore, but the backend is still not able to connect and shows the same error as in the first post.

Also our VM runs a modern docker with version 20.10.23.

mobilegenome commented 1 year ago

Okay, some more information from our side:

I started docker-compose.yml, which now runs without ES crashing, but the molgenis backend can't connect via host.docker.internal:9200.

However, if I spin up another docker container and try to connect to localhost:9200, ES responds successfully. Can we tell molgenis to use localhost instead of host.docker.internal?

Here's the log:

❯ sudo docker run -it --rm --network=host arunvelsriram/utils bash                                                                                                                                 
utils@fritjofx1-dkfz:~$ curl host.docker.internal
curl: (6) Could not resolve host: host.docker.internal
utils@fritjofx1-dkfz:~$ ping host.docker.internal
ping: host.docker.internal: Name or service not known
utils@fritjofx1-dkfz:~$ curl localhost:9200
{
  "name" : "6880da5c39a8",
  "cluster_name" : "molgenis",
  "cluster_uuid" : "i-pVnMZvTrSetDGWVZr2Jw",
  "version" : {
    "number" : "7.15.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "79d65f6e357953a5b3cbcc5e2c7c21073d89aa29",
    "build_date" : "2021-09-16T03:05:29.143308416Z",
    "build_snapshot" : false,
    "lucene_version" : "8.9.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
utils@fritjofx1-dkfz:~$ curl host.docker.internal:9200
curl: (6) Could not resolve host: host.docker.internal

tommydeboer commented 1 year ago

@mobilegenome You could try changing the elasticsearch.hosts property under molgenis.environment to point to the other elasticsearch container. In the meantime we will try to figure out why it's not working inside the docker-compose network.
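
A possible explanation for the failed lookup in the log above: Docker Desktop on macOS and Windows provides the name host.docker.internal automatically, while Docker Engine on Linux does not define it by default. With Docker Engine 20.10 or later, the name can be mapped explicitly using the special value host-gateway. The fragment below is an untested sketch against this repository's compose file; it assumes the service is named molgenis and that Elasticsearch's port 9200 is published on the host.

```yaml
# Fragment of docker-compose.yml (sketch; other keys of the service omitted).
services:
  molgenis:
    extra_hosts:
      # "host-gateway" is resolved by Docker Engine (20.10+) to the host's gateway IP,
      # so host.docker.internal becomes resolvable inside this container on Linux.
      - "host.docker.internal:host-gateway"
```

Whether the connection then succeeds also depends on the host's firewall rules, so this only addresses the name resolution error.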

mobilegenome commented 1 year ago

Thanks, we will try that!

mobilegenome commented 1 year ago

Hi @tommydeboer,

unfortunately, even with this setting, the molgenis backend can't establish a connection with the elasticsearch container.

I was wondering if there are any settings in docker or docker-compose that prohibit this, but we can successfully start the compose file for Molgenis 8.6, which also connects to ES via port 9200, so this might not be a plausible explanation either :thinking:

mobilegenome commented 1 year ago

Any updates from your side?

mobilegenome commented 1 year ago

OK. It seems I have found a solution to this issue. I get molgenis connecting to the ES container by changing the elasticsearch.hosts property to

      - elasticsearch.hosts=elasticsearch:9200

This gets MOLGENIS running on my local computer. On our institute's VMs, however, we face a new (old) issue: molgenis cannot retrieve the npm(?) JavaScript dependencies from unpkg.com, which causes the frontend to time out and not show up. We had the same problem with Molgenis 8.6, where it was eventually solved by having @fdlk bake the dependencies into a custom container.

echarpentier commented 4 months ago

We get the same issue with elasticsearch and also the issue described above. Changing from

molgenis:
    environment:
        - elasticsearch.hosts=host.docker.internal:9200

to

molgenis:
    environment:
        - elasticsearch.hosts=elasticsearch:9200

fixes the problem. However, our institute's VM is not able to resolve the URLs for js and css resources (ex: upstream: "https://104.16.124.175:443/@molgenis-ui/legacy-lib@~1.1/dist/require.js").

How is the URL with "104.16.124.175:443" generated in version 10.1? Our sysadmin authorized unpkg.com on the proxy. When trying with the IP address, we end up with a "403 - Forbidden". It works when the IP is replaced with "unpkg.com" (https://unpkg.com/@molgenis-ui/legacy-lib@1.1.5/dist/require.js). Regards!