openeventdata / mordecai

Full text geoparsing as a Python library
MIT License
738 stars 97 forks

404 over http://localhost:9200/geonames #56

Closed janecz-n closed 6 years ago

janecz-n commented 6 years ago

ANSWER HERE: https://github.com/openeventdata/mordecai/issues/56#issuecomment-402431815

Hey everyone,

I just installed Mordecai. Everything ran smoothly until I tried to launch it.

$ python3
>>> from mordecai import Geoparser
Using TensorFlow backend.
>>> geo = Geoparser()
GET http://localhost:9200/geonames/_count [status:404 request:0.011s]
Traceback (most recent call last):
  File "/home/morty/.local/lib/python3.6/site-packages/mordecai/geoparse.py", line 56, in __init__
    self.conn.count()
  File "/home/morty/.local/lib/python3.6/site-packages/elasticsearch_dsl/search.py", line 587, in count
    **self._params
  File "/home/morty/.local/lib/python3.6/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/morty/.local/lib/python3.6/site-packages/elasticsearch/client/__init__.py", line 1123, in count
    doc_type, '_count'), params=params, body=body)
  File "/home/morty/.local/lib/python3.6/site-packages/elasticsearch/transport.py", line 312, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/home/morty/.local/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 128, in perform_request
    self._raise_error(response.status, raw_data)
  File "/home/morty/.local/lib/python3.6/site-packages/elasticsearch/connection/base.py", line 125, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.NotFoundError: TransportError(404, 'index_not_found_exception', 'no such index')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/morty/.local/lib/python3.6/site-packages/mordecai/geoparse.py", line 61, in __init__
    "for instructions on setting up Geonames/Elasticsearch")
ConnectionError: [Errno Could not establish contact with Elasticsearch at localhost on port 9200. Are you sure it's running? ] Mordecai needs access to the Geonames/Elasticsearch gazetteer to function.: 'See https://github.com/openeventdata/mordecai#installation-and-requirements'
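The two tracebacks point at two different failure modes: the 404 means Elasticsearch is reachable but the geonames index is missing, while the ConnectionError fires when the server itself cannot be contacted. Before involving Mordecai at all, the two cases can be told apart with a plain HTTP probe. A minimal sketch using only the Python standard library; the URL and index name are the defaults from this thread:

```python
import urllib.error
import urllib.request


def probe_geonames(base_url="http://localhost:9200"):
    """Return 'ok' if the geonames index answers, 'no_index' on a 404,
    and 'unreachable' if Elasticsearch itself is not responding."""
    try:
        with urllib.request.urlopen(base_url + "/geonames/_count", timeout=3):
            return "ok"
    except urllib.error.HTTPError as err:
        # The server answered, but the index is not there (the 404 case above).
        return "no_index" if err.code == 404 else "error:%d" % err.code
    except urllib.error.URLError:
        # No server listening on that host/port at all.
        return "unreachable"


if __name__ == "__main__":
    print(probe_geonames())
```

If this prints `no_index`, the Docker volume mapping for the index data is the place to look; if it prints `unreachable`, the container or port mapping is.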

Meanwhile, `docker ps` shows the Elasticsearch container is up:

e9227f1162e3 elasticsearch:5.5.2 "/docker-entrypoint.…" 4 minutes ago Up 4 minutes 127.0.0.1:9200->9200/tcp, 9300/tcp serene_khorana

I can access http://localhost:9200 in my browser, but when I go to http://localhost:9200/geonames I get a 404.

Any leads on where or what to look at?

domeniconappo commented 6 years ago

Hi @janecz-n, can you open http://localhost:9200/_cat/indices?v ? The output should be:

health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   geonames 23vFz20STbudmqktmHVOLg   1   1   11139265            0      2.7gb 

Otherwise, in my experience, this kind of problem often comes down to file permissions for the user that the Docker engine uses under the hood... On my Ubuntu VM, I had to assign ownership to systemd-resolve:systemd-timesync (!)

drwxr-xr-x 3 systemd-resolve systemd-timesync 4096 apr  4 14:04 geonames_index/
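The ownership check above can also be scripted instead of eyeballing `ls -ld` output. A small sketch using only the standard library (Unix-only, since it relies on the pwd and grp modules; the directory name is the one from this thread):

```python
import grp
import os
import pwd
import stat


def describe_dir(path):
    """Return (owner, group, mode string) for *path*, like `ls -ld` shows."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # uid with no passwd entry (common in containers)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)
    return owner, group, stat.filemode(st.st_mode)


if __name__ == "__main__":
    print(describe_dir("geonames_index"))
```

Comparing this against the user the container runs as tells you whether Elasticsearch inside Docker can actually read and write the mounted data.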
janecz-n commented 6 years ago

Hi @domeniconappo

http://localhost:9200/_cat/indices?v gives me an empty result (only the header row):

health status index uuid pri rep docs.count docs.deleted store.size pri.store.size

I tried changing file permissions, but without much result.

Also, I checked and I don't have a /usr/share/elasticsearch directory; is that normal?

I'm on Ubuntu 18.04, by the way.

ahalterman commented 6 years ago

My guess is that something's going wrong with the location of the volume when you start Elasticsearch. Try changing the `$(pwd)` in `docker run -d -p 127.0.0.1:9200:9200 -v $(pwd)/geonames_index/:/usr/share/elasticsearch/data elasticsearch:5.5.2` to the full path of the index volume you downloaded.
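The `$(pwd)` substitution only works if you launch the container from the directory that actually contains the index; a wrong relative path silently mounts an empty volume. One way to rule this out before touching Docker is to resolve the path yourself. A hedged sketch (geonames_index and the nodes/ subfolder are the names used in this thread):

```python
import os


def resolve_index_path(path="geonames_index"):
    """Resolve the index directory to an absolute path and verify that it
    exists and contains the nodes/ folder Elasticsearch writes its data to."""
    full = os.path.abspath(path)
    exists = os.path.isdir(full)
    has_nodes = os.path.isdir(os.path.join(full, "nodes"))
    return full, exists, has_nodes


if __name__ == "__main__":
    print(resolve_index_path())
```

The absolute path this returns is what should go on the left side of the `-v` mapping.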

domeniconappo commented 6 years ago

@janecz-n @ahalterman Hi all, I totally forgot that I had to prepare some config files and a Dockerfile for the Elasticsearch image to make it work. I've since installed it on several machines, it works well, and this problem (I hit the same one when I started using this project) went away.

@janecz-n Have a look here: https://github.com/domeniconappo/SMFR/tree/master/geonames

Basically, instead of running the elasticsearch:5.5.2 image directly, I prepared my own Dockerfile to add some configuration to ES. I think you could do the same with volume options instead of a new Dockerfile (but I prefer this way).

Anyway, the most relevant configuration is here: https://github.com/domeniconappo/SMFR/blob/master/geonames/elasticsearch.yml#L3

Honestly, this only worked after dozens of attempts, and I'm not sure it solves your issue. Just give it a try and let me know, please.

So download the Dockerfile and config files from https://github.com/domeniconappo/SMFR/tree/master/geonames, build and tag a new image from them, and run that image instead of the official elasticsearch one (which comes with the default config).

ahalterman commented 6 years ago

Thanks, @domeniconappo! I've been passing options in on the command line (e.g. `docker run -d -p 127.0.0.1:9200:9200 -e ES_JAVA_OPTS="-Xms8g -Xmx8g" -e "bootstrap.memory_lock=true" -v $(pwd)/geonames_index/:/usr/share/elasticsearch/data elasticsearch:5.5.2`), but your way is much more elegant.

domeniconappo commented 6 years ago

@ahalterman Thanks Andy :) I prefer short commands with the configuration in files, and for our docker-compose based system that approach is more convenient.

janecz-n commented 6 years ago

Didn't manage to make it work, with the Dockerfile or with options on the command line. I got the same result as before.

But still, thanks for your help !

P.S: I'm really not into Docker and I try to avoid it as much as I can :p

ahalterman commented 6 years ago

You certainly don't need to use Docker! All Mordecai needs is the location and port of a running ES instance with the Geonames index. That can certainly be a local installation of Elasticsearch if you'd prefer that approach.

If you do want to keep going, my next step would be to attach to the running ES container and look in /usr/share/elasticsearch/data to see if anything's there. If not, it's a problem with how you're attaching the data volume.

janecz-n commented 6 years ago

@ahalterman I will try a bit more with docker before switching.

So I used the command line from @domeniconappo: `docker run -d -p 127.0.0.1:9200:9200 -e ES_JAVA_OPTS="-Xms8g -Xmx8g" -e "bootstrap.memory_lock=true" -v $(pwd)/geonames_index/:/usr/share/elasticsearch/data elasticsearch:5.5.2`

I tried to check /usr/share/elasticsearch but I don't have this directory.

I didn't see anything about it in the installation process; do we agree that this directory should be created automatically?

BTW:

$ ls $(pwd)/geonames_index/
nodes

domeniconappo commented 6 years ago

@janecz-n Hi, there is a misunderstanding :) That was not my suggestion. Anyway, try the following steps:

  1. Create an ES configuration file (call it elasticsearch.yaml), put this content inside, and save it:
    cluster.name: mycluster
    node.name: mynode
    action.auto_create_index: -l*,+z*
    network.host: 0.0.0.0
    path:
      data: /usr/share/elasticsearch/data/
      logs: /var/log/elasticsearch/
  2. Run the elasticsearch:5.5.2 image with the following command (remember to change the paths on the left side of : in the -v options to your local paths):

docker run -d -p 127.0.0.1:9200:9200 -e ES_JAVA_OPTS="-Xms8g -Xmx8g" -e "bootstrap.memory_lock=true" -v /full_path/to/your/geonames_index:/usr/share/elasticsearch/data -v /full_path/to/elasticsearch.yaml:/usr/share/elasticsearch/config/elasticsearch.yaml elasticsearch:5.5.2

About the -v option of the docker run command: it maps a local path on your host (the left side of ':') to a path inside the container (the right side), which already exists there; you don't have to create it.

Alternatively, you can

  1. Copy all the files (Dockerfile and other configuration) from https://github.com/domeniconappo/SMFR/tree/master/geonames into a local folder on your host. Then change these three paths in the Dockerfile https://github.com/domeniconappo/SMFR/blob/master/geonames/Dockerfile#L12-L14 from geonames/elasticsearch.yaml to ./elasticsearch.yaml so that the build can find the local files.

  2. Go into that folder and build the image (watch out for the dot at the end of the command: it means the current directory and is used by Docker to find the Dockerfile): docker build -t mygeonames .

  3. Run the image with: docker run -v /full_path/to/your/geonames_index:/var/lib/elasticsearch/elasticsearch mygeonames

Let it warm up for a few seconds, then open this URL to check whether the index is there: http://localhost:9200/_cat/indices?v

Let us know, and good luck!
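For scripting that last check, the plain-text `_cat/indices?v` output is just whitespace-separated columns under a header row, so a few lines of parsing make it machine-readable. A sketch; the column names come from the sample output earlier in this thread:

```python
def parse_cat_indices(text):
    """Parse the plain-text output of /_cat/indices?v into a list of dicts,
    one per index row, keyed by the header row's column names."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def has_geonames(text):
    """True if a 'geonames' index appears in the /_cat/indices?v output."""
    return any(row.get("index") == "geonames" for row in parse_cat_indices(text))
```

Feeding it the expected output from earlier in the thread would report the geonames index with its document count; an output with only the header row (the failing case here) yields no rows.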

janecz-n commented 6 years ago

I first tried the second method, with the Dockerfile and configuration files from your repo. Everything ran smoothly until docker run, which failed with: Exception: java.security.AccessControlException thrown from the UncaughtExceptionHandler in thread "Thread-2"

I tried to:

Not much result :/

Then I deleted all my Docker images and tried the first method; the container starts and then crashes with: Exception in thread "main" SettingsException[multiple settings files found with suffixes: .yaml,.yml]

domeniconappo commented 6 years ago

Correct the run command of the first method: ES complains that it finds both elasticsearch.yaml and elasticsearch.yml.

Just try this one and see what happens: `docker run -d -p 127.0.0.1:9200:9200 -e ES_JAVA_OPTS="-Xms8g -Xmx8g" -e "bootstrap.memory_lock=true" -v /full_path/to/your/geonames_index:/usr/share/elasticsearch/data -v /full_path/to/elasticsearch.yaml:/usr/share/elasticsearch/config/elasticsearch.yml elasticsearch:5.5.2`
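The corrected command's two -v mappings can be assembled programmatically so the host side is always absolute, which avoids both the relative-path pitfall and the .yaml/.yml mismatch. A hypothetical helper, not part of any tool in this thread, and omitting the -e memory flags for brevity:

```python
import os


def docker_run_command(index_dir, config_file, image="elasticsearch:5.5.2"):
    """Build a `docker run` command string with absolute host-side paths for
    the data-volume and config-file -v mappings (host path : container path).
    The container-side config target is .yml, matching what ES expects."""
    volumes = [
        (os.path.abspath(index_dir), "/usr/share/elasticsearch/data"),
        (os.path.abspath(config_file),
         "/usr/share/elasticsearch/config/elasticsearch.yml"),
    ]
    parts = ["docker", "run", "-d", "-p", "127.0.0.1:9200:9200"]
    for host, container in volumes:
        parts += ["-v", "%s:%s" % (host, container)]
    parts.append(image)
    return " ".join(parts)


if __name__ == "__main__":
    print(docker_run_command("geonames_index", "elasticsearch.yaml"))
```

Printing the command before running it makes it easy to spot a mount pointing at the wrong directory.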

janecz-n commented 6 years ago

Hey @domeniconappo !

Now the container starts and ES looks like it starts correctly. I checked http://localhost:9200/_cat/indices?v but nothing is listed (even after a few minutes). The Docker logs show:

[INFO ][o.e.n.Node               ] [mynode] initializing ...
[INFO ][o.e.e.NodeEnvironment    ] [mynode] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/nvme0n1p4)]], net usable_space [364.5gb], net total_space [418.4gb], spins? [possibly], types [ext4]
[INFO ][o.e.e.NodeEnvironment    ] [mynode] heap size [7.9gb], compressed ordinary object pointers [true]
[INFO ][o.e.n.Node               ] [mynode] node name [mynode], node ID [vLQa5cmNQPmPsFPLWLu-2Q]
[INFO ][o.e.n.Node               ] [mynode] version[5.5.2], pid[1], build[b2f0c09/2017-08-14T12:33:14.154Z], OS[Linux/4.15.0-24-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_141/25.141-b15]
[INFO ][o.e.n.Node               ] [mynode] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Xms8g, -Xmx8g, -Des.path.home=/usr/share/elasticsearch]
[INFO ][o.e.p.PluginsService     ] [mynode] loaded module [aggs-matrix-stats]
[INFO ][o.e.p.PluginsService     ] [mynode] loaded module [ingest-common]
[INFO ][o.e.p.PluginsService     ] [mynode] loaded module [lang-expression]
[INFO ][o.e.p.PluginsService     ] [mynode] loaded module [lang-groovy]
[INFO ][o.e.p.PluginsService     ] [mynode] loaded module [lang-mustache]
[INFO ][o.e.p.PluginsService     ] [mynode] loaded module [lang-painless]
[INFO ][o.e.p.PluginsService     ] [mynode] loaded module [parent-join]
[INFO ][o.e.p.PluginsService     ] [mynode] loaded module [percolator]
[INFO ][o.e.p.PluginsService     ] [mynode] loaded module [reindex]
[INFO ][o.e.p.PluginsService     ] [mynode] loaded module [transport-netty3]
[INFO ][o.e.p.PluginsService     ] [mynode] loaded module [transport-netty4]
[INFO ][o.e.p.PluginsService     ] [mynode] no plugins loaded
[INFO ][o.e.d.DiscoveryModule    ] [mynode] using discovery type [zen]
[WARN ][o.e.c.u.IndexFolderUpgrader] [/usr/share/elasticsearch/data/nodes/0/indices/eWVK3y2ETaufWKEFZmeK2Q] no index state found - ignoring
[INFO ][o.e.n.Node               ] [mynode] initialized
[INFO ][o.e.n.Node               ] [mynode] starting ...
[INFO ][o.e.t.TransportService   ] [mynode] publish_address {172.17.0.2:9300}, bound_addresses {0.0.0.0:9300}
[INFO ][o.e.b.BootstrapChecks    ] [mynode] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[INFO ][o.e.c.s.ClusterService   ] [mynode] new_master {mynode}{vLQa5cmNQPmPsFPLWLu-2Q}{bGpAtieuTKi2BhCvgF7WgA}{172.17.0.2}{172.17.0.2:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[INFO ][o.e.h.n.Netty4HttpServerTransport] [mynode] publish_address {172.17.0.2:9200}, bound_addresses {0.0.0.0:9200}
[INFO ][o.e.n.Node               ] [mynode] started
[INFO ][o.e.g.GatewayService     ] [mynode] recovered [0] indices into cluster_state

So it looks like there is nothing interesting to get from that. What's your opinion?

domeniconappo commented 6 years ago

@janecz-n I see this line in the log:

[WARN ][o.e.c.u.IndexFolderUpgrader] [/usr/share/elasticsearch/data/nodes/0/indices/eWVK3y2ETaufWKEFZmeK2Q] no index state found - ignoring

Could you check whether your corresponding local geonames_index folder has that path with a _state folder in it, and whether the state file is zero-length?

Could you also have a look at this thread? https://github.com/elastic/elasticsearch/issues/27007

Also, could you try chmod -R a+w on your geonames_index folder and see if that solves it (not really sure)?
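The zero-length _state check suggested above can be scripted rather than hunted for by hand. A minimal sketch, assuming the data layout shown in the warning log line (nodes/&lt;n&gt;/indices/&lt;uuid&gt;/_state):

```python
import os


def find_empty_state_files(index_dir):
    """Walk an Elasticsearch data directory and report any files inside a
    _state folder that are zero bytes long (the symptom hinted at by the
    'no index state found' warning in the log above)."""
    empty = []
    for root, _dirs, files in os.walk(index_dir):
        if os.path.basename(root) != "_state":
            continue
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) == 0:
                empty.append(path)
    return empty


if __name__ == "__main__":
    print(find_empty_state_files("geonames_index"))
```

An empty result list means the index state files at least have content; any path it prints is a candidate for the re-download that eventually fixed things in this thread.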

janecz-n commented 6 years ago

Ok, now it's working!

For the last step I just deleted geonames_index.tar.gz and downloaded it again (because I had changed the permissions).

So for me the full process to get a working ES Docker container is:

docker pull elasticsearch:5.5.2
wget https://s3.amazonaws.com/ahalterman-geo/geonames_index.tar.gz --output-file=wget_log.txt
tar -xzf geonames_index.tar.gz
nano elasticsearch.yaml

Paste this inside:

cluster.name: mycluster
node.name: mynode
action.auto_create_index: -l*,+z*
network.host: 0.0.0.0
path:
  data: /usr/share/elasticsearch/data/
  logs: /var/log/elasticsearch/

Then start docker with this command:

docker run -d -p 127.0.0.1:9200:9200 -e ES_JAVA_OPTS="-Xms8g -Xmx8g" -e "bootstrap.memory_lock=true" -v $(pwd)/geonames_index:/usr/share/elasticsearch/data -v $(pwd)/elasticsearch.yaml:/usr/share/elasticsearch/config/elasticsearch.yml elasticsearch:5.5.2

After a minute, the geonames index shows up in: http://localhost:9200/_cat/indices?v

janecz-n commented 6 years ago

@domeniconappo / @ahalterman

I'm very grateful for your time and kindness! Thanks.

domeniconappo commented 6 years ago

@janecz-n thanks to you for sharing the working solution!

janecz-n commented 6 years ago

By the way, if you can think of a better title for this issue, I can edit it.

domeniconappo commented 6 years ago

@janecz-n that's quite a challenge :) I'm not really sure, and I'm not an ES expert either. I ran into all these same problems some months ago... I think the issue is related to the missing index and the auto-creation option...

akshaylakhe commented 3 years ago

This issue is related to the default memory configured for Docker. I reduced ES_JAVA_OPTS="-Xms8g -Xmx8g" to ES_JAVA_OPTS="-Xms750m -Xmx750m" and it's working now. From this issue: https://github.com/10up/wp-local-docker/issues/6