Closed: audiodude closed this issue 8 years ago.
What version of Elasticsearch are you using? It could be that we're not forward-compatible in some way.
What output do you get from this command? curl http://localhost:9200/oversight_read/reports/_mapping?pretty
$ elasticsearch --version
Version: 2.2.1, Build: d045fc2/2016-03-09T09:38:54Z, JVM: 1.7.0_71
$ curl http://localhost:9200/oversight_read/reports/_mapping?pretty
{ }
Hmm, so rake elasticsearch:init created the index, but didn't set up the mapping properly. I'll try testing against ES 2.2 and see what I come up with.
Okay, going back to ES 1.7, I had trouble starting Elasticsearch. I had to delete my Elasticsearch data directory because it wasn't backwards compatible. Then I reran the rake elasticsearch:* commands.
Now I can search for "food" and "agency", but there are no results for either.
I ran:
$ ./tasks/inspectors.js
Loading all reports since 2016 from data.
Loading all reports since 2016 from ../reports/inspectors-general.
Then I thought that maybe there are no 2016 reports in unitedstates/reports, so I ran ./tasks/inspectors.js --since=2010, which had much better output:
$ ./tasks/inspectors.js --since=2010
Loading all reports since 2010 from data.
Loading all reports since 2010 from ../reports/inspectors-general.
[nsa][2010][CY2010_Annual_IOB_Report]
Loading JSON from disk...
Loading text from disk...
Indexing into Elasticsearch...
[nsa][2010][FY2010_1Q_IOB_Report]
Loading JSON from disk...
Loading text from disk...
Indexing into Elasticsearch...
[nsa][2010][FY2010_2Q_IOB_Report]
Loading JSON from disk...
Loading text from disk...
Indexing into Elasticsearch...
[nsa][2010][FY2010_3Q_IOB_Report]
Loading JSON from disk...
Loading text from disk...
Indexing into Elasticsearch...
[nsa][2010][FY2010_4Q_IOB_Report]
Loading JSON from disk...
Loading text from disk...
Indexing into Elasticsearch...
[nsa][2011][FY2011_1Q_IOB_Report]
Loading JSON from disk...
Loading text from disk...
Indexing into Elasticsearch...
[nsa][2011][FY2011_2Q_IOB_Report]
Loading JSON from disk...
Loading text from disk...
Indexing into Elasticsearch...
[nsa][2011][FY2011_3Q_IOB_Report]
Loading JSON from disk...
Loading text from disk...
Indexing into Elasticsearch...
[nsa][2011][FY2011_4Q_IOB_Report]
Loading JSON from disk...
Loading text from disk...
Indexing into Elasticsearch...
[nsa][2012][FY2012_1Q_IOB_Report]
Loading JSON from disk...
Loading text from disk...
Indexing into Elasticsearch...
[nsa][2012][FY2012_2Q_IOB_Report]
Loading JSON from disk...
Loading text from disk...
Indexing into Elasticsearch...
[nsa][2012][FY2012_3Q_IOB_Report]
Loading JSON from disk...
Loading text from disk...
Indexing into Elasticsearch...
[nsa][2012][FY2012_4Q_IOB_Report]
Loading JSON from disk...
Loading text from disk...
Indexing into Elasticsearch...
[nsa][2013][FY2013_1Q_IOB_Report]
Loading JSON from disk...
Loading text from disk...
Indexing into Elasticsearch...
[nsa][2013][FY2013_2Q_IOB_Report]
Loading JSON from disk...
Loading text from disk...
Indexing into Elasticsearch...
[state][2011][162347]
Loading JSON from disk...
Loading text from disk...
Indexing into Elasticsearch...
Refreshing index.
All done.
But searches for "food", "agency", and "people" all return 0 results.
While installing ES 2.2.1, I was reminded that I had previously set the default number_of_shards/number_of_replicas values on my cluster. This lets the indices' "health" go from yellow to green, since with zero replicas there are no replica shards left waiting for another node to join the cluster. We should document this somewhere, once I look up how to do it again. :P This may or may not explain the original problem.
Edit: I uncommented the following in the elasticsearch.yml configuration file. This file lives in the config directory if you installed from an archive, or at /etc/elasticsearch/elasticsearch.yml if you installed from a package manager. After this change, my indices went from yellow status to green status.
index.number_of_shards: 1
index.number_of_replicas: 0
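Why this works, as far as I understand it: Elasticsearch won't allocate a replica on the same node as its primary, so on a one-node cluster any index with number_of_replicas of 1 or more stays yellow no matter how long you wait. The replica count is what decides yellow vs. green; number_of_shards: 1 just means fewer shards overall. Here's a tiny Ruby sketch of that rule. expected_health is a made-up helper for illustration, not something in this repo or in Elasticsearch, and it assumes every data node is eligible to hold any shard (no allocation filtering).

```ruby
# Hypothetical helper (not part of this repo or of Elasticsearch).
# Predicts the health an index settles at once allocation is done,
# assuming every data node is eligible to hold any shard.
def expected_health(data_nodes, number_of_replicas)
  # With no data node at all, even the primaries cannot be allocated.
  return "red" if data_nodes < 1

  # Two copies of the same shard never share a node, so each primary
  # can have at most (data_nodes - 1) replicas allocated.
  number_of_replicas <= data_nodes - 1 ? "green" : "yellow"
end

puts expected_health(1, 1) # one replica (the ES 2.x default) on a single node
puts expected_health(1, 0) # with index.number_of_replicas: 0
puts expected_health(2, 1) # a second node gives the replicas somewhere to go
```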
Okay, I was able to reproduce and fix the issue on my machine using ES 2.2.1. If I just start everything up and run the test suite, I get the following error:
$ tasks/tests
rake aborted!
Elasticsearch::Transport::Transport::Errors::RequestTimeout: [408] {"cluster_name":"elasticsearch","status":"yellow","timed_out":true,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":5,"active_shards":5,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":5,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":50.0}
tasks/elasticsearch.rake:33:in `block (2 levels) in <top (required)>'
Tasks: TOP => elasticsearch:init
(See full trace by running task with --trace)
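The numbers in that 408 body tell the same story, assuming they describe a single index created with the ES 2.x defaults (5 shards, 1 replica each). The status is yellow, so every primary is allocated; the 5 unassigned shards are therefore the replicas, which have nowhere to go on a one-node cluster, and that's where "active_shards_percent_as_number":50.0 comes from. A quick check in Ruby using only the json standard library, with the body trimmed to the fields that matter here:

```ruby
require "json"

# The body of the 408 error above, trimmed to the relevant fields.
body = '{"status":"yellow","timed_out":true,"number_of_data_nodes":1,' \
       '"active_primary_shards":5,"active_shards":5,"unassigned_shards":5,' \
       '"active_shards_percent_as_number":50.0}'
health = JSON.parse(body)

primaries  = health["active_primary_shards"]
unassigned = health["unassigned_shards"]
# initializing_shards and relocating_shards were both 0 in the original body.
total = health["active_shards"] + unassigned

# Yellow means all primaries are allocated, so the unassigned shards
# must all be replicas (averaged per primary if there were several indices).
replicas_per_primary = unassigned / primaries

puts "#{primaries} primaries, #{unassigned} unassigned replicas, #{total} shards in total"
puts "replicas per primary: #{replicas_per_primary}"
puts "percent active: #{100.0 * health['active_shards'] / total}"
```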
If I add index.number_of_shards: 1 and index.number_of_replicas: 0 to config/elasticsearch.yml and retry, then everything works.
I think the root cause here is that the semantics of cluster health and timeouts changed between v1.5 and v2.2. With ES 1.5, everything chugs along happily if all the indices are yellow, i.e. every primary shard is allocated, but some replica shards are not. You can check index health by running curl http://127.0.0.1:9200/_cat/indices?v and looking at the health column. With ES 2.2, after the rake task creates the index, it waits for the cluster to reach "green" health. That API call times out without the cluster ever reaching "green", and the timeout is raised as an exception (the [408] RequestTimeout above). Compare this to ES 1.5, where the same /_cluster/health?wait_for_status=green API call times out after 30 seconds but does not raise an exception.
I'm going to work around this by changing the rake tasks to wait only for yellow health. This will preserve correctness, since the tasks still wait for a newly created index to get out of red health before closing it; improve performance, by cutting out unnecessary 30-second delays when running ES 1.5; and hopefully fix the problem @audiodude encountered with ES 2.2, without requiring extra configuration in elasticsearch.yml at installation time.
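What makes "wait for yellow" safe is that wait_for_status is satisfied once the cluster is at the requested level or better (green > yellow > red), so a cluster that reaches green also satisfies a wait for yellow, while a red one still blocks. A minimal Ruby sketch of that ordering; health_at_least? is a hypothetical helper, not part of the elasticsearch-ruby API. In the rake task itself, the change presumably amounts to passing wait_for_status: "yellow" instead of "green" to client.cluster.health, though the actual code in a5d5b99 may look different.

```ruby
# Cluster health levels, worst to best (red < yellow < green).
HEALTH_RANK = { "red" => 0, "yellow" => 1, "green" => 2 }.freeze

# Hypothetical helper: true once `current` meets a wait_for_status of
# `wanted`, i.e. the cluster is at that level or better. Unknown status
# strings raise KeyError instead of being silently treated as healthy.
def health_at_least?(current, wanted)
  HEALTH_RANK.fetch(current) >= HEALTH_RANK.fetch(wanted)
end

# A single-node cluster with one replica per shard sits at yellow:
puts health_at_least?("yellow", "green")  # never satisfied, so the wait times out
puts health_at_least?("yellow", "yellow") # satisfied right away
puts health_at_least?("red", "yellow")    # still waits while primaries are unassigned
```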
@audiodude could you give a5d5b99 a try? Delete the ES data directory again, do a git pull, and walk through the setup instructions again. I think the elasticsearch:init task was previously failing halfway through, without setting up the text analyzer needed for full-text search. With these changes, elasticsearch:init ought to run without throwing an exception, and hopefully you'll get sensible search results at the end. Hope this helps, and thanks for your patience!
@divergentdave Should I try it with ES 2.2 or ES 1.7 (which I currently have installed)?
Okay, it worked on ES 1.7.
Great to hear!
Following the setup instructions, after running the rake elasticsearch:init commands, I try a search for "food" and get the following:
I've soft-wrapped the exception; it was originally on one line.
Here's some debug output: