konklone / oversight.garden

Bringing together the oversight community's work.
https://oversight.garden
Creative Commons Zero v1.0 Universal

[search_parse_exception] No mapping found for [published_on] #107

audiodude closed this issue 8 years ago

audiodude commented 8 years ago

I followed the setup instructions and ran the rake elasticsearch:init commands.

Then I tried a search for "food" and got the following:

Elasticsearch DEBUG: 2016-04-16T20:49:56Z
  starting request { method: 'POST',
    path: '/oversight_read/reports/_search',
    body:
     { from: 0,
       size: 10,
       query: { filtered: [Object] },
       sort: [ [Object] ],
       highlight:
        { encoder: 'html',
          pre_tags: [Object],
          post_tags: [Object],
          fields: [Object],
          order: 'score',
          fragment_size: 500 },
       _source:
        [ 'report_id',
          'year',
          'inspector',
          'agency',
          'title',
          'agency_name',
          'url',
          'landing_url',
          'inspector_url',
          'published_on',
          'type',
          'file_type' ] },
    query: {} }

Elasticsearch DEBUG: 2016-04-16T20:49:57Z
  Request complete

Noooo!

[search_parse_exception] No mapping found for [published_on] in order to sort on :: 
{"path":"/oversight_read/reports/_search","query":{},"body":"{\"from\":0,\"size\":10,\"query\":{\"filtered\":
{\"query\":{\"query_string\":{\"query\":\"food\",\"default_operator\":\"AND\",\"use_dis_max\":true,\"fields\":
[\"text\",\"title\",\"summary\",\"pdf.title\",\"pdf.keywords\",\"doc.title\",\"docx.title\",\"docx.keywords\"]}}}},\"
sort\":[{\"published_on\":\"desc\"}],\"highlight\":{\"encoder\":\"html\",\"pre_tags\":[\"<b>\"],\"post_tags\":[\"
</b>\"],\"fields\":{\"*\":{}},\"order\":\"score\",\"fragment_size\":500},\"_source\":
[\"report_id\",\"year\",\"inspector\",\"agency\",\"title\",\"agency_name\",\"url\",\"landing_url\",\"inspector_u
rl\",\"published_on\",\"type\",\"file_type\"]}","statusCode":400,"response":"{\"error\":{\"root_cause\":
[{\"type\":\"search_parse_exception\",\"reason\":\"No mapping found for [published_on] in order to sort 
on\"}],\"type\":\"search_phase_execution_exception\",\"reason\":\"all shards 
failed\",\"phase\":\"query\",\"grouped\":true,\"failed_shards\":[{\"shard\":0,\"index\":\"oversight-2016-04-
16\",\"node\":\"Lt4BRxipTyutMt09RlP8Rg\",\"reason\":
{\"type\":\"search_parse_exception\",\"reason\":\"No mapping found for [published_on] in order to sort 
on\"}}]},\"status\":400}"}

I've soft-wrapped the exception; it was originally on one line.
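
For anyone trying to read an error like the one above, here is a minimal sketch of pulling the root-cause reasons out of it. It assumes the text after " :: " is a JSON document whose "response" field is itself a JSON-encoded string, which is how the payload above appears to be structured. The function name root_cause_reasons and the abbreviated sample payload are made up for illustration.

```python
import json
from typing import List


def root_cause_reasons(error_payload: str) -> List[str]:
    """Return the distinct root-cause reasons found in a client error payload."""
    outer = json.loads(error_payload)
    # The "response" field holds a second JSON document encoded as a string.
    inner = json.loads(outer["response"])
    reasons: List[str] = []
    for cause in inner.get("error", {}).get("root_cause", []):
        reason = cause.get("reason")
        if reason and reason not in reasons:
            reasons.append(reason)
    return reasons


# Abbreviated sample with the same nesting as the error above (illustrative only).
sample = json.dumps({
    "path": "/oversight_read/reports/_search",
    "statusCode": 400,
    "response": json.dumps({
        "error": {
            "root_cause": [{
                "type": "search_parse_exception",
                "reason": "No mapping found for [published_on] in order to sort on",
            }],
            "type": "search_phase_execution_exception",
            "reason": "all shards failed",
        },
        "status": 400,
    }),
})

print(root_cause_reasons(sample))
```

Decoding twice is the key step: the outer object describes the HTTP request, and only the inner one carries the Elasticsearch error.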

Here's some debug output:

$ rake elasticsearch:list
oversight-2016-04-16, 2 aliases
  oversight_read
  oversight_write

divergentdave commented 8 years ago

What version of Elasticsearch are you using? It could be that we're not forward-compatible in some way.

What output do you get from this command? curl 'http://localhost:9200/oversight_read/reports/_mapping?pretty'

audiodude commented 8 years ago

$ elasticsearch --version
Version: 2.2.1, Build: d045fc2/2016-03-09T09:38:54Z, JVM: 1.7.0_71
$ curl http://localhost:9200/oversight_read/reports/_mapping?pretty
{ }

divergentdave commented 8 years ago

Hmm, so rake elasticsearch:init created the index, but didn't set up the mapping properly. I'll try testing against ES 2.2 and see what I come up with.
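
As a rough illustration of what the empty { } response means, here is a small sketch that lists the fields a _mapping response declares for a type. It assumes the ES 1.x/2.x response shape of index name, then "mappings", then type name, then "properties". The helper name mapped_fields and the populated example (including the "date" and "string" field types) are hypothetical, not taken from this project's real mapping.

```python
def mapped_fields(mapping_response: dict, doc_type: str) -> set:
    """Collect the field names mapped for doc_type across all returned indices."""
    fields = set()
    for index_body in mapping_response.values():
        type_mapping = index_body.get("mappings", {}).get(doc_type, {})
        fields.update(type_mapping.get("properties", {}))
    return fields


# The response reported above: no indices, so no mapped fields at all.
empty_response = {}

# Hypothetical populated response, for contrast only.
populated_response = {
    "oversight-2016-04-16": {
        "mappings": {
            "reports": {
                "properties": {
                    "published_on": {"type": "date"},
                    "title": {"type": "string"},
                }
            }
        }
    }
}

print("published_on" in mapped_fields(empty_response, "reports"))
print("published_on" in mapped_fields(populated_response, "reports"))
```

With no mapping for published_on, a search that sorts on that field has nothing to sort by, which matches the exception text.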

audiodude commented 8 years ago

Okay, after going back to ES 1.7, I had trouble starting Elasticsearch. I had to delete my Elasticsearch data directory because it wasn't backwards compatible. Then I reran the rake elasticsearch:* commands.

Now I can search for "food" and "agency" but there are no results for either.

I ran:

$ ./tasks/inspectors.js
Loading all reports since 2016 from data.
Loading all reports since 2016 from ../reports/inspectors-general.

Then I thought that maybe there are no 2016 reports in unitedstates/reports, so I ran:

$ ./tasks/inspectors.js --since=2010

which had much better output:

$ ./tasks/inspectors.js --since=2010
Loading all reports since 2010 from data.
Loading all reports since 2010 from ../reports/inspectors-general.
[nsa][2010][CY2010_Annual_IOB_Report]
    Loading JSON from disk...
    Loading text from disk...
    Indexing into Elasticsearch...
[nsa][2010][FY2010_1Q_IOB_Report]
    Loading JSON from disk...
    Loading text from disk...
    Indexing into Elasticsearch...
[nsa][2010][FY2010_2Q_IOB_Report]
    Loading JSON from disk...
    Loading text from disk...
    Indexing into Elasticsearch...
[nsa][2010][FY2010_3Q_IOB_Report]
    Loading JSON from disk...
    Loading text from disk...
    Indexing into Elasticsearch...
[nsa][2010][FY2010_4Q_IOB_Report]
    Loading JSON from disk...
    Loading text from disk...
    Indexing into Elasticsearch...
[nsa][2011][FY2011_1Q_IOB_Report]
    Loading JSON from disk...
    Loading text from disk...
    Indexing into Elasticsearch...
[nsa][2011][FY2011_2Q_IOB_Report]
    Loading JSON from disk...
    Loading text from disk...
    Indexing into Elasticsearch...
[nsa][2011][FY2011_3Q_IOB_Report]
    Loading JSON from disk...
    Loading text from disk...
    Indexing into Elasticsearch...
[nsa][2011][FY2011_4Q_IOB_Report]
    Loading JSON from disk...
    Loading text from disk...
    Indexing into Elasticsearch...
[nsa][2012][FY2012_1Q_IOB_Report]
    Loading JSON from disk...
    Loading text from disk...
    Indexing into Elasticsearch...
[nsa][2012][FY2012_2Q_IOB_Report]
    Loading JSON from disk...
    Loading text from disk...
    Indexing into Elasticsearch...
[nsa][2012][FY2012_3Q_IOB_Report]
    Loading JSON from disk...
    Loading text from disk...
    Indexing into Elasticsearch...
[nsa][2012][FY2012_4Q_IOB_Report]
    Loading JSON from disk...
    Loading text from disk...
    Indexing into Elasticsearch...
[nsa][2013][FY2013_1Q_IOB_Report]
    Loading JSON from disk...
    Loading text from disk...
    Indexing into Elasticsearch...
[nsa][2013][FY2013_2Q_IOB_Report]
    Loading JSON from disk...
    Loading text from disk...
    Indexing into Elasticsearch...
[state][2011][162347]
    Loading JSON from disk...
    Loading text from disk...
    Indexing into Elasticsearch...
Refreshing index.
All done.

But searches for "food", "agency" and "people" all have 0 results.

divergentdave commented 8 years ago

While installing ES 2.2.1, I was reminded that I had previously set the default number_of_shards/number_of_replicas values on my cluster. This allows the indices' "health" to go from yellow to green, since with no replicas configured there are no replica shards left waiting for another node to join the cluster. We should document this somewhere, once I look up how to do it again. :P This may or may not explain the original problem.

Edit: I uncommented the following in the elasticsearch.yml configuration file. This file lives in the config directory if you installed from an archive, or at /etc/elasticsearch/elasticsearch.yml if you installed from a package manager. After this change, I was able to get indices to go from yellow status to green status.

index.number_of_shards: 1
index.number_of_replicas: 0
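
To show why these two settings matter on a single-node install, here is a simplified model of shard allocation. It assumes only one rule, that two copies of the same shard never share a node, and ignores every other allocation setting. The function name expected_allocation is made up for this sketch. The first call uses the ES 1.x/2.x defaults of 5 shards and 1 replica.

```python
def expected_allocation(number_of_shards: int, number_of_replicas: int, data_nodes: int):
    """Return (status, active_shards, unassigned_shards) under a simplified model."""
    copies_per_shard = 1 + number_of_replicas
    # Each node can hold at most one copy (primary or replica) of a given shard.
    assignable_per_shard = min(copies_per_shard, data_nodes)
    active = number_of_shards * assignable_per_shard
    unassigned = number_of_shards * copies_per_shard - active
    if assignable_per_shard == 0:
        status = "red"      # not even the primaries can be assigned
    elif unassigned > 0:
        status = "yellow"   # primaries assigned, some replicas are not
    else:
        status = "green"    # every copy is assigned
    return status, active, unassigned


print(expected_allocation(5, 1, 1))  # defaults on a single node
print(expected_allocation(1, 0, 1))  # the settings above on a single node
```

Under this model, the default settings can never reach green on one node, while the settings above can.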

divergentdave commented 8 years ago

Okay, I was able to reproduce and fix the issue on my machine, using ES 2.2.1. If I just start everything up and run the test suite, I get the following error.

$ tasks/tests
rake aborted!
Elasticsearch::Transport::Transport::Errors::RequestTimeout: [408] {"cluster_name":"elasticsearch","status":"yellow","timed_out":true,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":5,"active_shards":5,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":5,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":50.0}
tasks/elasticsearch.rake:33:in `block (2 levels) in <top (required)>'
Tasks: TOP => elasticsearch:init
(See full trace by running task with --trace)

If I add index.number_of_shards: 1 and index.number_of_replicas: 0 to config/elasticsearch.yml and retry, then everything works. I think the root cause here is that the semantics of cluster health and timeouts changed between v1.5 and v2.2.

With ES 1.5, everything chugs along happily if all the indices are yellow, i.e. all primary shards are assigned, but some replica shards are not. You can check index health by running curl 'http://127.0.0.1:9200/_cat/indices?v'.

After the rake task creates the index, it waits for the cluster to reach "green" health. With ES 2.2, this API call ends up raising an exception, because it times out without the cluster reaching "green". Compare this to ES 1.5, where the same /_cluster/health?wait_for_status=green API call times out after 30 seconds, but does not result in an exception being raised.
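
To make the yellow/green distinction concrete, here is a small sketch that checks a cluster health document against a target status. The health body is copied (minus a few fields) from the 408 error above. The ranking table and the function name status_at_least are assumptions introduced for this example.

```python
import json

# Health statuses ordered from worst to best.
HEALTH_RANK = {"red": 0, "yellow": 1, "green": 2}


def status_at_least(health: dict, target: str) -> bool:
    """True if the reported cluster status is at least as good as target."""
    return HEALTH_RANK[health["status"]] >= HEALTH_RANK[target]


health = json.loads(
    '{"cluster_name":"elasticsearch","status":"yellow","timed_out":true,'
    '"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":5,'
    '"active_shards":5,"unassigned_shards":5,'
    '"active_shards_percent_as_number":50.0}'
)

print(status_at_least(health, "green"))
print(status_at_least(health, "yellow"))
```

The same response that fails a wait for green would satisfy a wait for yellow, since all five primary shards are active.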

I'm going to work around this by changing the rake tasks to only wait for yellow health. This will preserve correctness by getting out of red health before closing a newly-created index, improve performance by cutting out unnecessary 30-second delays when running ES 1.5, and hopefully fix the problem @audiodude encountered with ES 2.2, without requiring extra configuration (of elasticsearch.yml) at installation time.
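
Here is a rough sketch of the "only wait for yellow" idea, written as client-side polling rather than as the actual rake task change. The fetch_health callable is a stand-in for whatever returns the parsed /_cluster/health body. Both it and the wait_for_status name are hypothetical, and the real fix may simply pass a different wait_for_status value to the server.

```python
import time
from typing import Callable

RANK = {"red": 0, "yellow": 1, "green": 2}


def wait_for_status(fetch_health: Callable[[], dict], target: str = "yellow",
                    timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until the cluster status is at least target; False if time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        health = fetch_health()
        if RANK[health["status"]] >= RANK[target]:
            return True
        if time.monotonic() >= deadline:
            # Report the timeout to the caller instead of raising.
            return False
        time.sleep(interval)


# Fake health source for demonstration: red twice, then yellow.
states = iter(["red", "red", "yellow"])
reached = wait_for_status(lambda: {"status": next(states)}, "yellow",
                          timeout=5.0, interval=0.0)
print(reached)
```

Returning False on timeout, rather than raising, leaves the decision about whether a timeout is fatal to the caller.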

divergentdave commented 8 years ago

@audiodude could you give a5d5b99 a try? Delete the ES data directory again, do a git pull, and try walking through the setup instructions again. I think the elasticsearch:init task was previously failing halfway through, without setting up the text analyzer needed for full-text search. With these changes, elasticsearch:init ought to run without throwing an exception, and hopefully you'll get sensible search results at the end. Hope this helps, and thanks for your patience!

audiodude commented 8 years ago

@divergentdave Should I try it with ES 2.2 or ES 1.7 (which I currently have installed)?

audiodude commented 8 years ago

Okay, it worked on ES 1.7.

divergentdave commented 8 years ago

Great to hear!