karmi / retire

A rich Ruby API and DSL for the Elasticsearch search engine
http://karmi.github.com/retire/
MIT License

Multiple facets not working #836

Open NilsHaldenwang opened 11 years ago

NilsHaldenwang commented 11 years ago

Hey there,

I am trying to use multiple facets in a search, but I keep getting parse errors from Elasticsearch.

These are the Tire-relevant parts of my model:

class Tweet < ActiveRecord::Base

  include Tire::Model::Search
  include Tire::Model::Callbacks

  index_name "#{Tire::Model::Search.index_prefix}tweets"

  #TODO: When do indices update? on after_save but this is not good
  settings TireConfig::MULTI_LANGUAGE_ANALYSIS_SETTINGS do
    mapping "_analyzer" => { path: 'analyzer_to_use', index: 'no' } do
      indexes :id,              index: :no
      indexes :text
      indexes :user_name
      indexes :user_id
      indexes :entities_ids,    as: 'entities_ids'
      indexes :created_at,      type: 'date'
      indexes :sentiment
      indexes :analyzer_to_use, index: :no
    end
  end
end

This is the search I want to perform:

result = Tweet.search do
  query do
    all
  end

  filter :terms, entities_ids: [11]

  facet 'histo_positive' do
    date field: 'created_at', interval: '30s' do
      facet_filter :term, sentiment: "positive"
    end
  end

  facet 'histo_negative' do
    date field: 'created_at', interval: '30s' do
      facet_filter :term, sentiment: "negative"
    end
  end

end

This results in the following query being sent to Elasticsearch:

{
  "query": {
    "match_all": {

    }
  },
  "facets": {
    "histo_positive": {
      "date_histogram": {
        "field": {
          "field": "created_at",
          "interval": "30s"
        },
        "interval": "day"
      }
    },
    "histo_negative": {
      "date_histogram": {
        "field": {
          "field": "created_at",
          "interval": "30s"
        },
        "interval": "day"
      }
    }
  },
  "size": 10
}

Which fails with

Tire::Search::SearchRequestFailed: 500 : {"error":"SearchPhaseExecutionException[Failed to execute phase [query], total failure; shardFailures {[k42X0krVSBCRdL28xNoTjA][tweets][1]: SearchParseException[[tweets][1]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to parse source [{\n  \"query\": {\n    \"match_all\": {\n\n    }\n  },\n  \"facets\": {\n    \"histo_positive\": {\n      \"date_histogram\": {\n        \"field\": {\n          \"field\": \"created_at\",\n          \"interval\": \"30s\"\n        },\n        \"interval\": \"day\"\n      }\n    },\n    \"histo_negative\": {\n      \"date_histogram\": {\n        \"field\": {\n          \"field\": \"created_at\",\n          \"interval\": \"30s\"\n        },\n        \"interval\": \"day\"\n      }\n    }\n  },\n  \"size\": 10\n}]]]; nested: SearchParseException[[tweets][1]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [No parser for element [histo_negative]]]; }{[k42X0krVSBCRdL28xNoTjA][tweets][0]: SearchParseException[[tweets][0]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to parse source [{\n  \"query\": {\n    \"match_all\": {\n\n    }\n  },\n  \"facets\": {\n    \"histo_positive\": {\n      \"date_histogram\": {\n        \"field\": {\n          \"field\": \"created_at\",\n          \"interval\": \"30s\"\n        },\n        \"interval\": \"day\"\n      }\n    },\n    \"histo_negative\": {\n      \"date_histogram\": {\n        \"field\": {\n          \"field\": \"created_at\",\n          \"interval\": \"30s\"\n        },\n        \"interval\": \"day\"\n      }\n    }\n  },\n  \"size\": 10\n}]]]; nested: SearchParseException[[tweets][0]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [No parser for element [histo_negative]]]; }{[k42X0krVSBCRdL28xNoTjA][tweets][4]: SearchParseException[[tweets][4]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [Failed to parse source [{\n  \"query\": {\n    \"match_all\": {\n\n    }\n  },\n  
\"facets\": {\n    \"histo_positive\": {\n      \"date_histogram\": {\n        \"field\": {\n          \"field\": \"created_at\",\n          \"interval\": \"30s\"\n        },\n        \"interval\": \"day\"\n      }\n    },\n    \"histo_negative\": {\n      \"date_histogram\": {\n        \"field\": {\n          \"field\": \"created_at\",\n          \"interval\": \"30s\"\n        },\n        \"interval\": \"day\"\n      }\n    }\n  },\n  \"size\": 10\n}]]]; nested: SearchParseException[[tweets][4]: query[ConstantScore(*:*)],from[-1],size[-1]: Parse Failure [No parser for element [histo_negative]]]; }]","status":500}
    from /Users/nils/.rvm/gems/ruby-1.9.3-p429/gems/tire-0.6.0/lib/tire/search.rb:139:in `perform'
    from /Users/nils/.rvm/gems/ruby-1.9.3-p429/gems/tire-0.6.0/lib/tire/search.rb:35:in `results'
    from /Users/nils/.rvm/gems/ruby-1.9.3-p429/gems/tire-0.6.0/lib/tire/model/search.rb:105:in `search'
    from /Users/nils/.rvm/gems/ruby-1.9.3-p429/gems/tire-0.6.0/lib/tire/model/search.rb:298:in `search'
    from (irb):64
    from /Users/nils/.rvm/gems/ruby-1.9.3-p429/gems/railties-4.0.0/lib/rails/commands/console.rb:90:in `start'
    from /Users/nils/.rvm/gems/ruby-1.9.3-p429/gems/railties-4.0.0/lib/rails/commands/console.rb:9:in `start'
    from /Users/nils/.rvm/gems/ruby-1.9.3-p429/gems/railties-4.0.0/lib/rails/commands.rb:64:in `<top (required)>'
    from bin/rails:4:in `require'
    from bin/rails:4:in `<main>'

If I am only using one of the facets it works fine. My workaround is to just send two queries, but I am not sure if that's the right way to go here.

Any suggestions? I don't understand what's wrong here; maybe I am just doing something wrong.

isabanin commented 11 years ago

I stumbled upon the same issue. Judging by the source of facet_filter, it will not work with multiple filters, because each call overwrites the previous one:

def facet_filter(type, *options)
  @value[:facet_filter] = Filter.new(type, *options).to_hash
  self
end

isabanin commented 11 years ago

Here's a workaround that I found:

status = Tire.search("comments") do
  facet "tags" do
    terms :tags
    filters = [{:term => {'account-id' => account.id}}]

    if project.present?
      filters << {:term => {'project-id' => project.id}}
    end

    facet_filter :and, filters
  end
end

fabien7337 commented 11 years ago

Did you get this working in the end? I'm stuck in the same situation with multiple terms_stats facets :(

karmi commented 11 years ago

Hi, there are several issues in the syntax. This works for me:

require 'tire'
require 'json'

Tire.index 'test-facet' do
  delete
  create
  store title: 'One', date: Time.now,     sentiment: 'negative', count: 1
  store title: 'Two', date: Time.now+30,  sentiment: 'negative', count: 2
  store title: 'Two', date: Time.now+10,  sentiment: 'positive', count: 3
  refresh
end

search = Tire.search 'test-facet' do
  facet 'histogram-negative' do
    date 'date', interval: '30s'
    facet_filter :term, sentiment: 'negative'
  end

  facet 'histogram-positive' do
    date 'date', interval: '30s'
    facet_filter :term, sentiment: 'positive'
  end

  facet 'stats-negative' do
    statistical 'count'
    facet_filter :term, sentiment: 'negative'
  end

  facet 'stats-positive' do
    statistical 'count'
    facet_filter :term, sentiment: 'positive'
  end
end

puts search.to_curl,
     "---",
     JSON.pretty_generate(search.results.facets)

Output:

{
  "histogram-negative": {
    "_type": "date_histogram",
    "entries": [
      {
        "time": 1379342580000,
        "count": 1
      },
      {
        "time": 1379342610000,
        "count": 1
      }
    ]
  },
  "histogram-positive": {
    "_type": "date_histogram",
    "entries": [
      {
        "time": 1379342580000,
        "count": 1
      }
    ]
  },
  "stats-negative": {
    "_type": "statistical",
    "count": 2,
    "total": 3.0,
    "min": 1.0,
    "max": 2.0,
    "mean": 1.5,
    "sum_of_squares": 5.0,
    "variance": 0.25,
    "std_deviation": 0.5
  },
  "stats-positive": {
    "_type": "statistical",
    "count": 1,
    "total": 3.0,
    "min": 3.0,
    "max": 3.0,
    "mean": 3.0,
    "sum_of_squares": 9.0,
    "variance": 0.0,
    "std_deviation": 0.0
  }
}
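For anyone comparing the two versions, the malformed "field": {...} in the original request can be reproduced with a small stand-in for the facet DSL. The sketch below is not Tire's source: FacetSketch and its methods are hypothetical, and it only assumes that date takes the field name as a positional argument, falls back to a "day" interval, and ignores any block passed to it.

```ruby
# Hypothetical stand-in for a facet DSL object; not Tire's actual code.
class FacetSketch
  def initialize(name, &block)
    @name = name
    @value = {}
    instance_eval(&block) if block
  end

  # Assumption: the field name is positional and the options hash is separate.
  # A block given to this method is silently ignored.
  def date(field, options = {})
    interval = options.fetch(:interval, 'day')
    @value[:date_histogram] = { field: field, interval: interval }
    self
  end

  def facet_filter(type, options)
    @value[:facet_filter] = { type => options }
    self
  end

  def to_hash
    { @name => @value }
  end
end

# Original call style: the whole hash becomes the "field", the interval falls
# back to the default, and the filter inside the block is never evaluated.
broken = FacetSketch.new('histo_negative') do
  date field: 'created_at', interval: '30s' do
    facet_filter :term, sentiment: 'negative'
  end
end

# Corrected call style: field name first, facet_filter as a sibling call.
fixed = FacetSketch.new('histo_negative') do
  date 'created_at', interval: '30s'
  facet_filter :term, sentiment: 'negative'
end

p broken.to_hash
p fixed.to_hash
```

Under those assumptions, the first hash has the same nested "field" and default "day" interval as the rejected request and carries no facet_filter at all, while the second has a plain field name, the 30s interval, and the filter.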

karmi commented 11 years ago

@isabanin That is indeed true. We should probably apply :and automatically when facet_filter is called multiple times.
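A rough sketch of what that could look like, with the caveat that this is not a patch against Tire: FacetFilterSketch is a hypothetical class, and the only thing taken from the thread is that facet_filter currently overwrites @value[:facet_filter] on each call. Here the first call stores the filter as-is, and later calls fold everything into an "and" filter.

```ruby
# Hypothetical illustration only; not part of Tire.
class FacetFilterSketch
  def initialize
    @value = {}
  end

  # Accumulate filters instead of overwriting the previous one.
  def facet_filter(type, options)
    filter = { type => options }
    existing = @value[:facet_filter]

    @value[:facet_filter] =
      if existing.nil?
        filter                              # first call: store as-is
      elsif existing.key?(:and)
        { and: existing[:and] + [filter] }  # already combined: append
      else
        { and: [existing, filter] }         # second call: wrap both
      end
    self
  end

  def to_hash
    @value
  end
end

facet = FacetFilterSketch.new
facet.facet_filter(:term, 'account-id' => 1)
facet.facet_filter(:term, 'project-id' => 2)
p facet.to_hash
```

With a single call the output is unchanged from today's behaviour, so existing searches would presumably not be affected.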

fabien7337 commented 11 years ago

@karmi So if I need to do the same with terms_stats, to aggregate multiple values on the same key, is this the best way to do it?

require 'tire'
require 'json'

Tire.index 'test-facet' do
  delete
  create
  store title: 'One', channel_id: 'CHANNEL1', views_count: 12314, likes_count: 234
  store title: 'Two', channel_id: 'CHANNEL2', views_count: 92834, likes_count: 678
  store title: 'Three', channel_id: 'CHANNEL3', views_count: 213429, likes_count: 90
  refresh
end

search = Tire.search 'test-facet' do
  facet 'channels_views_count' do
    terms_stats :channel_id, :views_count
  end

  facet 'channels_likes_count' do
    terms_stats :channel_id, :likes_count
  end
end

puts search.to_curl,
     "---",
     JSON.pretty_generate(search.results.facets)

And then I merge the two arrays afterwards? Is there no way to do that with a single facet?

karmi commented 11 years ago

@zywx AFAIK the terms_stats facet doesn't support aggregating on multiple fields.

fabien7337 commented 11 years ago

Yes, so the only solution is to use multiple facets, one per field. That works for me.
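Merging the per-field results by term is short enough in plain Ruby. The sketch below assumes each terms_stats facet returns a "terms" array whose entries carry "term" and "total" keys; merge_term_stats is a hypothetical helper, and the sample response is made up from the documents indexed above.

```ruby
# Hypothetical helper: combine several terms_stats facets into one hash keyed by term.
def merge_term_stats(facets, stat = 'total')
  merged = Hash.new { |hash, term| hash[term] = {} }
  facets.each do |facet_name, facet|
    facet.fetch('terms', []).each do |entry|
      merged[entry['term']][facet_name] = entry[stat]
    end
  end
  merged
end

# Assumed response shape, trimmed to the keys used here. The exact term text
# depends on how the key field is analyzed.
facets = {
  'channels_views_count' => { 'terms' => [
    { 'term' => 'CHANNEL1', 'total' => 12314.0 },
    { 'term' => 'CHANNEL2', 'total' => 92834.0 }
  ] },
  'channels_likes_count' => { 'terms' => [
    { 'term' => 'CHANNEL1', 'total' => 234.0 },
    { 'term' => 'CHANNEL2', 'total' => 678.0 }
  ] }
}

merged = merge_term_stats(facets)
p merged['CHANNEL1']
```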

I have another question: how can I get a distinct count from a terms_stats facet? For example:

require 'tire'
require 'json'

Tire.index 'test-facet' do
  delete
  create
  store title: 'One', brand_id: 1, channel_id: 'CHANNEL1', views_count: 12314, likes_count: 234
  store title: 'Two', brand_id: 2, channel_id: 'CHANNEL2', views_count: 92834, likes_count: 678
  store title: 'Three', brand_id: 2, channel_id: 'CHANNEL3', views_count: 213429, likes_count: 90
  refresh
end

search = Tire.search 'test-facet' do
  facet 'brands_views_count' do
    terms_stats :brand_id, :views_count
  end

  facet 'brands_channels' do
    terms_stats :brand_id, :channel_id
  end
end

puts search.to_curl,
     "---",
     JSON.pretty_generate(search.results.facets)

This doesn't work, because it seems terms_stats doesn't support strings for value_field... Any idea that could help me move forward?

karmi commented 11 years ago

@zywx It's best to open a separate issue, so we don't spam everyone in this thread.

value_field must be numeric, since the stats are computed on it. Use a terms facet for channel_id.
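One way to read that suggestion, sketched as a plain request body rather than Tire DSL: a terms facet on channel_id per brand, restricted with a facet_filter, where the number of returned entries approximates the distinct channel count. The helper names, the brand ids, and the size of 1000 are assumptions for illustration; channels beyond size would not be counted.

```ruby
require 'json'

# Hypothetical helper: build one terms facet on channel_id per brand.
def channels_per_brand_facets(brand_ids, size = 1000)
  brand_ids.each_with_object({}) do |brand_id, facets|
    facets["brand_#{brand_id}_channels"] = {
      'terms'        => { 'field' => 'channel_id', 'size' => size },
      'facet_filter' => { 'term' => { 'brand_id' => brand_id } }
    }
  end
end

# Hypothetical helper: count the distinct terms in one terms-facet result.
def distinct_count(facet_result)
  facet_result.fetch('terms', []).length
end

body = {
  'query'  => { 'match_all' => {} },
  'size'   => 0,
  'facets' => channels_per_brand_facets([1, 2])
}
puts JSON.pretty_generate(body)

# Made-up response fragment for brand 2, which has two channels in the sample
# data; the exact term text depends on how channel_id is analyzed.
sample = { 'terms' => [{ 'term' => 'CHANNEL2', 'count' => 1 },
                       { 'term' => 'CHANNEL3', 'count' => 1 }] }
p distinct_count(sample)
```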