Test examples pollute each other

KrishnaKulkarni commented 8 years ago

I'm using Chewy to provide an elasticsearch search platform for entities in my app called resource_sites/local_sites (they have many standard attributes like name, description,etc. along with addresses).

Our Chewy::Index analyzes various text fields for matches, as shown here:

class ResourceSiteIndex < Chewy::Index
  define_type ResourceSite.with_hours.includes(:resource_programs,
                                               :address, :organization,
                                               :eligibilities, :languages,
                                               resource_programs: :translations) do
    field :name, type: :string, index: 'analyzed', analyzer: 'english'
    field :id, type: :long
    field :description, index: 'analyzed', analyzer: 'english'
    field :resource_site_type, index: 'not_analyzed', value: -> { type }
    field :updated_at, type: :date
    field :available, type: :boolean, value: -> { !hidden }
    field :ada_compliant, type: :boolean

    field :cost_of_service_minimum, type: :integer
    field :cost_of_service_maximum, type: :integer

    field :program_ids, type: :long,
                        value: -> { resource_programs.pluck(:id) }

    field :program_names, index: 'analyzed', analyzer: 'english',
                          value: -> { resource_programs.pluck(:name) }

    field :eligibility_ids, type: :long,
                            value: -> { eligibilities.pluck(:id) }

    field :eligibility_names, index: 'analyzed', analyzer: 'english',
                              value: -> { eligibilities.pluck(:name) }

    field :organization_name, index: 'analyzed', analyzer: 'english',
                              value: ->(site) do
                                site.organization.name if site.organization
                              end

    field :location, type: :geo_point, value: ->(site) do
      if site.type == 'LocalSite' && site.address?
        {
          lat: site.address.latitude,
          lon: site.address.longitude
        }
      end
    end

    field :regions, index: 'not_analyzed', value: ->(site) do
      site.regions if site.type == 'RegionalSite'
    end
  end
end

When a user enters text into query text input and then clicks 'Search', it searches for matches among various text fields. However, we wanted to tweak the relevancy scoring such that if we found a text query match in the resource_site.name, that result would be ranked much higher than other results that found a textual match in a different field.

We attempted to achieve that with boost factors like so (note: we're interested to know a better way of implementing this, but that's a question for a separate issue):

class ResourceSiteSearch

## A bunch of other methods here

  def index
    ResourceSiteIndex
  end

# queried_text is the string that comes in from the search query box
  def boost_name_relevancy
    if queried_text.present?
      terms_to_match = queried_text.downcase.split(' ')
      index.
        boost_factor(ENV.fetch('NAME_BOOST', 4).to_i,
                     filter: { terms: { name: terms_to_match } }).
        boost_factor(ENV.fetch('NAME_BASE', 2).to_i,
                     filter: { not: { terms: { name: terms_to_match } } })
    end
  end
end

Our big problem comes when we attempt to test this behavior of altered relevancy scores. Specifically, it appears that there is pollution occuring between the examples. Here is an illustrative case:

# spec/searches/test_spec.rb
require 'spec_helper'

## Rspec config for :elasticsearch config
# config.around :example do |example|
#   if example.metadata[:elasticsearch]
#     ElasticsearchTestCluster.start

#     Chewy.request_strategy = :atomic
#     Chewy.strategy(:atomic) do
#       example.run
#     end
#   end
# end

# config.after(:suite) do
#   ElasticsearchTestCluster.stop
# end

describe ResourceSiteSearch, :elasticsearch do
  before { ResourceSiteIndex.purge! }

  describe 'relevancy scoring' do
    it 'is a dummy example that pollutes index of the other example' do
      7.times { create(:local_site) }
    end

    it 'should prioritize search results that match name' do
      food_in_description_site_1 = create(
        :local_site,
        description: 'One Food Provider'
      )
      #1.times { create(:local_site) } # Uncommenting this oddly makes the example pass.
      food_in_name_site = create(
        :local_site,
        name: 'Best Food Provider'
      )
      food_in_description_site_2 = create(
        :local_site,
        description: 'Two Food Provider'
      )
      ResourceSiteIndex::ResourceSite.import!(
        food_in_description_site_1,
        food_in_name_site,
        food_in_description_site_2
      )

      search_results = ResourceSiteSearch.new(
        {search: 'Food Provider'},
        {current_user: build(:admin)},
        {}
      ).local_search

      p map_relevancy_scores(search_results)
      expect(search_results.to_a.first).to eq(food_in_name_site)
    end
  end

  def map_relevancy_scores(search_results)
    search_results.tap(&:to_a).instance_variable_get('@_results').
      map { |r| [r.id, r._score] }
  end
end

Running the latter example in isolation with rspec spec/searches/test_spec.rb:27 passes. However, running both examples in sequence with rspec spec/searches/test_spec.rb --order defined causes the latter example to fail and to display different relevancies than it does when being run in isolation.

Given that we execute ResourceSiteIndex.purge! before every example, why are we seeing this pollution?

dleve123 commented 8 years ago

cc/ @pyromaniac – it would be great if we could get some support on this issue. If anything here is unclear, please let me know!

(I'm a colleague of @KrishnaKulkarni)

pyromaniac commented 8 years ago

Oh, sorry, I've missed this issue somehow. At first, try to use this syntax: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#CO116-1 for fields boosting, it looks much better ;) Also why are you using terms instead of queries?

The second thing here: it is not the best idea to start and stop test cluster on the fly, it will slow down specs running. Better to start test cluster manually before the whole suite once.

And as for the main question - I'm surprised as well. ResourceSiteIndex.purge! should cleanup everything there. The only thing I'm concerned right now is those atomic strategy wrapping. It is not the most convenient idea. I usually wrapping everything with bypass strategy and doing import manually wherever I need. Try to use this approach, it could help.

dleve123 commented 8 years ago

@pyromaniac Thanks a ton for your feedback!

At first, try to use this syntax: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#CO116-1 for fields boosting, it looks much better ;) Also why are you using terms instead of queries?

Nice pointer – thanks!

The second thing here: it is not the best idea to start and stop test cluster on the fly, it will slow down specs running. Better to start test cluster manually before the whole suite once.

We usually only start and stop the cluster once, but were going for a maximal level of isolation while debugging.

ResourceSiteIndex.purge! should cleanup everything there. The only thing I'm concerned right now is those atomic strategy wrapping. It is not the most convenient idea. I usually wrapping everything with bypass strategy and doing import manually wherever I need. Try to use this approach, it could help.

We'll follow this approach and let you know what happens!

dleve123 commented 8 years ago

Hey @pyromaniac. I've been working on this issue today. I haven't made too much progress. I thought I would share some more information.

Debugging Approach

Update the Chewy gem (bundle update chewy)
Use the bypass strategy (Chewy.strategy(:bypass) in the before block in spec_helper.rb)
Set the 'test' prefix in spec_helper (in attempts to implement https://github.com/toptal/chewy#client-settings)
Isolate the Chewy logs from the Rails logs to see if I could glean any more information

Interesting Points

Commenting in just the site = create(:local_site) (and leaving ResourceSiteIndex::ResourceSite.import!(site) commented out), causes rspec spec/searches/test_spec.rb --order defined to pass.
- Relevancy scores: [["Best Food Provider", 3.3410292], ["beer store", 1.6705146], ["wine store", 0.26632088]]
Relevancy scores of unmodified test_spec.rb: [["beer store", 0.5904797], ["wine store", 0.5904797], ["Best Food Provider", 0.53264177]]. All of these scores are much closer and lower in nominal value.

I have a hunch that before { ResourceSiteIndex.purge! } isn't operating as expected. There's no log output for the purge – is there any way to check that that method call is working as expected?

Thanks a ton!

Relevant Code

Latest `spec/search/test_spec.rb`

require 'spec_helper'

describe ResourceSiteSearch, :elasticsearch do
  before { ResourceSiteIndex.purge! }

  describe 'relevancy scoring' do
    it 'is a dummy example that pollutes index of the other example' do
      Chewy.logger.info 'START OF FIRST EXAMPLE GROUP'

      7.times {
        site = create(:local_site)
        ResourceSiteIndex::ResourceSite.import!(site)
      }

      Chewy.logger.info 'END OF FIRST EXAMPLE GROUP'
    end

    it 'should prioritize search results that match name' do
      Chewy.logger.info 'START OF SECOND EXAMPLE GROUP'

      food_in_description_site_1 = create(
        :local_site,
        name: 'beer store',
        description: 'One Food Provider'
      )

      ## commenting this in makes spec pass
      # site = create(:local_site)
      # ResourceSiteIndex::ResourceSite.import!(site)

      food_in_name_site = create(
        :local_site,
        name: 'Best Food Provider'
      )
      food_in_description_site_2 = create(
        :local_site,
        name: 'wine store',
        description: 'Two Food Provider'
      )
      ResourceSiteIndex::ResourceSite.import!(
        food_in_description_site_1,
        food_in_name_site,
        food_in_description_site_2
      )

      search_results = ResourceSiteSearch.new(
        {search: 'Food Provider'},
        {current_user: build(:admin)},
        {}
      ).local_search

      p map_relevancy_scores(search_results)
      expect(search_results.to_a.first).to eq(food_in_name_site)
      Chewy.logger.info 'END OF SECOND EXAMPLE GROUP'
    end
  end

  def map_relevancy_scores(search_results)
    search_results.tap(&:to_a).instance_variable_get('@_results').
      map { |r| [r.id, r._score] }
  end
end

`spec/spec_helper.rb`

# This is currently broken.
# Looks like a potential fix is described here:
# https://github.com/codeclimate/ruby-test-reporter#using-with-parallel_tests
if ENV['CODECLIMATE_REPO_TOKEN']
  require 'codeclimate-test-reporter'
  CodeClimate::TestReporter.start
end

ENV['RAILS_ENV'] ||= 'test'

require 'rubygems'

require File.expand_path('../../config/environment', __FILE__)

require 'rspec/rails'
require 'rspec/its'

require 'capybara/rspec'
require 'database_cleaner'

require 'shoulda-matchers'

require 'chewy/rspec'

require 'draper/test/rspec_integration'
require 'fakeredis/rspec'
require 'webmock/rspec'

require 'devise'

require 'factory_girl'

require 'json_matchers/rspec'
JsonMatchers.schema_root = 'spec/support/schemas/api'

# Requires supporting ruby files with custom matchers and macros, etc,
# in spec/support/ and its subdirectories.
Dir[Rails.root.join('spec/support/**/*.rb')].each { |f| require f }

WebMock.disable_net_connect!(allow_localhost: true)

RSpec.configure do |config|
  config.include Devise::TestHelpers, type: :controller
  config.include AbstractController::Translation
  config.include FactoryGirl::Syntax::Methods
  config.include InstrumentationSupport
  config.include HelperSupport, type: :helper

  feature_only_support_files = [
    SignInSupport,
    FlashSupport,
    AdminItemsSupport,
    CapybaraSupport,
    WaitForAjax,
    MailerSupport
  ]
  feature_only_support_files.each do |support_module|
    config.include support_module, type: :feature
  end

  config.use_transactional_fixtures = false

  config.mock_with :rspec do |c|
    c.verify_partial_doubles = true
  end

  config.expect_with :rspec do |c|
    c.syntax = :expect
  end

  [:feature, :controller].each do |type|
    config.before(type: type) do
      stub_intercom_events
      stub_intercom_messages
      stub_google_maps
    end
  end

  config.before(:suite) do |config|
    Chewy.strategy(:bypass)
    Chewy.logger = Logger.new('log/chewy_debug.log')
  end

  config.after(:suite) do
    ElasticsearchTestCluster.stop
  end

  config.before(:each, :elasticsearch) do |config|
    ElasticsearchTestCluster.start
  end

  # If true, the base class of anonymous controllers will be inferred
  # automatically. This will be the default behavior in future versions of
  # rspec-rails.
  config.infer_base_class_for_anonymous_controllers = false
  config.infer_spec_type_from_file_location!

  # Run specs in random order to surface order dependencies. If you find an
  # order dependency and want to debug it, you can fix the order by providing
  # the seed, which is printed after each run.
  #     --seed 1234
  config.order = 'random'
end

`spec/support/services/elasticsearch_support.rb`

require 'elasticsearch/extensions/test/cluster/tasks'

module ElasticsearchTestCluster
  # TODO: I think naming this `ensure_started` might be better
  def self.start
    unless running?
      Elasticsearch::Extensions::Test::Cluster.start(nodes: 1)
    end
  end

  def self.stop
    if running?
      Elasticsearch::Extensions::Test::Cluster.stop
    end
  end

  def self.running?
    Elasticsearch::Extensions::Test::Cluster.running?
  end
end

`Gemfile.lock`

https://gist.github.com/dleve123/588e7cf21cbeca0280c2

pyromaniac commented 8 years ago

Did you try to get ResourceSiteIndex::ResourceSite.all.to_a and at least count it? Is there really objects from the first example during the second one?

pyromaniac commented 8 years ago

Try to setup Chewy.transport_logger and Chewy.transport_tracer to get the list of ES requests

dleve123 commented 8 years ago

@pyromaniac Thanks for working with me here.

By inserting puts ResourceSiteIndex::ResourceSite.all.count and puts ResourceSiteIndex::ResourceSite.all.to_a.size into the specs at the beginning and end of each spec, I have been able to confirm that the documents are being deleted from the index.

Moreover, the following lines appear in the Chewy.transport_logger logs.

I, [2015-12-17T16:09:34.481973 #79370]  INFO -- : DELETE http://localhost:9250/test_resource_site [status:404, request:0.013s, query:N/A]
D, [2015-12-17T16:09:34.482123 #79370] DEBUG -- : < {"error":"IndexMissingException[[test_resource_site] missing]","status":404}
F, [2015-12-17T16:09:34.482204 #79370] FATAL -- : [404] {"error":"IndexMissingException[[test_resource_site] missing]","status":404}
# index delete request at the very beginning of the first spec, so it doesn't exist yet (I think)
....
I, [2015-12-17T16:09:36.611760 #79370]  INFO -- : DELETE http://localhost:9250/test_resource_site [status:200, request:0.049s, query:n/a]
D, [2015-12-17T16:09:36.611931 #79370] DEBUG -- : < {"acknowledged":true}
# successful deletion request at the beginning of the 2nd and last spec

Is there some some other Elasticsearch memory that could be persisting between index deletions? Maybe something due to the fact that the indices that are being created and destroy could have the same name or something like that?

pyromaniac commented 8 years ago

I don't think index manipulations could affect score. Are you still using those filters? Or which request are you using now. Have no idea how the score could be affected though. Which ES version are you using now?

davebream commented 8 years ago

@KrishnaKulkarni @dleve123 Any news on this? I've got a similar problem :)

dleve123 commented 8 years ago

@davebream We've moved along from this issue a while ago – I don't recall the resolution of this. Perhaps @KrishnaKulkarni remembers what went on here, but I'm not certain.

toptal / chewy