Closed KrishnaKulkarni closed 7 years ago
cc/ @pyromaniac – it would be great if we could get some support on this issue. If anything here is unclear, please let me know!
(I'm a colleague of @KrishnaKulkarni)
Oh, sorry, I missed this issue somehow. First, try this syntax for field boosting: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#CO116-1, it looks much better ;) Also, why are you using terms instead of queries?
Second, it is not the best idea to start and stop the test cluster on the fly; it will slow down the specs. It is better to start the test cluster manually once, before the whole suite.
As for the main question, I'm surprised as well. ResourceSiteIndex.purge! should clean up everything there. The only thing I'm concerned about right now is the atomic strategy wrapping. It is not the most convenient approach. I usually wrap everything with the bypass strategy and import manually wherever I need to. Try this approach; it could help.
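As a rough illustration of the field-boosting syntax linked above, the sketch below builds a multi_match query body as a plain Ruby hash. The helper name boosted_multi_match, the boost factor of 3, and the choice of name and description as the searched fields are assumptions made for illustration; they are not taken from the app's actual search code.

```ruby
require 'json'

# Hypothetical helper: builds a multi_match query body in which a match on
# `name` counts more than a match on `description`. The `field^boost` form
# is the Elasticsearch multi_match field-boosting syntax; the boost value
# and the field list are illustrative assumptions.
def boosted_multi_match(text, name_boost: 3)
  {
    multi_match: {
      query: text,
      fields: ["name^#{name_boost}", 'description']
    }
  }
end

# Print the JSON body that would be sent as the query.
puts JSON.pretty_generate(boosted_multi_match('Food Provider'))
```

Keeping the boost in one place like this makes it easy to vary in a spec while checking how the ranking responds.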
@pyromaniac Thanks a ton for your feedback!
First, try this syntax for field boosting: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#CO116-1, it looks much better ;) Also, why are you using terms instead of queries?
Nice pointer – thanks!
Second, it is not the best idea to start and stop the test cluster on the fly; it will slow down the specs. It is better to start the test cluster manually once, before the whole suite.
We usually only start and stop the cluster once, but were going for a maximal level of isolation while debugging.
ResourceSiteIndex.purge! should clean up everything there. The only thing I'm concerned about right now is the atomic strategy wrapping. It is not the most convenient approach. I usually wrap everything with the bypass strategy and import manually wherever I need to. Try this approach; it could help.
We'll follow this approach and let you know what happens!
Hey @pyromaniac. I've been working on this issue today. I haven't made too much progress. I thought I would share some more information.
bundle update chewy
Chewy.strategy(:bypass) in the before block in spec_helper.rb
spec_helper (in attempts to implement https://github.com/toptal/chewy#client-settings)
Commenting in site = create(:local_site) (and leaving ResourceSiteIndex::ResourceSite.import!(site) commented out) causes rspec spec/searches/test_spec.rb --order defined to pass.
Scores when the spec passes: [["Best Food Provider", 3.3410292], ["beer store", 1.6705146], ["wine store", 0.26632088]]
test_spec.rb: [["beer store", 0.5904797], ["wine store", 0.5904797], ["Best Food Provider", 0.53264177]]. All of these scores are much closer and lower in nominal value.
I have a hunch that before { ResourceSiteIndex.purge! } isn't operating as expected. There's no log output for the purge – is there any way to check that that method call is working as expected?
Thanks a ton!
spec/search/test_spec.rb
require 'spec_helper'
describe ResourceSiteSearch, :elasticsearch do
  before { ResourceSiteIndex.purge! }
  describe 'relevancy scoring' do
    it 'is a dummy example that pollutes index of the other example' do
      Chewy.logger.info 'START OF FIRST EXAMPLE GROUP'
      7.times {
        site = create(:local_site)
        ResourceSiteIndex::ResourceSite.import!(site)
      }
      Chewy.logger.info 'END OF FIRST EXAMPLE GROUP'
    end
    it 'should prioritize search results that match name' do
      Chewy.logger.info 'START OF SECOND EXAMPLE GROUP'
      food_in_description_site_1 = create(
        :local_site,
        name: 'beer store',
        description: 'One Food Provider'
      )
      ## commenting this in makes spec pass
      # site = create(:local_site)
      # ResourceSiteIndex::ResourceSite.import!(site)
      food_in_name_site = create(
        :local_site,
        name: 'Best Food Provider'
      )
      food_in_description_site_2 = create(
        :local_site,
        name: 'wine store',
        description: 'Two Food Provider'
      )
      ResourceSiteIndex::ResourceSite.import!(
        food_in_description_site_1,
        food_in_name_site,
        food_in_description_site_2
      )
      search_results = ResourceSiteSearch.new(
        {search: 'Food Provider'},
        {current_user: build(:admin)},
        {}
      ).local_search
      p map_relevancy_scores(search_results)
      expect(search_results.to_a.first).to eq(food_in_name_site)
      Chewy.logger.info 'END OF SECOND EXAMPLE GROUP'
    end
  end
  def map_relevancy_scores(search_results)
    search_results.tap(&:to_a).instance_variable_get('@_results').
      map { |r| [r.id, r._score] }
  end
end
spec/spec_helper.rb
# This is currently broken.
# Looks like a potential fix is described here:
# https://github.com/codeclimate/ruby-test-reporter#using-with-parallel_tests
if ENV['CODECLIMATE_REPO_TOKEN']
  require 'codeclimate-test-reporter'
  CodeClimate::TestReporter.start
end
ENV['RAILS_ENV'] ||= 'test'
require 'rubygems'
require File.expand_path('../../config/environment', __FILE__)
require 'rspec/rails'
require 'rspec/its'
require 'capybara/rspec'
require 'database_cleaner'
require 'shoulda-matchers'
require 'chewy/rspec'
require 'draper/test/rspec_integration'
require 'fakeredis/rspec'
require 'webmock/rspec'
require 'devise'
require 'factory_girl'
require 'json_matchers/rspec'
JsonMatchers.schema_root = 'spec/support/schemas/api'
# Requires supporting ruby files with custom matchers and macros, etc,
# in spec/support/ and its subdirectories.
Dir[Rails.root.join('spec/support/**/*.rb')].each { |f| require f }
WebMock.disable_net_connect!(allow_localhost: true)
RSpec.configure do |config|
  config.include Devise::TestHelpers, type: :controller
  config.include AbstractController::Translation
  config.include FactoryGirl::Syntax::Methods
  config.include InstrumentationSupport
  config.include HelperSupport, type: :helper
  feature_only_support_files = [
    SignInSupport,
    FlashSupport,
    AdminItemsSupport,
    CapybaraSupport,
    WaitForAjax,
    MailerSupport
  ]
  feature_only_support_files.each do |support_module|
    config.include support_module, type: :feature
  end
  config.use_transactional_fixtures = false
  config.mock_with :rspec do |c|
    c.verify_partial_doubles = true
  end
  config.expect_with :rspec do |c|
    c.syntax = :expect
  end
  [:feature, :controller].each do |type|
    config.before(type: type) do
      stub_intercom_events
      stub_intercom_messages
      stub_google_maps
    end
  end
  config.before(:suite) do |config|
    Chewy.strategy(:bypass)
    Chewy.logger = Logger.new('log/chewy_debug.log')
  end
  config.after(:suite) do
    ElasticsearchTestCluster.stop
  end
  config.before(:each, :elasticsearch) do |config|
    ElasticsearchTestCluster.start
  end
  # If true, the base class of anonymous controllers will be inferred
  # automatically. This will be the default behavior in future versions of
  # rspec-rails.
  config.infer_base_class_for_anonymous_controllers = false
  config.infer_spec_type_from_file_location!
  # Run specs in random order to surface order dependencies. If you find an
  # order dependency and want to debug it, you can fix the order by providing
  # the seed, which is printed after each run.
  # --seed 1234
  config.order = 'random'
end
spec/support/services/elasticsearch_support.rb
require 'elasticsearch/extensions/test/cluster/tasks'
module ElasticsearchTestCluster
  # TODO: I think naming this `ensure_started` might be better
  def self.start
    unless running?
      Elasticsearch::Extensions::Test::Cluster.start(nodes: 1)
    end
  end
  def self.stop
    if running?
      Elasticsearch::Extensions::Test::Cluster.stop
    end
  end
  def self.running?
    Elasticsearch::Extensions::Test::Cluster.running?
  end
end
Gemfile.lock
Did you try calling ResourceSiteIndex::ResourceSite.all.to_a and at least counting the results? Are there really objects from the first example present during the second one?
Try setting up Chewy.transport_logger and Chewy.transport_tracer to get the list of ES requests.
@pyromaniac Thanks for working with me here.
By inserting puts ResourceSiteIndex::ResourceSite.all.count and puts ResourceSiteIndex::ResourceSite.all.to_a.size into the specs at the beginning and end of each spec, I have been able to confirm that the documents are being deleted from the index.
Moreover, the following lines appear in the Chewy.transport_logger logs.
I, [2015-12-17T16:09:34.481973 #79370] INFO -- : DELETE http://localhost:9250/test_resource_site [status:404, request:0.013s, query:N/A]
D, [2015-12-17T16:09:34.482123 #79370] DEBUG -- : < {"error":"IndexMissingException[[test_resource_site] missing]","status":404}
F, [2015-12-17T16:09:34.482204 #79370] FATAL -- : [404] {"error":"IndexMissingException[[test_resource_site] missing]","status":404}
# index delete request at the very beginning of the first spec, so it doesn't exist yet (I think)
....
I, [2015-12-17T16:09:36.611760 #79370] INFO -- : DELETE http://localhost:9250/test_resource_site [status:200, request:0.049s, query:n/a]
D, [2015-12-17T16:09:36.611931 #79370] DEBUG -- : < {"acknowledged":true}
# successful deletion request at the beginning of the 2nd and last spec
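To check from the transport log whether each purge actually issued an index deletion, the log lines above can be filtered for DELETE requests. This is a minimal sketch in plain Ruby; the helper name index_deletions and the regular expression are assumptions based only on the log format shown in this excerpt.

```ruby
# Matches request lines in the format shown in the log excerpt, for example:
#   ... INFO -- : DELETE http://localhost:9250/test_resource_site [status:404, ...]
REQUEST_LINE = /-- : (?<verb>[A-Z]+) (?<url>\S+) \[status:(?<status>\d+),/

# Hypothetical helper: returns the URL and HTTP status of every DELETE request
# found in the given log text, in the order they appear.
def index_deletions(log_text)
  log_text.each_line.map do |line|
    match = REQUEST_LINE.match(line)
    next unless match && match[:verb] == 'DELETE'

    { url: match[:url], status: match[:status].to_i }
  end.compact
end

# Sample input copied from the log excerpt above.
sample = <<~LOG
  I, [2015-12-17T16:09:34.481973 #79370] INFO -- : DELETE http://localhost:9250/test_resource_site [status:404, request:0.013s, query:N/A]
  D, [2015-12-17T16:09:34.482123 #79370] DEBUG -- : < {"error":"IndexMissingException[[test_resource_site] missing]","status":404}
  I, [2015-12-17T16:09:36.611760 #79370] INFO -- : DELETE http://localhost:9250/test_resource_site [status:200, request:0.049s, query:n/a]
LOG

index_deletions(sample).each { |d| puts "#{d[:status]} #{d[:url]}" }
```

One deletion per example, with a 404 only on the very first one, would be consistent with purge! running before each example.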
Is there some other Elasticsearch memory that could be persisting between index deletions? Maybe something due to the fact that the indices being created and destroyed have the same name, or something like that?
I don't think index manipulations could affect the score. Are you still using those filters? If not, which request are you using now? I have no idea how the score could be affected, though. Which ES version are you using now?
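One general point that may help when comparing the two score lists: with classic TF/IDF similarity, a term's weight depends on how many documents the scorer counts, so the same query can produce different scores against different corpus statistics. The sketch below only illustrates that dependency with hypothetical counts. It is not a diagnosis of this issue, and the similarity in use here is an assumption, since the Elasticsearch version is not stated in the thread.

```ruby
# Classic Lucene TF/IDF inverse document frequency:
#   idf = 1 + ln(num_docs / (doc_freq + 1))
# Assumption: the cluster uses this classic similarity. Newer Elasticsearch
# versions default to BM25, which uses a different formula.
def classic_idf(doc_freq, num_docs)
  1 + Math.log(num_docs.to_f / (doc_freq + 1))
end

# Hypothetical counts: a term that appears in 3 documents.
small_corpus = classic_idf(3, 3)   # only the 3 matching documents are counted
large_corpus = classic_idf(3, 10)  # 7 additional non-matching documents are counted

puts format('3 of 3 docs:  idf = %.4f', small_corpus)
puts format('3 of 10 docs: idf = %.4f', large_corpus)
```

In Elasticsearch these counts are gathered per shard by default, so how documents are distributed across shards can also change them.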
@KrishnaKulkarni @dleve123 Any news on this? I've got a similar problem :)
@davebream We moved on from this issue a while ago – I don't recall how it was resolved. Perhaps @KrishnaKulkarni remembers what went on here, but I'm not certain.
I'm using Chewy to provide an Elasticsearch search platform for entities in my app called resource_sites/local_sites (they have many standard attributes like name, description, etc., along with addresses).
Our Chewy::Index analyzes various text fields for matches, as shown here:
When a user enters text into the query text input and then clicks 'Search', it searches for matches among various text fields. However, we wanted to tweak the relevancy scoring such that if we found a text query match in resource_site.name, that result would be ranked much higher than other results that found a textual match in a different field.
We attempted to achieve that with boost factors like so (note: we're interested to know a better way of implementing this, but that's a question for a separate issue):
Our big problem comes when we attempt to test this behavior of altered relevancy scores. Specifically, it appears that there is pollution occurring between the examples. Here is an illustrative case:
Running the latter example in isolation with rspec spec/searches/test_spec.rb:27 passes. However, running both examples in sequence with rspec spec/searches/test_spec.rb --order defined causes the latter example to fail and to display different relevancies than it does when run in isolation.
Given that we execute ResourceSiteIndex.purge! before every example, why are we seeing this pollution?