Closed vsespb closed 11 years ago
I don't understand what you mean. In unit tests, you usually mock the Tire integration; in integration tests, you let everything bubble up and down, and calling Index#refresh is enough. There are some pages on the Wiki.
I've read the wiki. We are using Index#refresh, but it seems that is not enough.
Our problem is:
We use Tire in integration tests (we don't mock it). For at least the last half year we have had random test failures on different servers, on the CI server and on development machines. On some machines certain tests fail 100% of the time; on others the failures are random. And we have more failures when running tests in parallel (parallel_tests gem). (The index names are of course different for the parallel processes.)
There are two kinds of failures: a) HTTP 500 with a "no active shards" message; b) wrong items in search results, for example we expect one item but get two with the same ID (and other fields).
It seems that adding sleep(1) after tire.index.refresh helps; at least I have never seen it fail with the sleep in place. The following code (instead of the sleep) also seems to help:
Tire::Configuration.client.get "#{Tire::Configuration.url}/_cluster/health?wait_for_status=yellow&timeout=1s&wait_for_active_shards=1"
I found some evidence that such health requests should indeed be used:
1) https://groups.google.com/forum/?fromgroups=#!topic/elasticsearch/wPKUJXiG2mw (it's the author of elasticsearch, who answered this ticket)
2) http://grokbase.com/t/gg/elasticsearch/12ag6njpyz/no-active-shards-in-tests-using-the-java-api
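The health-wait call above can be wrapped in a small test helper. A minimal sketch in plain Ruby — the helper names, defaults, and readiness criteria are my assumptions, not Tire API:

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build the _cluster/health URL with "wait_for_status", so elasticsearch
# blocks the request server-side until the cluster reaches that status
# (or the timeout elapses).
def cluster_health_url(base_url, status: 'yellow', timeout: '1s')
  "#{base_url}/_cluster/health?wait_for_status=#{status}&timeout=#{timeout}"
end

# Decide readiness from a health response body: searching is safe once the
# status is yellow or green and no shards are initializing or relocating.
def cluster_ready?(health_json)
  health = JSON.parse(health_json)
  %w[yellow green].include?(health['status']) &&
    health['initializing_shards'].to_i.zero? &&
    health['relocating_shards'].to_i.zero?
end

# Hypothetical test-suite hook: call after every refresh; fail loudly if
# the cluster never settled within the server-side timeout.
def wait_for_cluster!(base_url = 'http://localhost:9200')
  response = Net::HTTP.get_response(URI(cluster_health_url(base_url)))
  raise "Cluster not ready: #{response.body}" unless cluster_ready?(response.body)
end
```

Calling something like `wait_for_cluster!` from a test setup block would replace both the sleep and the raw one-liner above.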
Your problem is then related to running the tests in parallel, possibly even running into issues with multiple elasticsearch nodes forming a cluster in the CI environment?
Normally, a refresh is the only thing you need to do. Are you sure you're running a refresh in the test setups? Do the Tire integration tests run fine for you? Also, it may help to create the index with a single shard in the test environment.
If nothing helps in your case, we may need to add an API for waiting for specific cluster health (color, number of active shards, etc).
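Creating the test index with a single shard can also be done from the test suite itself by passing settings at index-creation time. A sketch — the helper is hypothetical, and the exact Tire create signature may differ by version:

```ruby
require 'json'

# Hypothetical helper: build an index-creation payload that pins the
# shard and replica counts, so tests run against a single-shard index.
def test_index_settings(shards: 1, replicas: 0)
  { settings: { number_of_shards: shards, number_of_replicas: replicas } }
end

# With Tire, a payload like this can be passed when creating the index,
# e.g. (check your Tire version's API):
#   Tire.index('articles_test') { create test_index_settings }
```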
The problem also happens when running single-process tests, but rarely. We are using refresh; failures appear even when refresh is called right before the search.
OK, I'll check the Tire integration tests, and check whether we have a single shard.
First, try running a single test repeatedly, if you also run into the issue.
If you parallelize tests, you have no isolation between test runs, i.e. the setup of the second test can wipe data in elasticsearch for the first test. You can mitigate the issue by using different index names per test, for instance when running the tests as part of the integration test suite of a Rails application.
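With parallel_tests, for example, the per-process index name can be derived from the worker number the gem exports in ENV['TEST_ENV_NUMBER'] ("" for the first worker, "2", "3", ... for the rest). A minimal sketch — the helper name and suffix scheme are assumptions:

```ruby
# Hypothetical helper: derive a unique index name per parallel test process
# from parallel_tests' TEST_ENV_NUMBER variable, so concurrent workers
# never touch each other's indices.
def test_index_name(base, env = ENV)
  suffix = env['TEST_ENV_NUMBER'].to_s
  suffix.empty? ? "#{base}_test" : "#{base}_test_#{suffix}"
end
```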
Yes, we are definitely using different index names for each concurrent process. Well, OK, thanks for the info. I will check everything again. You can probably close the ticket; I will re-open it and come back with a minimal proof-of-concept in case nothing helps.
Hm, it seems the latest Tire integration tests fail on my box (some of the tests fail randomly):
1)
[REQUEST FAILED] curl -X GET 'http://localhost:9200/active_record_articles/active_record_article/_search?size=10&pretty' -d '{
"query": {
"query_string": {
"query": "[x"
}
},
"size": 10
}'
2)
[REQUEST FAILED] curl -X GET 'http://localhost:9200/index-aliased/_search?pretty' -d '{
"query": {
"match_all": {
}
}
}'
3)
ERROR (8:00:30.631) test: Percolator when percolating a document should return an array of matching query names.
undefined method `sort' for nil:NilClass
@ test/integration/percolator_test.rb:66:in `block (3 levels) in <class:PercolatorIntegrationTest>'
/home/vse/.rvm/gems/ruby-1.9.3-p0@set1/gems/shoulda-context-1.0.1/lib/shoulda/context/context.rb:398:in `call'
/home/vse/.rvm/gems/ruby-1.9.3-p0@set1/gems/shoulda-context-1.0.1/lib/shoulda/context/context.rb:398:in `block in create_test_from_should_hash'
/home/vse/.rvm/gems/ruby-1.9.3-p0@set1/gems/mocha-0.10.5/lib/mocha/integration/mini_test/version_230_to_262.rb:28:in `run'
FAIL (8:00:35.473) test: Percolator when percolating a document should return an empty array when no query matches.
<[]> expected but was
<nil>.
@ test/integration/percolator_test.rb:61:in `block (3 levels) in <class:PercolatorIntegrationTest>'
/home/vse/.rvm/gems/ruby-1.9.3-p0@set1/gems/shoulda-context-1.0.1/lib/shoulda/context/context.rb:398:in `call'
/home/vse/.rvm/gems/ruby-1.9.3-p0@set1/gems/shoulda-context-1.0.1/lib/shoulda/context/context.rb:398:in `block in create_test_from_should_hash'
/home/vse/.rvm/gems/ruby-1.9.3-p0@set1/gems/mocha-0.10.5/lib/mocha/integration/mini_test/version_230_to_262.rb:28:in `run'
4)
this test fails randomly:
PASS (8:00:26.640) test: Percolator when percolating a document should return an array of matching query names for specific percolated queries.
sometimes
FAIL (10:00:55.617) test: Percolator when percolating a document should return an array of matching query names for specific percolated queries.
<["weather"]> expected but was
<nil>.
@ test/integration/percolator_test.rb:71:in `block (3 levels) in <class:PercolatorIntegrationTest>'
/home/vse/.rvm/gems/ruby-1.9.3-p0@set1/gems/shoulda-context-1.0.1/lib/shoulda/context/context.rb:398:in `call'
/home/vse/.rvm/gems/ruby-1.9.3-p0@set1/gems/shoulda-context-1.0.1/lib/shoulda/context/context.rb:398:in `block in create_test_from_should_hash'
/home/vse/.rvm/gems/ruby-1.9.3-p0@set1/gems/mocha-0.10.5/lib/mocha/integration/mini_test/version_230_to_262.rb:28:in `run'
elasticsearch version: "version" : { "number" : "0.19.9", "snapshot_build" : false },
The unit tests seem OK.
Actually no, the unit tests have random failures too:
Tire::IndexTest
PASS (0:00:00.013) test: Index analyze support should properly encode format parameter.
PASS (0:00:00.014) test: Index analyze support should properly encode parameters for analyzer.
PASS (0:00:00.014) test: Index analyze support should properly encode parameters for field.
PASS (0:00:00.014) test: Index analyze support should send text to the Analyze API.
PASS (0:00:00.014) test: Index mapping should create index with mapping.
PASS (0:00:00.015) test: Index mapping should return the mapping.
[ERROR] 400 > {"error":"Failed to derive xcontent from org.elasticsearch.common.bytes.BytesArray@0"}, retrying (1)...
[ERROR] 400 > {"error":"Failed to derive xcontent from org.elasticsearch.common.bytes.BytesArray@0"}, retrying (2)...
[ERROR] 400 > {"error":"Failed to derive xcontent from org.elasticsearch.common.bytes.BytesArray@0"}, retrying (3)...
[ERROR] 400 > {"error":"Failed to derive xcontent from org.elasticsearch.common.bytes.BytesArray@0"}, retrying (4)...
[ERROR] 400 > {"error":"Failed to derive xcontent from org.elasticsearch.common.bytes.BytesArray@0"}, retrying (5)...
[ERROR] Too many exceptions occured, giving up. The HTTP response was: 400 > {"error":"Failed to derive xcontent from org.elasticsearch.common.bytes.BytesArray@0"}
400 : {"error":"MapperParsingException[mapping [model_with_incorrect_mapping]]; nested: MapperParsingException[No handler for type [boo] declared on field [title]]; ","status":400}
* DEFERRED: Finders should raise error when document is not found.
* DEFERRED: Persistent model attribute methods should allow to set deeply nested attributes on initialization.
* DEFERRED: Persistent model attribute methods should allow to set deeply nested attributes on update.
* DEFERRED: Model::Search should not define destroyed? if class already implements it.
Something must be weird in your environment... When I run:
for i in {1..15}; do ruby -I lib:test test/integration/percolator_test.rb; done
I get no "random errors"...
The [REQUEST FAILED] ... "query": "[x" is a legitimate case -- a test for an incorrect Lucene query. The same goes for ModelWithIncorrectMapping.
I added "sleep(1)" after
context "when percolating a document" do
setup do
@index.register_percolator_query('alert') { string 'warning' }
@index.register_percolator_query('gantz') { string '"y u no match"' }
@index.register_percolator_query('weather', :tags => ['weather']) { string 'severe' }
Tire.index('_percolator').refresh
sleep(1)
end
that fixed my tests!
PASS (0:00:04.309) test: Percolator when percolating a document should return an array of matching query names for specific percolated queries.
PASS (0:00:09.585) test: Percolator when percolating a document should return an array of matching query names.
PASS (0:00:14.885) test: Percolator when percolating a document should return an empty array when no query matches
That is exactly the same situation I have with my Rails app (note: on several different servers).
I tried with Ruby 1.9.3p0 and 1.9.3p194, and elasticsearch versions 0.19.3 and 0.19.9 (always using RVM).
Elasticsearch setup: not sure, probably the default configuration.
Maybe you have an idea what this could be related to?
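As an aside, a fixed sleep(1) can be replaced with a bounded poll that returns as soon as a condition holds instead of always paying the full second. A minimal sketch — the helper name, timeout, and interval are assumptions:

```ruby
# Hypothetical helper: repeatedly evaluate a block until it returns true
# or the deadline passes; returns true on success, false on timeout.
def wait_until(timeout: 2.0, interval: 0.05)
  deadline = Time.now + timeout
  loop do
    return true if yield
    return false if Time.now > deadline
    sleep interval
  end
end

# In a setup like the percolator one above, instead of sleep(1) one could
# poll until the registered queries become visible, e.g. (hypothetical):
#   wait_until { Tire.search('_percolator') { query { all } }.results.any? }
```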
If sleep(1) solves it, either the refresh call is incorrect (done on a different index, etc.), or there's something funky with the whole setup. Really, refresh is the only thing you need to do in tests... Again, I've run the percolator integration tests 15 times without a single error.
In your app you could have some concurrency stuff etc., but the Tire tests should be stable.
I'm afraid that without knowledge of the whole system I can't be of much help. Try building an isolated environment without access to the outside world, e.g. with Vagrant, and running the Tire test suite inside it.
So, I installed it into an empty virtual machine: same failures, again fixed by sleep(1).
(Empty Ubuntu 12.04 LTS 64-bit, Desktop edition, in VirtualBox; the host OS is Ubuntu 10.04 64-bit.)
Below is my install script + the new failure log.
# under root
apt-get install -y build-essential git libyaml-dev libssl-dev libreadline5-dev zlib1g-dev libcurl4-openssl-dev libsqlite3-dev redis-server
cd /tmp
wget http://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.9.3-p327.tar.gz && tar xvzf ruby-1.9.3-p327.tar.gz && cd ruby-1.9.3-p327
./configure && make -j 4 && make install
# make test is ok too
gem update --system
gem update
gem install bundle
aptitude install openjdk-6-jre
aptitude install elasticsearch
# under user
cd ~
mkdir tiretest && cd tiretest
git clone https://github.com/karmi/tire.git
# revision e76fcf214f1fed648155593920794a79daa3dce5
cd tire
bundle install
bundle exec rake test:integration
Tire::IndexStoreIntegrationTest
PASS (1:00:56.344) test: Removing documents from the index should remove document from the index.
PASS (1:00:57.390) test: Retrieving documents from the index should retrieve document from the index.
FAIL (1:00:58.490) test: Retrieving documents from the index should return nil when retrieving missing document.
Expected <Item id: nil, _type: nil, _index: nil, _version: nil> to be nil.
@ test/integration/index_store_test.rb:105:in `block (2 levels) in <class:IndexStoreIntegrationTest>'
/usr/local/lib/ruby/gems/1.9.1/gems/shoulda-context-1.0.1/lib/shoulda/context/context.rb:398:in `call'
/usr/local/lib/ruby/gems/1.9.1/gems/shoulda-context-1.0.1/lib/shoulda/context/context.rb:398:in `block in create_test_from_should_hash'
/usr/local/lib/ruby/gems/1.9.1/gems/mocha-0.10.5/lib/mocha/integration/mini_test/version_230_to_262.rb:28:in `run'
The unit tests pass without failures.
Note: I've used the default elasticsearch and redis configs. The elasticsearch version is:
{
"ok" : true,
"status" : 200,
"name" : "Beaubier, Jeanne-Marie",
"version" : {
"number" : "0.19.12",
"snapshot_build" : false
},
"tagline" : "You Know, for Search"
}
Java version:
OpenJDK Runtime Environment (IcedTea6 1.11.5) (6b24-1.11.5-0ubuntu1~12.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
Any ideas where to move next? Please advise.
I found a similar ElasticSearch issue (however, I can't reproduce it):
https://github.com/elasticsearch/elasticsearch/issues/698 ( https://gist.github.com/b5775ddf2160cf9c8dd7 )
(and maybe a few others; not 100% sure they're related) https://github.com/elasticsearch/elasticsearch/issues/233 https://github.com/elasticsearch/elasticsearch/issues/2316 https://github.com/elasticsearch/elasticsearch/issues/2267 https://github.com/elasticsearch/elasticsearch/issues/699
How many shards and replicas is the index created with when the tests fail?
(In my case, I have ES set up with number_of_shards: 1, number_of_replicas: 0 by default -- but that may not be your case.)
Not sure how to get the number of shards for that index. Do you mean I should add a request to the Tire integration test code to get that number?
I have default config:
grep number_of_shards elasticsearch.yml
# index.number_of_shards: 5
# index.number_of_shards: 1
# The "number_of_shards" is a one-time setting for an index.
grep number_of_replicas elasticsearch.yml
# index.number_of_replicas: 1
# index.number_of_replicas: 0
# The "number_of_replicas" can be increased or decreased anytime,
(i.e. commented out)
I think that means 5 shards, 1 replica:
# Set the number of shards (splits) of an index (5 by default):
# Set the number of replicas (additional copies) of an index (1 by default):
The health request returns:
curl http://localhost:9200/_cluster/health
{"cluster_name":"elasticsearch","status":"yellow","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":6,"active_shards":6,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":5}
I added this to the Tire test code:
Tire.index('_percolator').refresh
puts `curl http://localhost:9200/_cluster/health`
(instead of the sleep)
So I get:
{"cluster_name":"elasticsearch","status":"red","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":36,"active_shards":36,"relocating_shards":0,"initializing_shards":4,"unassigned_shards":41}
ERROR (0:00:02.671) test: Percolator when percolating a document should return an array of matching query names.
undefined method `sort' for nil:NilClass
@ test/integration/percolator_test.rb:69:in `block (3 levels) in <class:PercolatorIntegrationTest>'
/usr/local/lib/ruby/gems/1.9.1/gems/shoulda-context-1.0.1/lib/shoulda/context/context.rb:398:in `call'
/usr/local/lib/ruby/gems/1.9.1/gems/shoulda-context-1.0.1/lib/shoulda/context/context.rb:398:in `block in create_test_from_should_hash'
/usr/local/lib/ruby/gems/1.9.1/gems/mocha-0.10.5/lib/mocha/integration/mini_test/version_230_to_262.rb:28:in `run'
{"cluster_name":"elasticsearch","status":"red","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":36,"active_shards":36,"relocating_shards":0,"initializing_shards":4,"unassigned_shards":41}
FAIL (0:00:01.544) test: Percolator when percolating a document should return an array of matching query names for specific percolated queries.
<["weather"]> expected but was
<nil>.
@ test/integration/percolator_test.rb:74:in `block (3 levels) in <class:PercolatorIntegrationTest>'
/usr/local/lib/ruby/gems/1.9.1/gems/shoulda-context-1.0.1/lib/shoulda/context/context.rb:398:in `call'
/usr/local/lib/ruby/gems/1.9.1/gems/shoulda-context-1.0.1/lib/shoulda/context/context.rb:398:in `block in create_test_from_should_hash'
/usr/local/lib/ruby/gems/1.9.1/gems/mocha-0.10.5/lib/mocha/integration/mini_test/version_230_to_262.rb:28:in `run'
i.e. status is "red"
Yes, by default your index is created with 5 shards and 1 replica. A very good default, but poor for test purposes. Uncomment the lines under the Note, that for development on a local machine ... comment in elasticsearch.yml and see if it makes a difference.
{"cluster_name":"elasticsearch","status":"red","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":36,"active_shards":36,"relocating_shards":0,"initializing_shards":4,"unassigned_shards":41}
Yeah, lots of unassigned shards (replicas), and some shards still initializing...
Hm, I uncommented
index.number_of_shards: 1
index.number_of_replicas: 0
and stopped and started the service. ps aux:
115 3904 3.3 2.2 1948036 90824 ? Sl 13:09 0:03 /usr/lib/jvm/java-6-openjdk-amd64//bin/java -Xms256m -Xmx1g -Xss256k -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch -Des.pidfile=/var/run/elasticsearch.pid -Des.path.home=/usr/share/elasticsearch -cp :/usr/share/elasticsearch/lib/elasticsearch-0.19.12.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/* -Des.default.config=/etc/elasticsearch/elasticsearch.yml -Des.default.path.home=/usr/share/elasticsearch -Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/var/lib/elasticsearch -Des.default.path.work=/tmp/elasticsearch -Des.default.path.conf=/etc/elasticsearch org.elasticsearch.bootstrap.ElasticSearch
(i.e. ran with /etc/elasticsearch/elasticsearch.yml)
grep number_of /etc/elasticsearch/elasticsearch.yml
# index.number_of_shards: 5
# index.number_of_replicas: 1
index.number_of_shards: 1
index.number_of_replicas: 0
# The "number_of_shards" is a one-time setting for an index.
# The "number_of_replicas" can be increased or decreased anytime,
However, I still see more than one shard:
curl http://localhost:9200/_cluster/health
{"cluster_name":"elasticsearch","status":"yellow","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":6,"active_shards":6,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":5}
The first test started working; the other two still fail:
{"cluster_name":"elasticsearch","status":"red","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":12,"active_shards":12,"relocating_shards":0,"initializing_shards":1,"unassigned_shards":5}
ERROR (0:00:02.824) test: Percolator when percolating a document should return an array of matching query names.
undefined method `sort' for nil:NilClass
@ test/integration/percolator_test.rb:70:in `block (3 levels) in <class:PercolatorIntegrationTest>'
/usr/local/lib/ruby/gems/1.9.1/gems/shoulda-context-1.0.1/lib/shoulda/context/context.rb:398:in `call'
/usr/local/lib/ruby/gems/1.9.1/gems/shoulda-context-1.0.1/lib/shoulda/context/context.rb:398:in `block in create_test_from_should_hash'
/usr/local/lib/ruby/gems/1.9.1/gems/mocha-0.10.5/lib/mocha/integration/mini_test/version_230_to_262.rb:28:in `run'
{"cluster_name":"elasticsearch","status":"red","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":12,"active_shards":12,"relocating_shards":0,"initializing_shards":1,"unassigned_shards":5}
You still have unassigned and relocating shards -- drop all indices after the restart (curl -X DELETE localhost:9200).
Dropped the indexes and restarted.
$ curl -X DELETE localhost:9200
{"ok":true,"acknowledged":true}
Health ok:
$ curl http://localhost:9200/_cluster/health
{"cluster_name":"elasticsearch","status":"green","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":0,"active_shards":0,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0}
but the tests are not OK (and active_shards is still big):
{"cluster_name":"elasticsearch","status":"green","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":8,"active_shards":8,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0}
PASS (0:00:01.829) test: Percolator when percolating a document should return an array of matching query names for specific percolated queries.
{"cluster_name":"elasticsearch","status":"red","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":7,"active_shards":7,"relocating_shards":0,"initializing_shards":1,"unassigned_shards":0}
ERROR (0:00:02.904) test: Percolator when percolating a document should return an array of matching query names.
undefined method `sort' for nil:NilClass
@ test/integration/percolator_test.rb:70:in `block (3 levels) in <class:PercolatorIntegrationTest>'
/usr/local/lib/ruby/gems/1.9.1/gems/shoulda-context-1.0.1/lib/shoulda/context/context.rb:398:in `call'
/usr/local/lib/ruby/gems/1.9.1/gems/shoulda-context-1.0.1/lib/shoulda/context/context.rb:398:in `block in create_test_from_should_hash'
/usr/local/lib/ruby/gems/1.9.1/gems/mocha-0.10.5/lib/mocha/integration/mini_test/version_230_to_262.rb:28:in `run'
{"cluster_name":"elasticsearch","status":"red","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":7,"active_shards":7,"relocating_shards":0,"initializing_shards":1,"unassigned_shards":0}
FAIL (0:00:04.018) test: Percolator when percolating a document should return an empty array when no query matches.
<[]> expected but was
<nil>.
@ test/integration/percolator_test.rb:65:in `block (3 levels) in <class:PercolatorIntegrationTest>'
/usr/local/lib/ruby/gems/1.9.1/gems/shoulda-context-1.0.1/lib/shoulda/context/context.rb:398:in `call'
/usr/local/lib/ruby/gems/1.9.1/gems/shoulda-context-1.0.1/lib/shoulda/context/context.rb:398:in `block in create_test_from_should_hash'
/usr/local/lib/ruby/gems/1.9.1/gems/mocha-0.10.5/lib/mocha/integration/mini_test/version_230_to_262.rb:28:in `run'
Then you have to wait until the cluster is ready in your test setup, as you initially mentioned:
Tire::Configuration.client.get "#{Tire::Configuration.url}/_cluster/health?wait_for_status=yellow"
It helps a lot, but after I had already opened this ticket we found that it actually doesn't help in 100% of cases. Now I have reproduced it. When I run the test suite I get far fewer errors, but here is one:
{"cluster_name":"elasticsearch","status":"green","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":8,"active_shards":8,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0}
FAIL (0:00:37.403) test: Bulk should extract the routing value from documents.
<1> expected but was
<2>.
@ test/integration/bulk_test.rb:34:in `block (2 levels) in <class:BulkIntegrationTest>'
/usr/local/lib/ruby/gems/1.9.1/gems/shoulda-context-1.0.1/lib/shoulda/context/context.rb:398:in `call'
/usr/local/lib/ruby/gems/1.9.1/gems/shoulda-context-1.0.1/lib/shoulda/context/context.rb:398:in `block in create_test_from_should_hash'
/usr/local/lib/ruby/gems/1.9.1/gems/mocha-0.10.5/lib/mocha/integration/mini_test/version_230_to_262.rb:28:in `run'
PASS (0:00:37.983) test: Bulk should store a collection of documents and refresh the index.
PASS (0:00:41.498) test: Bulk should timeout when consistency factor is not met.
(The status is green.) bulk_test is modified like this:
should "extract the routing value from documents" do
  @index.bulk_store [ { id: '1', title: 'A', _routing: 'a'}, { id: '2', title: 'B', _routing: 'b'} ]
  @index.refresh
  puts `curl "http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=1s"`
Runs fine for me... I really don't know what's up with your environment/setup...
OK, I understand that; several developers have tried to fix it, but with no results. I'll try to investigate more, but a few questions:
1) Do you see any evidence that something is wrong with my elasticsearch setup? For example, that the number of shards in the health request does not match the config, etc.?
2) Are you a maintainer of Elasticsearch? This might be related to elasticsearch, not Tire. Should I create a ticket on the elasticsearch issues page?
3) Could you please give some information about your environment (which Linux distro? is it the UTC timezone? the full elasticsearch config for tests, amd64 vs x86, Java version, SSD vs HDD)?
I now control at least 3 machines where this problem appears (AFAIK there are even more). On two of them I ran the Tire integration tests and they fail. One of them is an empty, new virtual machine. Let me describe what is different and what is common in those machines' setups.
Different:
- OS: Ubuntu 12.04, Ubuntu 10.04
- Java: Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode), OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
- Elasticsearch: 0.19.3, 0.19.9, 0.19.12
- Redis: 2.4.14, 2.2.12
- Ruby: 1.9.3p0, 1.9.3p194, 1.9.3p327
- Linux kernel: 3.2.0-24-generic, 3.2.0-32-generic, 2.6.38-16-generic
- CPU: i7-3930K, i7-2600
- Memory: 4, 16 or 64 GB
- Locale LANG: EN, RU
- Timezone: MSK, CET
- RVM: both with and without RVM
Common:
- Elasticsearch config: default
- CPU: Sandy Bridge
- Architecture: x64
- Redis config: default
- Linux distro: Ubuntu
- Install method: Debian package
- Timezone: not UTC
- Disk: HDD, not SSD
- Ruby: 1.9.3.x
Actually, the elasticsearch config is now empty (all lines are comments), except these two:
index.number_of_shards: 1
index.number_of_replicas: 0
I am going to install elasticsearch without the .deb package; maybe that helps (maybe there are some default values compiled in for the Debian package).
Also, let me describe the nature of the failure frequencies:
1) If some test starts failing, it will keep failing in 99% or 100% of cases until we "fix" it.
2) That same test does not necessarily fail on another machine.
3) Tests start failing when we change our app/test source code.
4) Most interesting: to fix a test we change Rails code in the test suite which is not related to the failing part (for example, we add more logging code, or we change code in a different test, or we move the test to another place in the file, or we swap some lines).
It seems the issue
FAIL (0:00:03.555) test: Bulk should extract the routing value from documents.
<1> expected but was
<2>.
is not related to the bug I am talking about. I think it's a different bug. It only happens when the config consists of these two lines:
index.number_of_shards: 1
index.number_of_replicas: 0
Steps to reproduce:
1) Empty the config file, putting only the two lines above into it.
2) Run curl -X DELETE localhost:9200
3) Restart ElasticSearch.
4) Run:
for i in {1..15}; do ruby -I lib:test test/integration/bulk_test.rb; done
This is for revision e76fcf214f1fed648155593920794a79daa3dce5 of Tire with ElasticSearch 0.19.12, tested on two different machines. (Note: sleep(1) and waiting for status do not help here.)
Can you confirm this?
Hi,
I've spent a great deal of time on this yesterday evening. My conclusions:
The failing test is the should extract the routing value from documents test. In an Ubuntu 12.04 virtual machine, this test fails all the time. Only the routing test fails, not the others. Sometimes there are other failures, but the test suite is quite heavy, and a 1 GB VM can only withstand a certain level of load. In my last runs, the only failing test was the routing one.
I have found a bug in the test: the test is run against an index with only one shard, thus the "filtering" based on the routing value can't work. Why it works on a Mac, I have no idea.
I'm opening a new issue just for this, so we can move forward.
@vsespb See issue karmi/tire#540
At least we have a repeatable technique to run tests in an isolated environment via Vagrant. If you want to try it out:
Download or clone this gist:
git clone git://gist.github.com/4239703.git elasticsearch_vagrant && cd elasticsearch_vagrant
Run bundle install, etc. Then, configure ES:
echo '
{
"elasticsearch" : {
"cluster_name" : "elasticsearch_test_in_vagrant",
"index_shards" : 1,
"index_replicas" : 0
}
}' > node.json
Install Ruby 1.9.3, Tire, etc:
echo '
# Install the Tire Rubygem
apt-get install make git libcurl3 libcurl3-gnutls libcurl4-openssl-dev libsqlite3-dev -y
apt-get install redis-server -y
apt-get install ruby1.9.3 -y
update-alternatives --set ruby /usr/bin/ruby1.9.1
gem install bundler --no-rdoc --no-ri
test -d "tire" || git clone git://github.com/karmi/tire.git
chown -R vagrant:staff tire
cd tire
bundle install
' > install.sh
Build and provision the machine:
NODE_CONFIG=node.json INSTALL=install.sh time bundle exec vagrant up
Run the Tire tests:
echo '
cd tire
echo "Running Tire tests in '`pwd`'"
echo "Ruby: `ruby -v`"
echo "Java: `java -version 2>&1 | head -1`"
for i in {1..5}
do
echo "-------------------------------------------------------------------------------"
echo "TEST $i"
echo "-------------------------------------------------------------------------------"
time bundle exec rake test
if [ $? == 0 ]; then echo "$i: OK" >> results.txt; else echo "$i: FAIL">> results.txt; fi
done
echo; echo; echo "==============================================================================="
cat results.txt
' > run_tire_tests.sh
cat run_tire_tests.sh | xargs -0 bundle exec vagrant ssh -c
Thank you for your hard work!
So, my recent testing shows there are 3 ways to fix the randomly failing tests:
1) Add a wait-for-green/yellow-status call after the index refresh call.
2) Set
index.number_of_shards: 1
index.number_of_replicas: 0
This helps; now that bulk_test is fixed, I don't see failing tests anymore.
3) Optimize machine performance.
All our machines are powerful: Sandy Bridge i7 (4 or 6 cores) with 16 GB or 64 GB RAM, and a 4 GB VM (the host has 16 GB). But moving the elasticsearch data directory to a ramdrive (/run/shm etc.) actually fixed the tests on two machines. However, if I decrease the number of CPUs for the VM from 8 to 1, the tests start failing again.
It seems (1) works almost always, but I saw one case where it didn't help. (3) makes things much better, but does not always help; it's very sensitive to overall load. And (2) seems to work always.
This testing only relates to the Tire test suite; I am not sure about our application. Only waiting for failures during a work week will show whether we fixed it (strangely, new test failures only appear when we change something in the source code; testing the same code 100500 times is unlikely to produce new failures).
I already use method (1) for our tests. I am going to apply both (2) and (3) as well, and I will report here if we see that it helped.
Thanks for Vagrant manual, I will use it next time.
p.s. "should extract the routing value from documents" test was failing only with config
index.number_of_shards: 1
index.number_of_replicas: 0
it was fine on default (empty) config. that can explain why it works on your Mac.
It was fine on the default (empty) config; that can explain why it works on your Mac.
Not really :) I have the exact same setting as a default on my machine.
Add wait for green/yellow status call after index refresh call.
That really should be more than enough for any test to pass. If the application tests fail, there must be some issues with concurrency, parallelism, threads, etc. I understand that a brittle test suite is a really bad thing -- that's why I wanted to get to the bottom of this issue.
FYI
I had one test which actually generated failures in ElasticSearch (mentioned in ticket #543). I found that this test was outdated and no longer needed, so I removed it.
After that, it seems we don't have problems with elasticsearch race conditions.
I currently only run a wait for yellow/green status after each reindex. (I did not change the shards in the config, and did not optimize performance or move elasticsearch to a RAM drive on the CI server.) I didn't try removing the wait for yellow/green, however.
So I have a feeling that shard failures can cause race conditions in other requests to the same elasticsearch server.
Yes, there is quite certainly potential for some race conditions / timing issues. Note the karmi/tire#541, which was failing for me consistently.
Yep, I'm watching 541.
Hello. Thanks for fixing the PercolatorIntegrationTest. It's sad, though, that the other test race conditions were not reproduced/investigated.
We've also had these same issues: we used sleep, weren't aware of #refresh (I will look for documentation on this), and will try setting the shards and the other suggested options. Thanks for this thread.
Do I understand correctly that it's impossible to use Tire in unit tests as-is? That before each search request we need to wait for yellow/green cluster status?
http://grokbase.com/t/gg/elasticsearch/12ag6njpyz/no-active-shards-in-tests-using-the-java-api
http://www.elasticsearch.org/guide/reference/api/admin-cluster-health.html
https://github.com/elasticsearch/elasticsearch/blob/a0d1b9b695beef84179e5b7391eaae3abffc3702/src/main/java/org/elasticsearch/rest/action/admin/cluster/health/RestClusterHealthAction.java#L56
https://groups.google.com/forum/?fromgroups=#!topic/elasticsearch/wPKUJXiG2mw
Is this a known issue?