mikemccand / stargazers-migration-test

Testing Lucene's Jira -> GitHub issues migration

Test2BPostingsBytes org.apache.lucene.index.CorruptIndexException: docs out of order (490879719 <= 490879719 ) [LUCENE-8925] #922

Open mikemccand opened 5 years ago

mikemccand commented 5 years ago

Seen on branch_8x at commit 081e2ef2c05e017e87a2aef2a4f55067fbba5cb4 while running:

ant -Dtests.filter="(@monster or @slow) and not(@awaitsfix)" -Dtests.heapsize=4G -Dtests.jvms=64 test

  2> NOTE: reproduce with: ant test  -Dtestcase=Test2BPostingsBytes -Dtests.method=test -Dtests.seed=1C14F78FC0AF1835 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=fr -Dtests.timezone=SystemV/AST4ADT -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[23:54:00.627] ERROR    111s J52 | Test2BPostingsBytes.test <<<
   > Throwable #1: org.apache.lucene.index.CorruptIndexException: docs out of order (490879719 <= 490879719 ) (resource=MockIndexOutputWrapper(FSIndexOutput(path="/home/danielgb/lucene-solr/lucene/build/core/test/J52/temp/lucene.index.Test2BPostingsBytes_1C14F78FC0AF1835-001/2BPostingsBytes3-001/_0_Lucene50_0.doc")))
   >    at __randomizedtesting.SeedInfo.seed([1C14F78FC0AF1835:9440C8556E5375CD]:0)
   >    at org.apache.lucene.codecs.lucene50.Lucene50PostingsWriter.startDoc(Lucene50PostingsWriter.java:236)
   >    at org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPostingsWriterBase.java:148)
   >    at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:865)
   >    at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:344)
   >    at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
   >    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:169)
   >    at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:245)
   >    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:140)
   >    at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:2988)
   >    at org.apache.lucene.util.TestUtil.addIndexesSlowly(TestUtil.java:990)
   >    at org.apache.lucene.index.Test2BPostingsBytes.test(Test2BPostingsBytes.java:127)
   >    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   >    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:90)
   >    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
   >    at java.lang.reflect.Method.invoke(Method.java:508)
   >    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
   >    at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
   >    at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
   >    at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
   >    at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
   >    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
   >    at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
   >    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
   >    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
   >    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
   >    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
   >    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
   >    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
   >    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
   >    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
   >    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
   >    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
   >    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >    at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
   >    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   >    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   >    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >    at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
   >    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
   >    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
   >    at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
   >    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
   >    at java.lang.Thread.run(Thread.java:818)
  2> NOTE: leaving temporary files on disk at: /home/danielgb/lucene-solr/lucene/build/core/test/J52/temp/lucene.index.Test2BPostingsBytes_1C14F78FC0AF1835-001
  2> NOTE: test params are: codec=Lucene80, sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@c792b533), locale=fr, timezone=SystemV/AST4ADT
  2> NOTE: Linux 3.10.0-957.21.3.el7.ppc64le ppc64le/IBM Corporation 1.8.0_211 (64-bit)/cpus=64,threads=1,free=88221008,total=422117376
  2> NOTE: All tests run in this JVM: [TestTopDocsCollector, Test2BPostingsBytes]
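The exception message has the shape of an ordering check in the postings writer: within one term's postings list, each doc ID must be strictly greater than the previous one, and here the same doc ID (490879719) arrived twice while addIndexes was merging terms. Below is a minimal, hypothetical sketch of that invariant in Java. The class DocOrderCheck and its members are illustrative names, not Lucene's API, and the sketch assumes a simplified writer that only tracks the previous doc ID and the deltas it would encode.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of the per-term ordering check a postings
 * writer performs. Not Lucene's Lucene50PostingsWriter; names are illustrative.
 */
public class DocOrderCheck {

  /** Thrown when a doc ID is not strictly greater than the previous one. */
  static class DocsOutOfOrderException extends RuntimeException {
    DocsOutOfOrderException(String msg) {
      super(msg);
    }
  }

  private int lastDocID = 0; // previous doc ID for the current term
  private int docCount = 0;  // docs seen so far for the current term
  private final List<Integer> deltas = new ArrayList<>();

  /** Accepts the next doc for the term and records its delta from the previous doc. */
  void startDoc(int docID) {
    int docDelta = docID - lastDocID;
    // After the first doc, a zero or negative delta means a repeated or
    // out-of-order doc ID; the message mirrors the one in this report.
    if (docID < 0 || (docCount > 0 && docDelta <= 0)) {
      throw new DocsOutOfOrderException(
          "docs out of order (" + docID + " <= " + lastDocID + " )");
    }
    deltas.add(docDelta);
    lastDocID = docID;
    docCount++;
  }

  List<Integer> deltas() {
    return deltas;
  }

  public static void main(String[] args) {
    DocOrderCheck ok = new DocOrderCheck();
    for (int doc : new int[] {3, 7, 490879719}) {
      ok.startDoc(doc);
    }
    System.out.println("deltas: " + ok.deltas());

    DocOrderCheck bad = new DocOrderCheck();
    try {
      bad.startDoc(490879719);
      bad.startDoc(490879719); // repeated doc ID, as in the failing merge
    } catch (DocsOutOfOrderException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Running main prints "deltas: [3, 4, 490879712]" for the well-ordered sequence, then "docs out of order (490879719 <= 490879719 )". In this sketch the writer stores deltas between consecutive doc IDs, which is why it insists each delta be positive. It only shows where the exception is raised, not why the merge produced a duplicate doc ID; the discussion below suspects the IBM J9 JVM rather than Lucene.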

Legacy Jira details

LUCENE-8925 by Daniel Black on Jul 18 2019, updated Jul 19 2019

Environment:

RHEL-7.3 (ppc64le - Power9)

kernel 3.10.0-957.21.3.el7.ppc64le

48 GB VM, 64 cores

java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 8.0.5.37 - pxl6480sr5fp37-20190618_01(SR5 FP37))
IBM J9 VM (build 2.9, JRE 1.8.0 Linux ppc64le-64-Bit Compressed References 20190617_419755 (JIT enabled, AOT enabled)
OpenJ9 - 354b31d
OMR - 0437c69
IBM - 4972efe)
JCL - 20190606_01 based on Oracle jdk8u211-b25

mikemccand commented 5 years ago

Test2BPostingsBytes is a Lucene test, not a Solr test, which means this report is out of place in the SOLR project on Jira. The test mentioned in SOLR-13639 is also a Lucene test, not a Solr test. I will move both issues to the LUCENE project.

Java from IBM is known to have bugs when running Lucene. IBM enables several optimizations by default, some of which are not compatible with Lucene code. Using OpenJDK or a JDK from Oracle will likely produce better results.

[Legacy Jira: Shawn Heisey (@elyograg) on Jul 18 2019]

mikemccand commented 5 years ago

I tested the repro line (after adding -Dtests.monster=true) at the tip of branch_8x (commit eb75a60857deb96c55a2d79cdb4cdabf4a0fda1b) with openjdk version "1.8.0_171":

$ ant test  -Dtestcase=Test2BPostingsBytes -Dtests.method=test -Dtests.seed=1C14F78FC0AF1835 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=fr -Dtests.timezone=SystemV/AST4ADT -Dtests.asserts=true -Dtests.file.encoding=UTF-8 -Dtests.monster=true
[...]
   [junit4] Suite: org.apache.lucene.index.Test2BPostingsBytes
   [junit4] HEARTBEAT J0 PID(16237@localhost): 2019-07-18T11:29:13, stalled for 71.2s at: Test2BPostingsBytes.test
   [junit4] HEARTBEAT J0 PID(16237@localhost): 2019-07-18T11:30:13, stalled for  131s at: Test2BPostingsBytes.test
   [junit4] HEARTBEAT J0 PID(16237@localhost): 2019-07-18T11:31:13, stalled for  191s at: Test2BPostingsBytes.test
   [junit4] OK       222s | Test2BPostingsBytes.test
   [junit4] Completed [1/1] in 222.41s, 1 test
   [junit4] 
   [junit4] JVM J0:     0.77 ..   223.76 =   222.99s
   [junit4] Execution time total: 3 minutes 43 seconds
   [junit4] Tests summary: 1 suite, 1 test
[...]
BUILD SUCCESSFUL
Total time: 3 minutes 46 seconds

I also tested at the same commit as the OP (081e2ef2c05e017e87a2aef2a4f55067fbba5cb4), with the same result: BUILD SUCCESSFUL. (My first attempt failed with an OOM; the second succeeded after I added -Dtests.heapsize=30g to the command line. That may be overkill here, but it is what I use on my Jenkins jobs that run the monster tests once a week.)

[Legacy Jira: Steven Rowe on Jul 18 2019]

mikemccand commented 5 years ago

Thanks for the tips and the retest. This test passed twice with the RHEL OpenJDK 1.8.0_212-b04. I don't think I saw this error on every run, but I will continue to test more fully on OpenJDK.

Cheers

[Legacy Jira: Daniel Black on Jul 19 2019]