Open reta opened 2 years ago
https://github.com/opensearch-project/OpenSearch/pull/4632/files -- this has the fix and it was merged already. I was not able to reproduce the issue with 500 iterationss in local.
Post 2.x I don't see this issue being tagged as well. We can close the issue if this issue is not seen in the latest builds.
NOTE: The fix in #4632 was for the initial report in #4089, not for this issue. This bug still exists and is referenced in comments starting here
Copying a summary of comment over from #4089 where several repro logs and other comments were made. See that issue for background.
Have spent several hours digging into this issue.
For the create failure; it's obviously a race condition and close to a Heisenbug. While regularly reproducible at least once in a run of size 500, adding logging extended successful runs with a failure on run 2463.
Commonalities when it fails:
Tracing through the debug logs I added, I'm at the "how did this ever work" phase of debugging. The boolean allowPartialSearchResults
is never referred to in the create workflow called as part of this. In particular, the PIT creation failure occurs in this block of code:
https://github.com/opensearch-project/OpenSearch/blob/ea1cc9d054b2d7cf0046438ecc57777e29112c2b/server/src/main/java/org/opensearch/action/search/CreatePitController.java#L123-L153
The getter searchRequest.allowPartialSearchResults()
is never referred to here; call hierarchy indicates it's called on multi phase requests at the beginning of AbstractSearchAsyncAction.executeNextPhase()
but there is no reference to it in SinglePhaseSearchAction
which is called here. The SearchRequest
variable request
isn't even passed to the constructor, just the search task.
Looks like this code was implemented in #3921. @bharath-techie do you have any observations/comments that can further help debug this?
Digging into this, the original commit #3291 did not cause this failure; the unneeded test was added by #4030. I'll back out the line causing the failures.
java.lang.AssertionError: expected:<0> but was:<1>
at __randomizedtesting.SeedInfo.seed([561A7A0B4C9B2562:CD8D356D0C98A1D3]:0)
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.failNotEquals(Assert.java:835)
at org.junit.Assert.assertEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:633)
at org.opensearch.search.pit.PitMultiNodeIT.validatePitStats(PitMultiNodeIT.java:363)
at org.opensearch.search.pit.PitMultiNodeIT$1.onNodeStopped(PitMultiNodeIT.java:118)
at org.opensearch.test.InternalTestCluster$NodeAndClient.closeForRestart(InternalTestCluster.java:1072)
at org.opensearch.test.InternalTestCluster.restartNode(InternalTestCluster.java:2032)
at org.opensearch.test.InternalTestCluster.restartRandomDataNode(InternalTestCluster.java:1980)
at org.opensearch.search.pit.PitMultiNodeIT.testCreatePitWhileNodeDropWithAllowPartialCreationFalse(PitMultiNodeIT.java:112)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
at java.base/java.lang.reflect.Method.invoke(Method.java:580)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
at org.opensearch.test.OpenSearchTestClusterRule$1.evaluate(OpenSearchTestClusterRule.java:369)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at java.base/java.lang.Thread.run(Thread.java:1583)
The issue is back:
Describe the bug Another flaky test, introduced by https://github.com/opensearch-project/OpenSearch/pull/3921/files
@Bukhtawar could you please take a look?
To Reproduce
Expected behavior Test must always pass
Plugins Standard
Screenshots
Host/Environment (please complete the following information):
Additional context See please https://build.ci.opensearch.org/job/gradle-check/1902/testReport/junit/org.opensearch.search/PitMultiNodeTests/testCreatePitWhileNodeDropWithAllowPartialCreationFalse/