opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[BUG] org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot is flaky #9115

Open ashking94 opened 1 year ago

ashking94 commented 1 year ago

Describe the bug The org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot test is flaky on the main branch. I ran the test in a loop and it failed as early as the 15th iteration.

To Reproduce The same seed does not always reproduce the failure. To reproduce it, run the test in a loop until it fails.

Expected behavior The test should pass.

Additional context Jenkins build failure link - https://build.ci.opensearch.org/job/gradle-check/21871/

ashking94 commented 1 year ago

@kasundra07 @harishbhakuni21 fyi

sachinpkale commented 1 year ago

Not able to reproduce the failure locally, even after 1000 attempts. Closing.

sohami commented 1 year ago

Reopening this, as this test is failing again:

Ref CI: https://build.ci.opensearch.org/job/gradle-check/25984/

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot" -Dtests.seed=5A77171FC14EEBF7 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=vi -Dtests.timezone=PRC -Druntime.java=20
java.lang.AssertionError: 
Expected: is <7>
     but: was <4>
    at __randomizedtesting.SeedInfo.seed([5A77171FC14EEBF7:5F34FA1617FAB2A9]:0)
    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
    at org.junit.Assert.assertThat(Assert.java:964)
    at org.junit.Assert.assertThat(Assert.java:930)
    at org.opensearch.snapshots.AbstractSnapshotIntegTestCase.createFullSnapshot(AbstractSnapshotIntegTestCase.java:489)
    at org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot(DeleteSnapshotIT.java:85)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
    at java.base/java.lang.reflect.Method.invoke(Method.java:578)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
    at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
    at java.base/java.lang.Thread.run(Thread.java:1623)
sohami commented 1 year ago

@harishbhakuni Can you take a look at this?

andrross commented 1 year ago

I can get this to fail every time with the following seed:

./gradlew ':server:internalClusterTest' --tests "org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot" -Dtests.seed=4CD3155D4F1C1A9F
java.lang.AssertionError: java.lang.IllegalArgumentException: Provided Lock Name metadata__9223372036854775806__9223372036854775803__9223372036854775790__9223372036854775800___Hf3Dbw2QQagfGLlVBOUrg__9223370340398865071__1___ZxZ4Wh89SXyEPmSYAHrIrQ.lock is not Valid.
    at __randomizedtesting.SeedInfo.seed([4CD3155D4F1C1A9F]:0)
    at org.opensearch.repositories.blobstore.BlobStoreRepository.lambda$executeOneStaleIndexDelete$37(BlobStoreRepository.java:1627)
    at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
    at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908)
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.IllegalArgumentException: Provided Lock Name metadata__9223372036854775806__9223372036854775803__9223372036854775790__9223372036854775800___Hf3Dbw2QQagfGLlVBOUrg__9223370340398865071__1___ZxZ4Wh89SXyEPmSYAHrIrQ.lock is not Valid.
    at org.opensearch.index.store.lockmanager.FileLockInfo$LockFileUtils.getAcquirerIdFromLock(FileLockInfo.java:103)
    at org.opensearch.index.store.lockmanager.FileLockInfo.lambda$getLockForAcquirer$0(FileLockInfo.java:59)
    at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:176)
    at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
    at org.opensearch.index.store.lockmanager.FileLockInfo.getLockForAcquirer(FileLockInfo.java:60)
    at org.opensearch.index.store.lockmanager.RemoteStoreMetadataLockManager.release(RemoteStoreMetadataLockManager.java:65)
    at org.opensearch.repositories.blobstore.BlobStoreRepository.lambda$executeOneStaleIndexDelete$37(BlobStoreRepository.java:1590)
    ... 7 more
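The exception only says that the lock name "is not Valid". The following is a minimal Java sketch of one plausible cause, assuming (this is not confirmed in the thread) that a lock file is named `<metadata file name>___<acquirer id>.lock` and is parsed by splitting on a three-underscore separator that must yield exactly two parts. `LockNameSplitDemo`, `tokenCount`, and `WELL_FORMED_LOCK_NAME` are hypothetical and are not OpenSearch's `FileLockInfo.LockFileUtils`. Under that assumption, the reported name splits into three parts because the metadata file's 22-character UUID `_Hf3Dbw2QQagfGLlVBOUrg` begins with `_`, which merges with the preceding `__` field separator.

```java
class LockNameSplitDemo {
    // Assumed separator between the metadata file name and the acquirer id.
    static final String SEPARATOR = "___";

    // Lock name copied verbatim from the failure above.
    static final String REPORTED_LOCK_NAME =
        "metadata__9223372036854775806__9223372036854775803__9223372036854775790"
            + "__9223372036854775800___Hf3Dbw2QQagfGLlVBOUrg__9223370340398865071__1"
            + "___ZxZ4Wh89SXyEPmSYAHrIrQ.lock";

    // Hypothetical variant in which the metadata UUID does not begin with '_'.
    static final String WELL_FORMED_LOCK_NAME =
        REPORTED_LOCK_NAME.replace("800___Hf3D", "800__Hf3D");

    // Hypothetical helper: number of parts after splitting on the separator.
    // String.split takes a regex, but "___" contains no regex metacharacters.
    static int tokenCount(String lockName) {
        return lockName.split(SEPARATOR).length;
    }

    public static void main(String[] args) {
        // The "__" field separator plus the UUID's leading '_' forms an extra "___".
        System.out.println(tokenCount(REPORTED_LOCK_NAME));    // 3
        System.out.println(tokenCount(WELL_FORMED_LOCK_NAME)); // 2
    }
}
```

If the assumed two-part check holds, any metadata UUID that starts with `_` would make the lock name unparseable, regardless of the rest of the name.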
harishbhakuni commented 1 year ago

> metadata__9223372036854775806__9223372036854775803__9223372036854775790__9223372036854775800___Hf3Dbw2QQagfGLlVBOUrg__9223370340398865071__1___ZxZ4Wh89SXyEPmSYAHrIrQ.lock

This issue is fixed by this PR: https://github.com/opensearch-project/OpenSearch/issues/10217

> I can get this to fail every time with the following seed:
>
> ./gradlew ':server:internalClusterTest' --tests "org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot" -Dtests.seed=4CD3155D4F1C1A9F
> java.lang.AssertionError: java.lang.IllegalArgumentException: Provided Lock Name metadata__9223372036854775806__9223372036854775803__9223372036854775790__9223372036854775800___Hf3Dbw2QQagfGLlVBOUrg__9223370340398865071__1___ZxZ4Wh89SXyEPmSYAHrIrQ.lock is not Valid.

I hadn't seen this one before. It looks like a UUID generation issue; let me check it.
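If the ids embedded in these file names are 16 random bytes encoded as unpadded base64url (an assumption suggested by their 22-character length, not confirmed in this thread), then `_` is one of the 64 symbols and any single id starts with it about 1 time in 64. The minimal Java sketch below estimates that rate with `java.util.Base64`; `randomId` and `underscorePrefixRate` are hypothetical helpers, not OpenSearch's UUID generator. A seeded `Random` stands in for the test seed, which would be consistent with a failure that reproduces reliably for one seed yet only occasionally across random seeds.

```java
import java.util.Base64;
import java.util.Random;

class UnderscorePrefixEstimate {
    // Assumption: an id is 16 random bytes, base64url-encoded without padding (22 characters).
    static String randomId(Random random) {
        byte[] bytes = new byte[16];
        random.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Fraction of sampled ids whose first character is '_', the last symbol of the base64url alphabet.
    // The first character encodes the top 6 bits of the first byte, so it is uniform over 64 symbols.
    static double underscorePrefixRate(Random random, int samples) {
        int hits = 0;
        for (int i = 0; i < samples; i++) {
            if (randomId(random).charAt(0) == '_') {
                hits++;
            }
        }
        return (double) hits / samples;
    }

    public static void main(String[] args) {
        // A fixed seed makes the sampled ids reproducible, much as a test seed would.
        Random random = new Random(42);
        double rate = underscorePrefixRate(random, 1_000_000);
        System.out.printf("observed %.4f, expected about %.4f%n", rate, 1.0 / 64);
    }
}
```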

peternied commented 7 months ago

[Triage - attendees 1 2 3 4 5 6 7] It looks like this might still be an issue; reopening so that it gets investigated.

peternied commented 7 months ago

From the other issue: https://build.ci.opensearch.org/job/gradle-check/37349/testReport/

java.lang.AssertionError: 
Expected: is <9>
     but: was <8>
    at __randomizedtesting.SeedInfo.seed([30BCA240DC8694B0:35FF4F490A32CDEE]:0)
    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
    at org.junit.Assert.assertThat(Assert.java:964)
    at org.junit.Assert.assertThat(Assert.java:930)
    at org.opensearch.snapshots.AbstractSnapshotIntegTestCase.createFullSnapshot(AbstractSnapshotIntegTestCase.java:497)
    at org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot(DeleteSnapshotIT.java:92)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.snapshots.DeleteSnapshotIT.testDeleteShallowCopySnapshot" -Dtests.seed=30BCA240DC8694B0 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=fr-CA -Dtests.timezone=Antarctica/Vostok -Druntime.java=21