opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[BUG] org.opensearch.remotestore.multipart.RemoteStoreMultipartIT is flaky #11997

Open reta opened 10 months ago

reta commented 10 months ago

Describe the bug

The test case org.opensearch.remotestore.multipart.RemoteStoreMultipartIT.testNoSearchIdleForAnyReplicaCount is flaky:

org.opensearch.remotestore.multipart.RemoteStoreMultipartIT.testNoSearchIdleForAnyReplicaCount

com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=12912, name=Thread-7889, state=RUNNABLE, group=TGRP-RemoteStoreMultipartIT]
Caused by: org.apache.lucene.store.AlreadyClosedException: this Directory is closed
    at __randomizedtesting.SeedInfo.seed([7E1BDEE57B443ADB]:0)
    at app//org.apache.lucene.store.BaseDirectory.ensureOpen(BaseDirectory.java:50)
    at app//org.opensearch.index.store.FsDirectoryFactory$HybridDirectory.openInput(FsDirectoryFactory.java:175)
    at app//org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:101)
    at app//org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:101)
    at app//org.opensearch.index.store.RemoteDirectory.lambda$uploadBlob$1(RemoteDirectory.java:362)
    at app//org.opensearch.common.blobstore.transfer.RemoteTransferContainer.lambda$getMultipartStreamSupplier$1(RemoteTransferContainer.java:167)
    at app//org.opensearch.common.blobstore.transfer.RemoteTransferContainer.lambda$getTransferPartStreamSupplier$0(RemoteTransferContainer.java:145)
    at app//org.opensearch.common.StreamContext.provideStream(StreamContext.java:69)
    at app//org.opensearch.remotestore.multipart.mocks.MockFsAsyncBlobContainer.lambda$asyncBlobUpload$0(MockFsAsyncBlobContainer.java:60)
    at java.base@21.0.1/java.lang.Thread.run(Thread.java:1583)
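
One possible reading of this trace, not a confirmed root cause: the thread started in MockFsAsyncBlobContainer.asyncBlobUpload asks for a part stream (StreamContext.provideStream, which ends in openInput) after the underlying directory has already been closed, and nothing catches the exception on that thread. The sketch below reproduces only that ordering in plain Java. ToyDirectory, uploadAfterClose, and the file name are hypothetical stand-ins and are not OpenSearch or Lucene APIs; a latch forces the "close first, open second" order that the real test presumably hits only on some runs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class ClosedDirectoryRace {

    /** Hypothetical stand-in for a directory; NOT the Lucene Directory API. */
    static final class ToyDirectory implements AutoCloseable {
        private volatile boolean open = true;

        String openInput(String name) {
            // Mirrors the idea of an ensureOpen() check before handing out a stream.
            if (!open) {
                throw new IllegalStateException("this Directory is closed");
            }
            return "stream:" + name;
        }

        @Override
        public void close() {
            open = false;
        }
    }

    /** Returns the message of the exception seen by the upload thread, or null if none. */
    static String uploadAfterClose() {
        ToyDirectory directory = new ToyDirectory();
        CountDownLatch directoryClosed = new CountDownLatch(1);
        AtomicReference<String> failure = new AtomicReference<>();

        // Assumed shape of the mock: a bare thread that opens the stream lazily.
        Thread uploader = new Thread(() -> {
            try {
                directoryClosed.await();          // force the unlucky ordering
                directory.openInput("_0.cfs");    // hypothetical file name
            } catch (IllegalStateException e) {
                failure.set(e.getMessage());      // in the real test this escapes uncaught
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        uploader.start();

        directory.close();                        // e.g. test teardown closing the shard
        directoryClosed.countDown();
        try {
            uploader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failure.get();
    }

    public static void main(String[] args) {
        System.out.println(uploadAfterClose());
    }
}
```

If this reading is right, either the upload thread would need to finish (or be cancelled) before the directory is closed, or the exception would need to be routed to the upload's completion handler instead of escaping the thread. Which of those applies here is for the maintainers to confirm.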

Related component

Other

To Reproduce

./gradlew ':server:internalClusterTest' --tests "org.opensearch.remotestore.multipart.RemoteStoreMultipartIT.testNoSearchIdleForAnyReplicaCount" -Dtests.seed=7E1BDEE57B443ADB

Expected behavior

The test should pass consistently.

Additional Details

Plugins: Standard

peternied commented 9 months ago

[Triage - attendees 1 2 3] @reta Thanks for filing.

andrross commented 9 months ago
REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.remotestore.multipart.RemoteStoreMultipartIT.testOverriddenBufferInterval" -Dtests.seed=56DCBC16E3B2A344 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=uk -Dtests.timezone=Asia/Baghdad -Druntime.java=21

org.opensearch.remotestore.multipart.RemoteStoreMultipartIT > testOverriddenBufferInterval FAILED
    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=10413, name=Thread-7230, state=RUNNABLE, group=TGRP-RemoteStoreMultipartIT]

        Caused by:
        org.apache.lucene.store.AlreadyClosedException: this Directory is closed
            at __randomizedtesting.SeedInfo.seed([56DCBC16E3B2A344]:0)
            at app//org.apache.lucene.store.BaseDirectory.ensureOpen(BaseDirectory.java:50)
            at app//org.opensearch.index.store.FsDirectoryFactory$HybridDirectory.openInput(FsDirectoryFactory.java:175)
            at app//org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:101)
            at app//org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:101)
            at app//org.opensearch.index.store.RemoteDirectory.lambda$uploadBlob$1(RemoteDirectory.java:362)
            at app//org.opensearch.common.blobstore.transfer.RemoteTransferContainer.lambda$getMultipartStreamSupplier$1(RemoteTransferContainer.java:167)
            at app//org.opensearch.common.blobstore.transfer.RemoteTransferContainer.lambda$getTransferPartStreamSupplier$0(RemoteTransferContainer.java:145)
            at app//org.opensearch.common.StreamContext.provideStream(StreamContext.java:69)
            at app//org.opensearch.remotestore.multipart.mocks.MockFsAsyncBlobContainer.lambda$asyncBlobUpload$0(MockFsAsyncBlobContainer.java:60)
            at java.base@21.0.2/java.lang.Thread.run(Thread.java:1583)

https://build.ci.opensearch.org/job/gradle-check/32595/consoleText

linuxpi commented 6 months ago

[Storage Triage - attendees 1 2 3 4 5 6 7 8 9 10 11 12 13]

@vikasvb90 Updating the release target to 2.15. Feel free to reach out if you need any help.

sohami commented 4 months ago

Another test is flaky in the same class:

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.remotestore.multipart.RemoteStoreMultipartIT.testAsyncDurabilityThrowsExceptionWhenRestrictSettingTrue" -Dtests.seed=5083B88EA28AD82D -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=el-POLYTON -Dtests.timezone=America/Adak -Druntime.java=21

com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=4818, name=Thread-1060, state=RUNNABLE, group=TGRP-RemoteStoreMultipartIT]
Caused by: org.apache.lucene.store.AlreadyClosedException: this Directory is closed
    at __randomizedtesting.SeedInfo.seed([5083B88EA28AD82D]:0)
    at app//org.apache.lucene.store.BaseDirectory.ensureOpen(BaseDirectory.java:50)
    at app//org.opensearch.index.store.FsDirectoryFactory$HybridDirectory.openInput(FsDirectoryFactory.java:175)
    at app//org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:101)
    at app//org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:101)
    at app//org.opensearch.index.store.RemoteDirectory.lambda$uploadBlob$2(RemoteDirectory.java:377)
    at app//org.opensearch.common.blobstore.transfer.RemoteTransferContainer.lambda$getMultipartStreamSupplier$1(RemoteTransferContainer.java:207)
    at app//org.opensearch.common.blobstore.transfer.RemoteTransferContainer.lambda$getTransferPartStreamSupplier$0(RemoteTransferContainer.java:185)
    at app//org.opensearch.common.StreamContext.provideStream(StreamContext.java:69)
    at app//org.opensearch.remotestore.multipart.mocks.MockFsAsyncBlobContainer.lambda$asyncBlobUpload$0(MockFsAsyncBlobContainer.java:60)
    at java.base@21.0.3/java.lang.Thread.run(Thread.java:1583)

CI: https://build.ci.opensearch.org/job/gradle-check/41593/testReport/junit/org.opensearch.remotestore.multipart/RemoteStoreMultipartIT/testAsyncDurabilityThrowsExceptionWhenRestrictSettingTrue/