mch2 closed this 9 months ago
@sachinpkale FYI
Taking a look
The fix is merged to main (https://github.com/opensearch-project/OpenSearch/pull/5789). It still needs to be backported to 2.x and 2.5.
The fix is merged and backported.
Heads up: I ran into a different error when running :server:test locally before opening #8826. It does not repro on its own. Maybe someone knows the cause of this and whether we should reopen this issue:
./gradlew ':server:test' --tests "org.opensearch.index.translog.RemoteFSTranslogTests.testConcurrentWriteViewsAndSnapshot" -Dtests.seed=11209ED8456F2E6A -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=sr-ME -Dtests.timezone=Pacific/Johnston -Druntime.java=20
2> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=3157, name=writer_0, state=RUNNABLE, group=TGRP-RemoteFSTranslogTests]
at __randomizedtesting.SeedInfo.seed([F8C495824172C1FF:DC1D6CAF9AE5327C]:0)
Caused by:
java.lang.AssertionError: [index][1] Expected non-empty readers
at __randomizedtesting.SeedInfo.seed([F8C495824172C1FF]:0)
at org.opensearch.index.translog.RemoteFsTranslog.deleteStaleRemotePrimaryTerms(RemoteFsTranslog.java:430)
at org.opensearch.index.translog.RemoteFsTranslog.trimUnreferencedReaders(RemoteFsTranslog.java:400)
at org.opensearch.index.translog.RemoteFSTranslogTests$2.doRun(RemoteFSTranslogTests.java:821)
1> [2023-07-21T22:25:10,111][INFO ][o.o.i.t.RemoteFSTranslogTests] [testReadLocation] before test
1> [2023-07-21T22:25:10,132][INFO ][o.o.i.t.RemoteFSTranslogTests] [testReadLocation] after test
1> [2023-07-21T22:25:10,139][INFO ][o.o.i.t.RemoteFSTranslogTests] [testUploadWithPrimaryModeTrue] before test
1> [2023-07-21T22:25:10,155][INFO ][o.o.i.t.RemoteFSTranslogTests] [testUploadWithPrimaryModeTrue] after test
1> [2023-07-21T22:25:10,162][INFO ][o.o.i.t.RemoteFSTranslogTests] [testTranslogWriterFsyncDisabledInRemoteFsTranslog] before test
1> [2023-07-21T22:25:10,190][INFO ][o.o.i.t.RemoteFSTranslogTests] [testTranslogWriterFsyncDisabledInRemoteFsTranslog] after test
1> [2023-07-21T22:25:10,197][INFO ][o.o.i.t.RemoteFSTranslogTests] [testConcurrentWritesWithVaryingSize] before test
1> [2023-07-21T22:25:10,207][INFO ][o.o.i.t.RemoteFSTranslogTests] [testConcurrentWritesWithVaryingSize] testing with [7] threads, each doing [14] ops
1> [2023-07-21T22:25:10,440][INFO ][o.o.i.t.RemoteFSTranslogTests] [testConcurrentWritesWithVaryingSize] after test
1> [2023-07-21T22:25:10,451][INFO ][o.o.i.t.RemoteFSTranslogTests] [testTranslogWriterCanFlushInAddOrReadCall] before test
1> [2023-07-21T22:25:10,476][INFO ][o.o.i.t.RemoteFSTranslogTests] [testTranslogWriterCanFlushInAddOrReadCall] after test
1> [2023-07-21T22:25:10,482][INFO ][o.o.i.t.RemoteFSTranslogTests] [testRangeSnapshot] before test
1> [2023-07-21T22:25:10,532][INFO ][o.o.i.t.RemoteFSTranslogTests] [testRangeSnapshot] after test
1> [2023-07-21T22:25:10,538][INFO ][o.o.i.t.RemoteFSTranslogTests] [testSimpleOperationsUpload] before test
1> [2023-07-21T22:25:10,561][INFO ][o.o.i.t.RemoteFSTranslogTests] [testSimpleOperationsUpload] All md files [9223372035702745179__9223372036854775805__9223370346876465252__1, 9223372035702745179__9223372036854775804__9223370346876465248__1]
1> [2023-07-21T22:25:10,563][INFO ][o.o.i.t.RemoteFSTranslogTests] [testSimpleOperationsUpload] All data files [translog-3.ckp, translog-1.tlog, translog-2.ckp, translog-3.tlog, translog-1.ckp, translog-2.tlog]
1> [2023-07-21T22:25:10,563][ERROR][o.o.i.t.RemoteFSTranslogTests] [testSimpleOperationsUpload] Asserting content of 2
1> [2023-07-21T22:25:10,564][ERROR][o.o.i.t.RemoteFSTranslogTests] [testSimpleOperationsUpload] Asserting content of 3
1> [2023-07-21T22:25:10,567][INFO ][o.o.i.t.t.TranslogTransferManager] [testSimpleOperationsUpload] [index][1] Deleting primary terms from remote store lesser than 1152030628
1> [2023-07-21T22:25:10,578][INFO ][o.o.i.t.RemoteFSTranslogTests] [testSimpleOperationsUpload] after test
1> [2023-07-21T22:25:10,590][INFO ][o.o.i.t.RemoteFSTranslogTests] [testSyncUpToStream] before test
1> [2023-07-21T22:25:10,669][INFO ][o.o.i.t.RemoteFSTranslogTests] [testSyncUpToStream] after test
1> [2023-07-21T22:25:10,675][INFO ][o.o.i.t.RemoteFSTranslogTests] [testCloseIntoReader] before test
1> [2023-07-21T22:25:10,697][INFO ][o.o.i.t.RemoteFSTranslogTests] [testCloseIntoReader] after test
1> [2023-07-21T22:25:10,708][INFO ][o.o.i.t.RemoteFSTranslogTests] [testMetadataFileDeletion] before test
1> [2023-07-21T22:25:10,745][INFO ][o.o.i.t.t.TranslogTransferManager] [testMetadataFileDeletion] [index][1] Deleting primary terms from remote store lesser than 1979622222
1> [2023-07-21T22:25:10,770][INFO ][o.o.i.t.RemoteFSTranslogTests] [testMetadataFileDeletion] numDocs=7 moreDocs=4
1> [2023-07-21T22:25:10,813][INFO ][o.o.i.t.t.TranslogTransferManager] [testMetadataFileDeletion] [index][1] Downloading translog files with: Primary Term = 1979622222, Generation = 13, Location = /opt/dev/opensearch-project/opensearch/.worktrees/enhance/mediaTypeParserRegistry/server/build/testrun/test/temp/org.opensearch.index.translog.RemoteFSTranslogTests_F8C495824172C1FF-001/tempDir-044
1> [2023-07-21T22:25:10,814][INFO ][o.o.i.t.t.TranslogTransferManager] [testMetadataFileDeletion] [index][1] Downloading translog files with: Primary Term = 1979622222, Generation = 12, Location = /opt/dev/opensearch-project/opensearch/.worktrees/enhance/mediaTypeParserRegistry/server/build/testrun/test/temp/org.opensearch.index.translog.RemoteFSTranslogTests_F8C495824172C1FF-001/tempDir-044
1> [2023-07-21T22:25:10,831][INFO ][o.o.i.t.t.TranslogTransferManager] [testMetadataFileDeletion] [index][1] Deleting primary terms from remote store lesser than 1979622223
1> [2023-07-21T22:25:10,836][INFO ][o.o.i.t.t.TranslogTransferManager] [org.opensearch.index.translog.RemoteFSTranslogTests] [index][1] Deleted primary term 1979622222
1> [2023-07-21T22:25:10,863][INFO ][o.o.i.t.RemoteFSTranslogTests] [testMetadataFileDeletion] after test
1> [2023-07-21T22:25:10,873][INFO ][o.o.i.t.RemoteFSTranslogTests] [testSimpleOperations] before test
1> [2023-07-21T22:25:10,898][INFO ][o.o.i.t.RemoteFSTranslogTests] [testSimpleOperations] after test
2> NOTE: test params are: codec=Asserting(Lucene95): {}, docValues:{}, maxPointsInLeafNode=948, maxMBSortInHeap=5.221747612462641, sim=Asserting(RandomSimilarity(queryNorm=true): {}), locale=he-IL, timezone=America/Danmarkshavn
2> NOTE: Linux 5.17.0-1033-oem amd64/Eclipse Adoptium 20.0.1 (64-bit)/cpus=24,threads=1,free=424241208,total=536870912
2> NOTE: All tests run in this JVM: [DynamicActionRegistryTests, AddVotingConfigExclusionsRequestTests, DecommissionResponseTests, CancelTasksRequestTests, ClusterGetSettingsResponseTests, SnapshotIndexShardStatusTests, MappingVisitorTests, GetAliasesResponseTests, CreateIndexResponseTests, GetIndexActionTests, ResolveIndexResponseTests, UpdateSettingsRequestSerializationTests, GetIndexTemplatesResponseTests, BulkRequestModifierTests, DeleteResponseTests, TransportMultiGetActionTests, SimulateProcessorResultTests, CreatePitControllerTests, SearchPhaseExecutionExceptionTests, TransportMultiSearchActionTests, RetryableActionTests, TransportWriteActionForIndexingPressureTests, JavaVersionTests, NodeClientHeadersTests, ShardFailedClusterStateTaskExecutorTests, ClusterBootstrapServiceRenamedSettingTests, LeaderCheckerTests, DecommissionControllerTests, ComponentTemplateTests, IndexAbstractionTests, MetadataIndexStateServiceTests, DiscoveryNodeTests, PrimaryTermsTests, AllocationConstraintsTests, DecisionsImpactOnClusterHealthTests, MaxRetryAllocationDeciderTests, RemoteShardsMoveShardsTests, TenShardsOneReplicaRoutingTests, RestoreInProgressAllocationDeciderTests, TaskBatcherTests, RoundingTests, CompositeBytesReferenceTests, AutoCloseableRefCountedTests, GeometryIndexerTests, PointBuilderTests, HeaderWarningTests, MinScoreScorerTests, NetworkUtilsTests, MemorySizeSettingsTests, JavaDateMathParserTests, ByteUtilsTests, ReorganizingLongHashTests, FutureUtilsTests, SizeBlockingQueueTests, JsonVsCborTests, JacksonLocationTests, SettingsBasedSeedHostsProviderTests, ExtensionActionUtilTests, RegisterCustomSettingsTests, PriorityComparatorTests, IndexingPressureServiceTests, ShardIndexingPressureTests, PreConfiguredTokenFilterTests, EngineConfigFactoryTests, RecoverySourcePruneMergePolicyTests, NoOrdinalsStringFieldDataTests, BinaryFieldMapperTests, DocCountFieldMapperTests, FieldAliasMapperValidationTests, GeoShapeFieldTypeTests, KeywordFieldTypeTests, NumberFieldTypeTests, SourceFieldMapperTests, CombineIntervalsSourceProviderTests, GeoBoundingBoxQueryBuilderTests, MatchNoneQueryBuilderTests, QueryStringQueryBuilderTests, SpanFirstQueryBuilderTests, WildcardQueryBuilderTests, DeleteByQueryRequestTests, MultiMatchQueryTests, GlobalCheckpointSyncActionTests, RetentionLeasesTests, PrimaryReplicaSyncerTests, ShardUtilsTests, RemoteBufferedOutputDirectoryTests, FileCacheCleanerTests, RemoteFSTranslogTests]
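Because the failure above does not reproduce on a single run, one option is to loop the exact command until it fails or a run budget is used up. This is a minimal sketch; `repeat_until_fail` is a hypothetical helper, not part of the OpenSearch build. Note that a fixed `-Dtests.seed` pins the randomized test inputs but not thread scheduling, so a concurrency failure like this one may still pass on most runs with the same seed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to N times and stop at the first failure.
# Prints which attempt failed, or a summary line if every attempt passed.
repeat_until_fail() {
  local max_attempts="$1"
  shift
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if ! "$@"; then
      echo "failed on attempt $attempt"
      return 1
    fi
    attempt=$((attempt + 1))
  done
  echo "passed $max_attempts attempts"
}

# Example: loop the seeded test from this report (the remaining -D flags are omitted here).
# repeat_until_fail 100 ./gradlew ':server:test' \
#   --tests "org.opensearch.index.translog.RemoteFSTranslogTests.testConcurrentWriteViewsAndSnapshot" \
#   -Dtests.seed=11209ED8456F2E6A
```

The helper returns a non-zero status on the first failing attempt, so it can also be chained with other commands in a script.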
Hit this again in https://github.com/opensearch-project/OpenSearch/pull/9743#issuecomment-1730186485
Taking a look
Ran it locally 25K+ times without any failures. Closing.
Caught this seed while running local checks against the 2.5 branch. The seed fails 100% of the time for me.
Trace: