opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

[BUG] Fix a flaky test in `KNN990CodecTests::testBuildFromModelTemplate`. It is failing to get a mocking value. #2257

Open 0ctopus13prime opened 2 weeks ago

0ctopus13prime commented 2 weeks ago

What is the bug? KNN990CodecTests::testBuildFromModelTemplate statically mocks ModelDao.OpenSearchKNNModelDao so that getInstance() returns a mocked DAO, which is supposed to return a predefined ModelMetadata. However, the value actually returned is either null or has a state other than ModelState.CREATED.

        try (MockedStatic<ModelDao.OpenSearchKNNModelDao> modelDaoMockedStatic = Mockito.mockStatic(ModelDao.OpenSearchKNNModelDao.class)) {
            ModelDao.OpenSearchKNNModelDao modelDao = mock(ModelDao.OpenSearchKNNModelDao.class);
            modelDaoMockedStatic.when(ModelDao.OpenSearchKNNModelDao::getInstance).thenReturn(modelDao);
...

            ModelMetadata modelMetadata1 = new ModelMetadata(
                knnEngine,
                spaceType,
                dimension,
                ModelState.CREATED, // Its state is CREATED

...

            Model mockModel = new Model(modelMetadata1, modelBlob, modelId);
            when(modelDao.get(modelId)).thenReturn(mockModel);
            when(modelDao.getMetadata(modelId)).thenReturn(modelMetadata1); // This should return the mocked metadata.

However, isModelCreated(modelMetadata) intermittently returns false, and the test fails with:

  1> java.lang.IllegalArgumentException: Model ID 'test-model' is not created.
  1>    at org.opensearch.knn.indices.ModelUtil.getModelMetadata(ModelUtil.java:54)
  1>    at org.opensearch.knn.common.FieldInfoExtractor.extractKNNEngine(FieldInfoExtractor.java:42)
  1>    at org.opensearch.knn.index.codec.util.KNNCodecUtil.getNativeKNNEngine(KNNCodecUtil.java:126)
  1>    at org.opensearch.knn.index.codec.util.KNNCodecUtil.getNativeEngineFileFromFieldInfo(KNNCodecUtil.java:106)
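
One possible explanation, offered as a hypothesis rather than a confirmed root cause: Mockito's javadoc describes `mockStatic` as creating a thread-local mock controller, so the stubbed `getInstance` applies only on the thread that registered it. The full stack trace in the error logs shows the failing lookup running under `ThreadPoolExecutor` during CheckIndex, which would be a different thread from the test thread. The sketch below does not use Mockito; it imitates thread-confined stubbing with a plain `ThreadLocal`. `ThreadConfinedStubSketch`, `STUBBED_STATE`, `lookupState`, and `lookupOnWorkerThread` are hypothetical names introduced only for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: a stub that is visible only on the thread that installed it.
final class ThreadConfinedStubSketch {
    // Stand-in for a thread-local stub registry (an assumption, not Mockito's internals).
    static final ThreadLocal<String> STUBBED_STATE = new ThreadLocal<>();

    // Stand-in for the static lookup done by production code.
    static String lookupState() {
        String stubbed = STUBBED_STATE.get();
        return stubbed != null ? stubbed : "NOT_STUBBED";
    }

    // Runs the same lookup on a pool thread, as a multi-threaded index check would.
    static String lookupOnWorkerThread() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> task = ThreadConfinedStubSketch::lookupState;
            return pool.submit(task).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        STUBBED_STATE.set("CREATED"); // the "test thread" installs the stub
        try {
            System.out.println("test thread:   " + lookupState());          // CREATED
            System.out.println("worker thread: " + lookupOnWorkerThread()); // NOT_STUBBED
        } finally {
            STUBBED_STATE.remove();
        }
    }
}
```

If the number of CheckIndex threads varies between runs, a lookup that only sometimes leaves the test thread would also fit the intermittent nature of the failure. This remains to be verified against the actual test.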

How can one reproduce the bug? It is tricky to reproduce. The test passes most of the time and fails only occasionally, so the failure is easy to miss.

What is the expected behavior? modelDao.getMetadata("test-model") should return modelMetadata1, but somehow an invalid value is being returned. There are two possible cases:

  1. The metadata is null, so modelMetadata != null evaluates to false.
  2. Its state is not ModelState.CREATED, so modelMetadata.getState().equals(ModelState.CREATED) returns false.

    public static boolean isModelPresent(ModelMetadata modelMetadata) {
        return modelMetadata != null;
    }

    public static boolean isModelCreated(ModelMetadata modelMetadata) {
        if (!isModelPresent(modelMetadata)) {
            return false;
        }
        return modelMetadata.getState().equals(ModelState.CREATED);
    }
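
The two cases above can be exercised in isolation with a minimal, self-contained sketch. The nested `ModelState` and `ModelMetadata` types here are simplified, hypothetical stand-ins for the plugin classes (only the state field is modeled, and the enum constants other than `CREATED` are assumptions); the two predicate methods mirror the ones quoted above.

```java
// Hypothetical stand-ins for the k-NN plugin types; not the real classes.
final class ModelCreatedCheckSketch {
    enum ModelState { TRAINING, CREATED, FAILED }

    static final class ModelMetadata {
        private final ModelState state;

        ModelMetadata(ModelState state) {
            this.state = state;
        }

        ModelState getState() {
            return state;
        }
    }

    // Mirrors the predicate quoted in the issue.
    static boolean isModelPresent(ModelMetadata modelMetadata) {
        return modelMetadata != null;
    }

    // Mirrors the predicate quoted in the issue.
    static boolean isModelCreated(ModelMetadata modelMetadata) {
        if (!isModelPresent(modelMetadata)) {
            return false;
        }
        return modelMetadata.getState().equals(ModelState.CREATED);
    }

    public static void main(String[] args) {
        // Case 1: metadata is null.
        System.out.println(isModelCreated(null)); // false
        // Case 2: metadata is present but not in the CREATED state.
        System.out.println(isModelCreated(new ModelMetadata(ModelState.TRAINING))); // false
        // Expected case: the stubbed metadata with state CREATED.
        System.out.println(isModelCreated(new ModelMetadata(ModelState.CREATED))); // true
    }
}
```

Logging which of the two failing branches is taken in the real test (null versus a non-CREATED state) would narrow down whether the stub is being bypassed entirely or a different metadata object is being returned.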

What is your host/environment?

Do you have any screenshots? Error logs:

  2> java.lang.RuntimeException: CheckIndex failed
        at __randomizedtesting.SeedInfo.seed([C26A40CD175601BB:4B1A74C7DA1BEBB9]:0)
        at org.apache.lucene.tests.util.TestUtil.checkIndex(TestUtil.java:350)
        at org.apache.lucene.tests.util.TestUtil.checkIndex(TestUtil.java:316)
        at org.apache.lucene.tests.store.BaseDirectoryWrapper.close(BaseDirectoryWrapper.java:44)
        at org.opensearch.knn.index.codec.KNNCodecTestCase.testBuildFromModelTemplate(KNNCodecTestCase.java:311)
        at org.opensearch.knn.index.codec.KNN990Codec.KNN990CodecTests.testBuildFromModelTemplate(KNN990CodecTests.java:28)
  2> NOTE: leaving temporary files on disk at: D:\a\k-NN\k-NN\build\testrun\test\temp\org.opensearch.knn.index.codec.KNN990Codec.KNN990CodecTests_C26A40CD175601BB-001
  2> NOTE: test params are: codec=Asserting(Lucene912): {}, docValues:{}, maxPointsInLeafNode=431, maxMBSortInHeap=7.043039621004881, sim=Asserting(RandomSimilarity(queryNorm=true): {}), locale=be, timezone=America/Merida
  2> NOTE: Windows Server 2022 10.0 amd64/Azul Systems, Inc. 21.0.5 (64-bit)/cpus=4,threads=1,free=219468992,total=536870912
  2> NOTE: All tests run in this JVM: [FieldInfoExtractorTests, KNNVectorUtilTests, OutOfNativeMemoryExceptionTests, KNNCreateIndexFromModelTests, KNNSettingsTests, KNNVectorIndexFieldDataTests, SpaceTypeTests, VectorQueryTypeTests, KNNCodecServiceTests, KNN80CompoundFormatTests, KNN80DocValuesProducerTests, KNN990CodecTests]
  1> [2024-11-05T14:33:11,291][INFO ][o.o.k.i.c.K.KNN990CodecTests] [testMultiFieldsKnnIndex] after test
  1> [2024-11-05T14:33:11,307][INFO ][o.o.k.i.c.K.KNN990CodecTests] [testBuildFromModelTemplate] before test
  1> [2024-11-05T14:33:11,463][WARN ][o.o.k.i.c.K.KNN80DocValuesConsumer] [testBuildFromModelTemplate] Refresh operation complete in 2 ms
  1> CheckIndex failed
  1> Checking index with threadCount: 5
  1> 0.00% total deletions; 4 documents; 0 deletions
  1> Segments file=segments_1 numSegments=1 version=9.12.0 id=838s7v3r6oozl7y19i1ybh8c2
  1> 1 of 1: name=_0 maxDoc=4
  1>     version=9.12.0
  1>     id=838s7v3r6oozl7y19i1ybh8bz
  1>     codec=KNN990Codec
  1>     compound=true
  1>     numFiles=4
  1>     size (MB)=0.002
  1>     diagnostics = {os.version=10.0, os.arch=amd64, java.vendor=Azul Systems, Inc., os=Windows Server 2022, java.runtime.version=21.0.5+11-LTS, timestamp=1730838791463, source=flush, lucene.version=9.12.0}
  1>     no deletions
  1>     test: open reader.........FAILED
  1>     WARNING: exorciseIndex() would remove reference to this segment; full exception:
  1> java.lang.IllegalArgumentException: Model ID 'test-model' is not created.
  1>    at org.opensearch.knn.indices.ModelUtil.getModelMetadata(ModelUtil.java:54)
  1>    at org.opensearch.knn.common.FieldInfoExtractor.extractKNNEngine(FieldInfoExtractor.java:42)
  1>    at org.opensearch.knn.index.codec.util.KNNCodecUtil.getNativeKNNEngine(KNNCodecUtil.java:126)
  1>    at org.opensearch.knn.index.codec.util.KNNCodecUtil.getNativeEngineFileFromFieldInfo(KNNCodecUtil.java:106)
  1>    at org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesProducer.getVectorCacheKeysFromSegmentReaderState(KNN80DocValuesProducer.java:97)
  1>    at org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesProducer.<init>(KNN80DocValuesProducer.java:41)
  1>    at org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesFormat.fieldsProducer(KNN80DocValuesFormat.java:44)
  1>    at org.apache.lucene.index.SegmentDocValues.newDocValuesProducer(SegmentDocValues.java:52)
  1>    at org.apache.lucene.index.SegmentDocValues.getDocValuesProducer(SegmentDocValues.java:69)
  1>    at org.apache.lucene.index.SegmentReader.initDocValuesProducer(SegmentReader.java:197)
  1>    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:113)
  1>    at org.apache.lucene.index.CheckIndex.testSegment(CheckIndex.java:1015)
  1>    at org.apache.lucene.index.CheckIndex.lambda$checkIndex$1(CheckIndex.java:854)
  1>    at org.apache.lucene.index.CheckIndex.lambda$callableToSupplier$2(CheckIndex.java:954)
  1>    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
  1>    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
  1>    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
  1>    at java.base/java.lang.Thread.run(Thread.java:1583)
  1>
  1>
  1> WARNING: 1 broken segments (containing 4 documents) detected
  1> Took 0.005 sec total.
  1>
  1> [2024-11-05T14:33:11,479][INFO ][o.o.k.i.c.K.KNN990CodecTests] [testBuildFromModelTemplate] after test
  1> [2024-11-05T14:33:11,510][INFO ][o.o.k.i.c.K.KNN990CodecTests] [testCodecSetsCustomPerFieldKnnVectorsFormat] before test
  1> [2024-11-05T14:33:11,510][INFO ][o.o.k.i.c.K.KNN990CodecTests] [testCodecSetsCustomPerFieldKnnVectorsFormat] after test
  1> [2024-11-05T14:33:11,526][INFO ][o.o.k.i.c.K.KNN990CodecTests] [testKnnVectorIndex] before test
  1> [2024-11-05T14:33:11,766][INFO ][o.o.k.i.c.K.KNN990CodecTests] [testKnnVectorIndex] after test
WARNING clustering 100 points to 5 centroids: please provide at least 195 training points

> Task :test
927 tests completed, 1 failed, 2 skipped

Tests with failures:
 - org.opensearch.knn.index.codec.KNN990Codec.KNN990CodecTests.testBuildFromModelTemplate

> Task :test FAILED

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.
FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':test'.
> There were failing tests. See the report at: file:///D:/a/k-NN/k-NN/build/reports/tests/test/index.html

* Try:
> Run with --scan to get full insights.

BUILD FAILED in 41m 31s

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

For more on this, please refer to https://docs.gradle.org/8.4/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.
76 actionable tasks: 76 executed
Error: Process completed with exit code 1.

Do you have any additional context? N/A