opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0
156 stars 123 forks source link

[Enhancement] Having a reference to the state in KNN990 vector reader, then construct cache keys to invalidate in close method. #2223

Open 0ctopus13prime opened 1 month ago

0ctopus13prime commented 1 month ago

Description

In NativeEngine990KnnVectorsReader's constructor, we construct cache keys for invalidating with the given segment reader state. But this is unnecessary, we can always construct with field infos and segment info when closing.

    private final List<String> cacheKeys;

    public NativeEngines990KnnVectorsReader(final SegmentReadState state, final FlatVectorsReader flatVectorsReader) {
        this.flatVectorsReader = flatVectorsReader;
        this.segmentReadState = state;
        this.cacheKeys = getVectorCacheKeysFromSegmentReaderState(state); <---------- This!
        loadCacheKeyMap();
    }

    @Override
    public void close() throws IOException {
        // Clean up allocated vector indices resources from cache.
        final NativeMemoryCacheManager nativeMemoryCacheManager = NativeMemoryCacheManager.getInstance();
        cacheKeys.forEach(nativeMemoryCacheManager::invalidate); <------- This!
        ...

Not for sure, but once I tried to change it to have a reference in constructor and build a cache key in close method, I bumped into this exceptions during testing.

Suppressed: java.lang.AssertionError: should not be called by a cluster state applier. reason [the applied cluster state is not yet available]
»               at org.opensearch.cluster.service.ClusterApplierService.assertNotCalledFromClusterStateApplier(ClusterApplierService.java:446) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
»               at org.opensearch.cluster.service.ClusterApplierService.state(ClusterApplierService.java:230) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
»               at org.opensearch.cluster.service.ClusterService.state(ClusterService.java:183) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
»               at org.opensearch.knn.indices.ModelDao$OpenSearchKNNModelDao.getMetadata(ModelDao.java:457) ~[?:?]
»               at org.opensearch.knn.indices.ModelUtil.getModelMetadata(ModelUtil.java:52) ~[?:?]
»               at org.opensearch.knn.common.FieldInfoExtractor.extractKNNEngine(FieldInfoExtractor.java:41) ~[?:?]
»               at org.opensearch.knn.index.codec.util.KNNCodecUtil.getNativeKNNEngine(KNNCodecUtil.java:126) ~[?:?]
»               at org.opensearch.knn.index.codec.util.KNNCodecUtil.getNativeEngineFileFromFieldInfo(KNNCodecUtil.java:106) ~[?:?]

Looks like it is relying on cluster state service to get KNN engine info, and it raised the exception which was not thrown if the logic was in constructor.

TO-BE :

    ~~private final List<String> cacheKeys;~~ <------- This will be gone

    @Override
    public void close() throws IOException {
        // Clean up allocated vector indices resources from cache.
        final NativeMemoryCacheManager nativeMemoryCacheManager = NativeMemoryCacheManager.getInstance();
        getVectorCacheKeysFromSegmentReaderState().forEach(nativeMemoryCacheManager::invalidate); <------- This!
0ctopus13prime commented 1 month ago

cc @shatejas for visibility. could you change the tag to enhancement? Thank you.

0ctopus13prime commented 1 month ago

Since this will be a very straightforward refactor, will be included in PR that introduces a writing layer in native engines. RFC : https://github.com/opensearch-project/k-NN/issues/2033

0ctopus13prime commented 3 weeks ago

Hi @shatejas We have a small issue here. IT testing is failing after I moved the cache keys constructing logic from constructor into close method. Main reason is that, when it looks up engine meta data from the cluster state when building cache keys, it is not allowed to do it within a cluster state update thread (e.g. clusterApplierService#updateTask). - code

In addition to that, deciding the caller is not controllable in codec, for example, the index closing API will be handled within the cluster state update thread, which will eventually close KNN vector codec.

We can avoid this by having a map of KNN engine meta data, which I think is bit pointless since the motivation of this working was to save unnecessary bytes.

I think if either way we need to cache something in memory, and given that only few bytes (at most 1KB I believe) is what we can save, can we keep the cached strings then use it in close method? (e.g. keep it as it is) Please feel free drop your thoughts!

»  java.lang.AssertionError: should not be called by a cluster state applier. reason [the applied cluster state is not yet available]
»       at org.opensearch.cluster.service.ClusterApplierService.assertNotCalledFromClusterStateApplier(ClusterApplierService.java:446)
»       at org.opensearch.cluster.service.ClusterApplierService.state(ClusterApplierService.java:230)
»       at org.opensearch.cluster.service.ClusterService.state(ClusterService.java:183)
»       at org.opensearch.knn.indices.ModelDao$OpenSearchKNNModelDao.getMetadata(ModelDao.java:457)
»       at org.opensearch.knn.indices.ModelUtil.getModelMetadata(ModelUtil.java:52)
»       at org.opensearch.knn.common.FieldInfoExtractor.extractKNNEngine(FieldInfoExtractor.java:41)
»       at org.opensearch.knn.index.codec.util.KNNCodecUtil.getNativeKNNEngine(KNNCodecUtil.java:126)
»       at org.opensearch.knn.index.codec.util.KNNCodecUtil.getNativeEngineFileFromFieldInfo(KNNCodecUtil.java:106)
»       at org.opensearch.knn.index.codec.KNN990Codec.NativeEngines990KnnVectorsReader.getVectorCacheKeysFromSegmentReaderState(NativeEngines990KnnVectorsReader.java:227)
»       at org.opensearch.knn.index.codec.KNN990Codec.NativeEngines990KnnVectorsReader.close(NativeEngines990KnnVectorsReader.java:192)
...
...
»       at org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:582)
»       at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:503)
»       at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:205)