opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

[BUG] Knn Doc ingestion throws ClassCastException when used with OS plugin implementing IndexStorePlugin #1228

Open jitendra-titaniam opened 1 year ago

jitendra-titaniam commented 1 year ago

What is the bug? The OpenSearch IndexStorePlugin interface allows plugin developers to provide a custom implementation of org.apache.lucene.store.Directory. A plugin developer can write their own implementations of the Directory.createOutput(String filename, IOContext context) and Directory.openInput(String filename, IOContext context) methods. The current implementation of KNN80DocValuesConsumer.addKNNBinaryField throws a ClassCastException if any Directory implementation other than org.apache.lucene.store.FSDirectory is used. Furthermore, it does not use the Directory abstraction (createOutput) to create the file; instead, it writes to the index path directly.
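To illustrate the difference described above, here is a minimal, self-contained Java sketch. Dir, FsDir, and CustomDir are hypothetical, simplified stand-ins for Lucene's Directory, FSDirectory, and a plugin-provided directory; they are not the Lucene API and not the k-NN code. writeByCasting mimics the cast-then-write-to-path pattern, while writeThroughAbstraction writes via createOutput, which works for any implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins that only model the type relationship; NOT Lucene classes.
class DirectoryCastDemo {

    /** Stand-in for Directory: files are created only through createOutput. */
    abstract static class Dir {
        abstract OutputStream createOutput(String name) throws IOException;
    }

    /** Stand-in for FSDirectory: backed by a real filesystem path. */
    static final class FsDir extends Dir {
        final Path root;
        FsDir(Path root) { this.root = root; }
        Path getDirectory() { return root; }
        @Override
        OutputStream createOutput(String name) throws IOException {
            return Files.newOutputStream(root.resolve(name));
        }
    }

    /** Stand-in for a plugin-provided Directory; here it simply keeps files in memory. */
    static final class CustomDir extends Dir {
        final Map<String, ByteArrayOutputStream> files = new HashMap<>();
        @Override
        OutputStream createOutput(String name) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            files.put(name, out);
            return out;
        }
    }

    /** Failing pattern: assumes every Dir is an FsDir and writes to its path directly. */
    static void writeByCasting(Dir dir, String name, byte[] data) throws IOException {
        Path root = ((FsDir) dir).getDirectory(); // ClassCastException for CustomDir
        Files.write(root.resolve(name), data);
    }

    /** Working pattern: go through the abstraction, whatever the implementation is. */
    static void writeThroughAbstraction(Dir dir, String name, byte[] data) throws IOException {
        try (OutputStream out = dir.createOutput(name)) {
            out.write(data);
        }
    }

    public static void main(String[] args) throws IOException {
        CustomDir custom = new CustomDir();
        writeThroughAbstraction(custom, "example.bin", new byte[] {1, 2, 3});
        System.out.println(custom.files.get("example.bin").size()); // 3
        try {
            writeByCasting(custom, "example.bin", new byte[] {1, 2, 3});
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in the reported stack trace");
        }
    }
}
```

The file name example.bin is illustrative only.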

How can one reproduce the bug? Steps to reproduce the behavior:

  1. Create an OpenSearch plugin that implements IndexStorePlugin, and implement getDirectoryFactories(). For an index.store.type of titaniam, this method returns a subclass of FsDirectoryFactory that creates a custom implementation of Directory rather than an FSDirectory.
  2. Install the plugin and restart OpenSearch.
  3. Create an index
    PUT housing-index1
    {
      "settings": {
        "index.knn": true,
        "index.store.type": "titaniam"
      },
      "mappings": {
        "properties": {
          "housing-vector": {
            "type": "knn_vector",
            "dimension": 3
          },
          "title": {
            "type": "text"
          },
          "price": {
            "type": "long"
          },
          "location": {
            "type": "geo_point"
          }
        }
      }
    }
  4. Ingest a document
    POST housing-index1/_doc
    {
      "housing-vector": [10, 20, 30],
      "title": "2 bedroom in downtown Seattle",
      "price": "2800",
      "location": "47.71, 122.00"
    }

    The following exception appears in opensearch.log:

[2023-09-26T20:13:00,327][WARN ][o.o.i.e.Engine           ] [kanchipuram.local] [housing-index1][0] failed engine [refresh failed source[schedule]]
java.lang.ClassCastException: class com.titaniamlabs.lucene.store.TitaniamDirectory cannot be cast to class org.apache.lucene.store.FSDirectory (com.titaniamlabs.lucene.store.TitaniamDirectory is in unnamed module of loader java.net.FactoryURLClassLoader @d5ce97f; org.apache.lucene.store.FSDirectory is in unnamed module of loader 'app')
    at org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesConsumer.addKNNBinaryField(KNN80DocValuesConsumer.java:131) ~[?:?]
    at org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesConsumer.addBinaryField(KNN80DocValuesConsumer.java:78) ~[?:?]
    at org.apache.lucene.index.BinaryDocValuesWriter.flush(BinaryDocValuesWriter.java:132) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
    at org.apache.lucene.index.IndexingChain.writeDocValues(IndexingChain.java:400) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
    at org.apache.lucene.index.IndexingChain.flush(IndexingChain.java:258) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
    at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:392) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
    at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:493) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
    at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:672) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:570) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
    at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:381) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:355) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:345) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
    at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
    at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
    at org.opensearch.index.engine.OpenSearchReaderManager.refreshIfNeeded(OpenSearchReaderManager.java:73) ~[opensearch-2.7.0.jar:2.7.0]
    at org.opensearch.index.engine.OpenSearchReaderManager.refreshIfNeeded(OpenSearchReaderManager.java:53) ~[opensearch-2.7.0.jar:2.7.0]
    at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
    at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
    at org.opensearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:432) ~[opensearch-2.7.0.jar:2.7.0]
    at org.opensearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:412) ~[opensearch-2.7.0.jar:2.7.0]
    at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
    at org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:213) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
    at org.opensearch.index.engine.InternalEngine.refresh(InternalEngine.java:1846) [opensearch-2.7.0.jar:2.7.0]
    at org.opensearch.index.engine.InternalEngine.maybeRefresh(InternalEngine.java:1825) [opensearch-2.7.0.jar:2.7.0]
    at org.opensearch.index.shard.IndexShard.scheduledRefresh(IndexShard.java:4172) [opensearch-2.7.0.jar:2.7.0]
    at org.opensearch.index.IndexService.maybeRefreshEngine(IndexService.java:983) [opensearch-2.7.0.jar:2.7.0]
    at org.opensearch.index.IndexService$AsyncRefreshTask.runInternal(IndexService.java:1116) [opensearch-2.7.0.jar:2.7.0]
    at org.opensearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:159) [opensearch-2.7.0.jar:2.7.0]
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) [opensearch-2.7.0.jar:2.7.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.lang.Thread.run(Thread.java:833) [?:?]

What is the expected behavior? The k-NN DocValuesConsumer should use the Lucene Directory abstraction to add the binary field, so that OpenSearch plugins implementing IndexStorePlugin work.

What is your host/environment? It happens in all environments.

Do you have any screenshots? NA

Do you have any additional context?

  1. k-NN search on Elasticsearch 8.7.0 works with this plugin, which implements IndexStorePlugin.

  2. The following is the failing line; it casts the directory to FSDirectory:

https://github.com/opensearch-project/k-NN/blob/ca5e483e1e70abdc19196e1018b3a7fd06908bf6/src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesConsumer.java#L121

The code that follows then writes directly to the Lucene file without going through the Directory abstraction:

https://github.com/opensearch-project/k-NN/blob/ca5e483e1e70abdc19196e1018b3a7fd06908bf6/src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesConsumer.java#L128 and https://github.com/opensearch-project/k-NN/blob/ca5e483e1e70abdc19196e1018b3a7fd06908bf6/src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesConsumer.java#L143

This code needs to be changed to write through the Directory.

vamshin commented 1 year ago

@jitendra-titaniam thanks for raising the issue. Would you be able to contribute the fix?

jmazanec15 commented 1 year ago

So this is tricky. The problem is that we access the "native" index files by passing the file path directly to the native libraries (i.e., faiss and nmslib). This means we do not access the index through the IndexInput abstraction returned by the directory, which becomes a problem with things like remote stores.

Workarounds are kind of tough:

  1. First, we could ensure that the directory being used ends up being an FSDirectory. I'm assuming any component implementing IndexStorePlugin will still need local files to search against. The basic idea would be to unwrap the directory until we reach an FSDirectory.
  2. You could use the Lucene engine if you need a quicker fix.
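Workaround 1 could look roughly like the following. Lucene itself provides FilterDirectory.unwrap(Directory) for peeling off FilterDirectory wrappers; this sketch models the same loop with hypothetical stand-in classes (Dir, FsDir, FilterDir, OpaqueDir) so it runs with only the JDK, and it returns an empty result when no filesystem directory sits underneath instead of throwing a ClassCastException.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical stand-ins; NOT Lucene classes.
class UnwrapDemo {

    abstract static class Dir {}

    /** Stand-in for FSDirectory: has a local path that native code could open. */
    static final class FsDir extends Dir {
        final Path root;
        FsDir(Path root) { this.root = root; }
    }

    /** Stand-in for FilterDirectory: wraps and delegates to another Dir. */
    static class FilterDir extends Dir {
        final Dir delegate;
        FilterDir(Dir delegate) { this.delegate = delegate; }
        Dir getDelegate() { return delegate; }
    }

    /** A directory with no local filesystem underneath (e.g. a remote store). */
    static final class OpaqueDir extends Dir {}

    /** Peel off wrappers until an FsDir is found; empty if there is none. */
    static Optional<FsDir> unwrapToFs(Dir dir) {
        Dir current = dir;
        while (current instanceof FilterDir) {
            current = ((FilterDir) current).getDelegate();
        }
        return current instanceof FsDir ? Optional.of((FsDir) current) : Optional.empty();
    }

    public static void main(String[] args) {
        Dir wrapped = new FilterDir(new FilterDir(new FsDir(Paths.get("index"))));
        System.out.println(unwrapToFs(wrapped).isPresent()); // true
        System.out.println(unwrapToFs(new FilterDir(new OpaqueDir())).isPresent()); // false
    }
}
```

This only helps when the plugin's Directory actually wraps an FSDirectory. It also means the native files bypass whatever the wrapper does to the bytes (for example, encryption), which may defeat the purpose of the custom store.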

Not sure what the best route forward is for a long-term solution. Implementing the native engines to read from IndexInput would be very challenging, if not impossible. We might be able to wrap this in a friendlier abstraction that works better with IndexStorePlugin, though.
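One possible shape for such a friendlier abstraction, sketched under an assumption: the native side consumes bytes handed to it (for example through a JNI callback) instead of opening a file path itself. Dir, InMemoryDir, and streamFile are hypothetical stand-ins, not Lucene's IndexInput API and not the design in the loading-layer RFC; the point is only that every read goes through openInput, so any Directory implementation works.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch; Dir is a simplified stand-in for Lucene's Directory.
class StreamingLoadDemo {

    abstract static class Dir {
        abstract InputStream openInput(String name) throws IOException;
    }

    /** Any storage backend works, including ones with no local file path. */
    static final class InMemoryDir extends Dir {
        final Map<String, byte[]> files = new HashMap<>();
        @Override
        InputStream openInput(String name) throws IOException {
            byte[] data = files.get(name);
            if (data == null) throw new FileNotFoundException(name);
            return new ByteArrayInputStream(data);
        }
    }

    /**
     * Streams a file through the Dir abstraction to a consumer in fixed-size chunks
     * (the consumer stands in for native code). Returns the total bytes delivered.
     */
    static long streamFile(Dir dir, String name, int chunkSize, Consumer<byte[]> sink) throws IOException {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        long total = 0;
        try (InputStream in = dir.openInput(name)) {
            byte[] buf = new byte[chunkSize];
            int n;
            while ((n = in.read(buf)) != -1) {
                sink.accept(Arrays.copyOf(buf, n)); // hand over only the bytes actually read
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InMemoryDir dir = new InMemoryDir();
        dir.files.put("graph.bin", new byte[10]);
        int[] chunks = {0};
        long total = streamFile(dir, "graph.bin", 4, chunk -> chunks[0]++);
        System.out.println(total + " bytes in " + chunks[0] + " chunks"); // 10 bytes in 3 chunks
    }
}
```

Chunked streaming, rather than reading the whole file into one array, keeps memory bounded for large graph files; whether the native libraries can consume input this way is the open question raised above.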

navneet1v commented 2 months ago

@jitendra-titaniam I have created this GH issue for using IndexInput for graph files. I am hoping it can solve the issue you faced: https://github.com/opensearch-project/k-NN/issues/2033

jmazanec15 commented 3 weeks ago

Will be completed in 2.18 once loading layer changes are completed.

navneet1v commented 3 weeks ago

@jitendra-titaniam, please review the RFC for the loading layer and see whether it solves your issue; I believe it will: https://github.com/opensearch-project/k-NN/issues/2033

dblock commented 2 weeks ago

[Catch All Triage - 1, 2]