{
"error": {
"root_cause": [
{
"type": "null_pointer_exception",
"reason": "Cannot invoke \"org.apache.lucene.index.FieldInfo.getAttribute(String)\" because \"fieldInfo\" is null"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "my-knn-index-1",
"node": "l4iYtRCNSDOBxppXQULzig",
"reason": {
"type": "null_pointer_exception",
"reason": "Cannot invoke \"org.apache.lucene.index.FieldInfo.getAttribute(String)\" because \"fieldInfo\" is null"
}
}
],
"caused_by": {
"type": "null_pointer_exception",
"reason": "Cannot invoke \"org.apache.lucene.index.FieldInfo.getAttribute(String)\" because \"fieldInfo\" is null",
"caused_by": {
"type": "null_pointer_exception",
"reason": "Cannot invoke \"org.apache.lucene.index.FieldInfo.getAttribute(String)\" because \"fieldInfo\" is null"
}
}
},
"status": 500
}
Stack Trace
opensearch-node1 | org.opensearch.action.search.SearchPhaseExecutionException: all shards failed
opensearch-node1 | at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:775) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:395) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:815) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:548) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:316) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:104) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:75) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:766) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.transport.TransportService$9.handleException(TransportService.java:1741) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleException(SecurityInterceptor.java:420) [opensearch-security-2.18.0.0.jar:2.18.0.0]
opensearch-node1 | at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1527) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.transport.NativeMessageHandler.lambda$handleException$5(NativeMessageHandler.java:454) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:343) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.transport.NativeMessageHandler.handleException(NativeMessageHandler.java:452) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.transport.NativeMessageHandler.handlerResponseError(NativeMessageHandler.java:444) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.transport.NativeMessageHandler.handleMessage(NativeMessageHandler.java:172) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.transport.NativeMessageHandler.messageReceived(NativeMessageHandler.java:126) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.transport.InboundHandler.messageReceivedFromPipeline(InboundHandler.java:120) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:112) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:796) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.transport.InboundBytesHandler.forwardFragments(InboundBytesHandler.java:137) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.transport.InboundBytesHandler.doHandleBytes(InboundBytesHandler.java:77) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:124) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:113) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:95) [transport-netty4-client-2.18.0.jar:2.18.0]
opensearch-node1 | at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) [netty-transport-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280) [netty-handler-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) [netty-transport-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1503) [netty-handler-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1366) [netty-handler-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1415) [netty-handler-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530) [netty-codec-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469) [netty-codec-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290) [netty-codec-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) [netty-transport-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1357) [netty-transport-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) [netty-transport-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:868) [netty-transport-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) [netty-transport-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) [netty-transport-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) [netty-transport-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) [netty-transport-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.114.Final.jar:4.1.114.Final]
opensearch-node1 | at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
opensearch-node1 | Caused by: org.opensearch.OpenSearchException$3: Cannot invoke "org.apache.lucene.index.FieldInfo.getAttribute(String)" because "fieldInfo" is null
opensearch-node1 | at org.opensearch.OpenSearchException.guessRootCauses(OpenSearchException.java:710) ~[opensearch-core-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:393) [opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | ... 51 more
opensearch-node1 | Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.lucene.index.FieldInfo.getAttribute(String)" because "fieldInfo" is null
opensearch-node1 | at org.opensearch.knn.common.FieldInfoExtractor.getSpaceType(FieldInfoExtractor.java:89) ~[?:?]
opensearch-node1 | at org.opensearch.knn.index.query.ExactSearcher.getKNNIterator(ExactSearcher.java:153) ~[?:?]
opensearch-node1 | at org.opensearch.knn.index.query.ExactSearcher.searchLeaf(ExactSearcher.java:62) ~[?:?]
opensearch-node1 | at org.opensearch.knn.index.query.KNNWeight.exactSearch(KNNWeight.java:388) ~[?:?]
opensearch-node1 | at org.opensearch.knn.index.query.nativelib.NativeEngineKnnVectorQuery.lambda$doRescore$1(NativeEngineKnnVectorQuery.java:124) ~[?:?]
opensearch-node1 | at java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
opensearch-node1 | at org.apache.lucene.search.TaskExecutor$TaskGroup$1.run(TaskExecutor.java:120) ~[lucene-core-9.12.0.jar:9.12.0 e913796758de3d9b9440669384b29bec07e6a5cd - 2024-09-25 16:37:02]
opensearch-node1 | at org.apache.lucene.search.TaskExecutor$TaskGroup.invokeAll(TaskExecutor.java:176) ~[lucene-core-9.12.0.jar:9.12.0 e913796758de3d9b9440669384b29bec07e6a5cd - 2024-09-25 16:37:02]
opensearch-node1 | at org.apache.lucene.search.TaskExecutor.invokeAll(TaskExecutor.java:84) ~[lucene-core-9.12.0.jar:9.12.0 e913796758de3d9b9440669384b29bec07e6a5cd - 2024-09-25 16:37:02]
opensearch-node1 | at org.opensearch.knn.index.query.nativelib.NativeEngineKnnVectorQuery.doRescore(NativeEngineKnnVectorQuery.java:127) ~[?:?]
opensearch-node1 | at org.opensearch.knn.index.query.nativelib.NativeEngineKnnVectorQuery.createWeight(NativeEngineKnnVectorQuery.java:73) ~[?:?]
opensearch-node1 | at org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:899) ~[lucene-core-9.12.0.jar:9.12.0 e913796758de3d9b9440669384b29bec07e6a5cd - 2024-09-25 16:37:02]
opensearch-node1 | at org.opensearch.search.internal.ContextIndexSearcher.createWeight(ContextIndexSearcher.java:226) ~[opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:560) ~[lucene-core-9.12.0.jar:9.12.0 e913796758de3d9b9440669384b29bec07e6a5cd - 2024-09-25 16:37:02]
opensearch-node1 | at org.opensearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:355) ~[opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:462) ~[opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:450) ~[opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWith(QueryPhase.java:432) ~[opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.search.query.QueryPhaseSearcherWrapper.searchWith(QueryPhaseSearcherWrapper.java:60) ~[opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:61) ~[?:?]
opensearch-node1 | at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:282) ~[opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:155) ~[opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:646) ~[opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:710) ~[opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:679) ~[opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) ~[opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) ~[opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) ~[opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59) ~[opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1005) ~[opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.18.0.jar:2.18.0]
opensearch-node1 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
opensearch-node1 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
opensearch-node1 | at java.lang.Thread.run(Thread.java:1583) ~[?:?]
Impact Versions
2.17 and 2.18
Root cause
Based on my deep-dive, when documents are deleted in OpenSearch, OpenSearch marks the document as deleted in the main segment but also creates a segment that has the deleted docs in it. Refer to the segments response below:
Since no document is present in this segment, no field info is present either. As a result, when the rescore phase runs in disk-based vector search, we cannot get the fieldInfo for the vector field, because that field is not present in the segment.
The bug is not limited to deleted-docs segments. It also happens if a segment doesn't contain the vector field, because in that case too the field info is null for that segment. I validated this by ingesting docs so that some segments have the vector field and some do not.
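The failure mode described above can be illustrated with a small toy model. This is only a sketch in Python, not the plugin's Java code: the `Segment` and `FieldInfo` classes, the `space_type` attribute key, and both helper functions are hypothetical stand-ins. The guarded variant shows one possible way to handle a segment without the vector field; this report does not state what shape the actual fix takes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FieldInfo:
    """Toy stand-in for a per-field metadata record (hypothetical)."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)

    def get_attribute(self, key: str) -> Optional[str]:
        return self.attributes.get(key)


@dataclass
class Segment:
    """Toy segment: only fields that occur in the segment have a FieldInfo."""
    name: str
    field_infos: Dict[str, FieldInfo] = field(default_factory=dict)

    def field_info(self, field_name: str) -> Optional[FieldInfo]:
        # Returns None when the field does not occur in this segment.
        return self.field_infos.get(field_name)


def get_space_type_unguarded(segment: Segment, field_name: str) -> Optional[str]:
    # Mirrors the failing pattern: the lookup result is used without a None check.
    return segment.field_info(field_name).get_attribute("space_type")


def get_space_type_guarded(segment: Segment, field_name: str) -> Optional[str]:
    # One possible guard: report "nothing to score here" instead of failing.
    info = segment.field_info(field_name)
    if info is None:
        return None
    return info.get_attribute("space_type")


if __name__ == "__main__":
    with_vectors = Segment("_0", {"my_vector": FieldInfo("my_vector", {"space_type": "l2"})})
    without_vectors = Segment("_1")  # e.g. deleted docs only, or no vector field
    for seg in (with_vectors, without_vectors):
        print(seg.name, get_space_type_guarded(seg, "my_vector"))
```

In this model, calling `get_space_type_unguarded` on the segment without the vector field raises an `AttributeError`, which is the Python analogue of the NPE in the stack trace.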
Thanks @heemin32 for reporting the bug related to deleted docs.
Workaround
If the index has segments that contain only deleted docs, flushing the index may solve the problem: POST <index-name>/_flush
For segments with no vector field, a force merge down to 1 segment is required to ensure that no segment is left without the vector field.
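The two workaround calls can be sketched as follows. This is a minimal Python sketch that only builds the HTTP requests and does not send them. The base URL is an assumption, and a secured cluster (the stack trace shows the security plugin and TLS) would also need HTTPS and credentials. `_flush` comes from the workaround above; `_forcemerge` with `max_num_segments=1` is the standard OpenSearch force merge endpoint.

```python
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: adjust host, scheme and auth for your cluster


def flush_request(index: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) POST <index-name>/_flush."""
    return Request(f"{base_url}/{index}/_flush", method="POST")


def force_merge_request(index: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a force merge down to a single segment."""
    return Request(f"{base_url}/{index}/_forcemerge?max_num_segments=1", method="POST")


if __name__ == "__main__":
    for req in (flush_request("my-knn-index-1"), force_merge_request("my-knn-index-1")):
        print(req.get_method(), req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it. Note that a force merge to one segment can be expensive on a large index.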
Cases with No workaround
If the docs with and without vectors are split across shards such that one shard has no doc with the vector field, there is no workaround. That shard will never have a field info for the vector field, so the exception will keep occurring.
Description
When a k-NN index (with on_disk mode) has deleted documents in it, search fails with an NPE.
Impacted cases (refer to the sections below for workarounds):
Please refer to the steps below for reproduction.
Steps to Reproduce with deleted docs
Create Index
Ingest 2 documents
Search working as expected
Delete a document
Search Again with error
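The original request bodies for these steps are not included here, so the following is only an illustrative reconstruction. The index name comes from the error response. The field name `my_vector`, the dimension, the vectors, and the use of `refresh=true` are all assumptions, and this sequence has not been verified to trigger the error. The sketch prints the requests in console form rather than sending them.

```python
import json

INDEX = "my-knn-index-1"    # index name taken from the error response
VECTOR_FIELD = "my_vector"  # hypothetical field name


def reproduction_steps(index: str = INDEX, vector_field: str = VECTOR_FIELD):
    """Return the steps as (description, method, path, body) tuples."""
    create_body = {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # Assumed mapping: a knn_vector field using on_disk mode.
                vector_field: {"type": "knn_vector", "dimension": 2, "mode": "on_disk"}
            }
        },
    }
    search_body = {"query": {"knn": {vector_field: {"vector": [1.0, 2.0], "k": 2}}}}
    return [
        ("Create index", "PUT", f"/{index}", create_body),
        ("Ingest document 1", "PUT", f"/{index}/_doc/1?refresh=true", {vector_field: [1.0, 2.0]}),
        ("Ingest document 2", "PUT", f"/{index}/_doc/2?refresh=true", {vector_field: [3.0, 4.0]}),
        ("Search", "POST", f"/{index}/_search", search_body),
        ("Delete a document", "DELETE", f"/{index}/_doc/1?refresh=true", None),
        ("Search again", "POST", f"/{index}/_search", search_body),
    ]


if __name__ == "__main__":
    for description, method, path, body in reproduction_steps():
        print(f"# {description}")
        print(f"{method} {path}")
        if body is not None:
            print(json.dumps(body))
```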
Error Response
Stack Trace
Line that returns the null field info: https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/query/ExactSearcher.java#L152
The NPE is thrown from this line: https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/common/FieldInfoExtractor.java#L88