opensearch-project / neural-search

Plugin that adds dense neural retrieval into the OpenSearch ecosystem
Apache License 2.0

[BUG] 2.15 Hybrid Search With query_text and radial vector search producing array out of bounds #973

Open iyoung opened 3 weeks ago

iyoung commented 3 weeks ago

What is the bug?

Running a hybrid query that sets a min_score below 0.5 on the vector side and provides a query_text for the lexical search fails for certain searches (possibly related to the number of matches) with the following response:

{
    "error": {
        "root_cause": [
            {
                "type": "index_out_of_bounds_exception",
                "reason": "index_out_of_bounds_exception: null"
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "xxxxxx",
                "node": "xxxxxx",
                "reason": {
                    "type": "index_out_of_bounds_exception",
                    "reason": "index_out_of_bounds_exception: null"
                }
            }
        ]
    },
    "status": 500
}

How can one reproduce the bug?

This is the structure of the query I am using, which always throws the exception.

{
  "size": 50,
  "track_total_hits": true,
  "query": {
    "hybrid": {
      "queries": [
        {
          "knn": {
            "vector": {
              "vector": [
                -0.018432617,
                0.05419922,
                ...
              ],
              "min_score": 0.4
            }
          }
        },
        {
          "query_string": {
            "fields": [
              "essential_words^2",
              "caption^3",
              "description^3",
              "main_words",
              "extra_words"
            ],
            "query": "\"cold weather\"",
            "default_operator": "AND"
          }
        }
      ]
    }
  },
  "post_filter": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "vector"
          }
        }
      ]
    }
  },
  "search_pipeline": {
    "description": "Inline post processor for hybrid search",
    "phase_results_processors": [
      {
        "normalization-processor": {
          "normalization": {
            "technique": "min_max"
          },
          "combination": {
            "technique": "arithmetic_mean",
            "parameters": {
              "weights": [
                0.5,
                0.5
              ]
            }
          }
        }
      }
    ]
  }
}
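
For readers unfamiliar with the `normalization-processor` settings in the query above, here is a minimal Python sketch of what `min_max` normalization followed by a weighted `arithmetic_mean` with weights `[0.5, 0.5]` does conceptually. It is an illustration under simplifying assumptions, not the plugin's implementation: the function names and sample scores are hypothetical, a document missing from a subquery is assumed to contribute 0, and any edge-case handling the plugin applies is not modeled.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one subquery's scores into [0, 1] (simplified sketch)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: when all scores are equal, treat every hit as a top hit.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_mean_combine(
    per_query: List[Dict[str, float]], weights: List[float]
) -> Dict[str, float]:
    """Combine normalized subquery scores with a weighted arithmetic mean."""
    if len(per_query) != len(weights):
        raise ValueError("one weight is required per subquery")
    total_weight = sum(weights)
    doc_ids = set().union(*per_query)
    combined = {}
    for doc in doc_ids:
        # Assumption: a document absent from a subquery contributes 0.
        weighted = sum(w * q.get(doc, 0.0) for q, w in zip(per_query, weights))
        combined[doc] = weighted / total_weight
    return combined


# Hypothetical raw scores for the k-NN subquery and the query_string subquery.
knn_scores = {"a": 0.9, "b": 0.5, "c": 0.4}
text_scores = {"a": 2.0, "d": 6.0}

normalized = [min_max_normalize(knn_scores), min_max_normalize(text_scores)]
final_scores = weighted_mean_combine(normalized, [0.5, 0.5])
```

Note that this step runs on the coordinating node after the query phase, whereas the stack trace in this report fails earlier, while the shard is scoring the lexical subquery.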

What is the expected behaviour?

Search results returned

What is your host/environment?

AWS-managed OpenSearch 2.15

Do you have any additional context?

Raising the radial search threshold, by setting min_score on the vector search to a value from 0.5 to 1, stops the error. Changing the query term or the fields included, and therefore the matched items, also avoids it.

There are numerous exact phrase search terms which cause this issue for us, such as "group hiking" and "cold weather"

In the error logs I get:

Failed to execute phase [query], all shards failed; shardFailures {[xxxxxxx][xxxxxxx][0]: RemoteTransportException[[f6b3bd921afde7de466c508873aece19][__IP__][__PATH__[__PATH__]]]; nested: QueryPhaseExecutionException[Query Failed [Failed to execute main query]]; nested: NotSerializableExceptionWrapper[index_out_of_bounds_exception: null]; }
    at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:780)
    at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:397)
    at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:820)
    at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:558)
    at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:318)
    at org.opensearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:104)
    at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:75)
    at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:764)
    at org.opensearch.transport.TransportService$9.handleException(TransportService.java:1729)
    at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleException(SecurityInterceptor.java:436)
    at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1515)
    at org.opensearch.transport.NativeMessageHandler.lambda$handleException$5(NativeMessageHandler.java:454)
    at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:412)
    at org.opensearch.transport.NativeMessageHandler.handleException(NativeMessageHandler.java:452)
    at org.opensearch.transport.NativeMessageHandler.handlerResponseError(NativeMessageHandler.java:444)
    at org.opensearch.transport.NativeMessageHandler.handleMessage(NativeMessageHandler.java:172)
    at org.opensearch.transport.NativeMessageHandler.messageReceived(NativeMessageHandler.java:126)
    at org.opensearch.transport.InboundHandler.messageReceivedFromPipeline(InboundHandler.java:121)
    at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:113)
    at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:800)
    at org.opensearch.transport.nativeprotocol.NativeInboundBytesHandler.forwardFragments(NativeInboundBytesHandler.java:157)
    at org.opensearch.transport.nativeprotocol.NativeInboundBytesHandler.doHandleBytes(NativeInboundBytesHandler.java:94)
    at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:143)
    at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:119)
    at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:95)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1475)
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1338)
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387)
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1407)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:918)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:994)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at __PATH__(Thread.java:1583)
Caused by: NotSerializableExceptionWrapper[index_out_of_bounds_exception: null]
    at java.nio.Buffer$1.apply(Buffer.java:757)
    at java.nio.Buffer$1.apply(Buffer.java:754)
    at jdk.internal.util.Preconditions$4.apply(Preconditions.java:213)
    at jdk.internal.util.Preconditions$4.apply(Preconditions.java:210)
    at jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:98)
    at jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:106)
    at jdk.internal.util.Preconditions.checkIndex(Preconditions.java:302)
    at java.nio.Buffer.checkIndex(Buffer.java:768)
    at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:342)
    at org.apache.lucene.store.ByteBufferGuard.getByte(ByteBufferGuard.java:119)
    at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.readByte(ByteBufferIndexInput.java:583)
    at org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3.longValue(Lucene90NormsProducer.java:389)
    at org.apache.lucene.search.LeafSimScorer.getNormValue(LeafSimScorer.java:47)
    at org.apache.lucene.search.LeafSimScorer.score(LeafSimScorer.java:60)
    at org.apache.lucene.search.PhraseScorer.score(PhraseScorer.java:83)
    at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:65)
    at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:178)
    at org.opensearch.neuralsearch.query.HybridQueryScorer.hybridScores(HybridQueryScorer.java:193)
    at org.opensearch.neuralsearch.search.collector.HybridTopScoreDocCollector$1.collect(HybridTopScoreDocCollector.java:100)
    at org.opensearch.common.lucene.search.FilteredCollector$1.collect(FilteredCollector.java:79)
    at org.apache.lucene.search.MultiCollector$MultiLeafCollector.collect(MultiCollector.java:226)
    at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:296)
    at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:236)
    at org.opensearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:71)
    at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38)
    at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:334)
    at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:285)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:552)
    at org.opensearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:361)
    at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:468)
    at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher$DefaultQueryPhaseSearcherWithEmptyQueryCollectorContext.searchWithCollector(HybridQueryPhaseSearcher.java:199)
    at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWith(QueryPhase.java:438)
    at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:65)
    at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:284)
    at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:157)
    at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:589)
    at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:653)
    at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:622)
    at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
    at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
    at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
    at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:950)
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.lang.Thread.run(Thread.java:1583)

martin-gaievski commented 2 weeks ago

@iyoung thank you for informing us about this scenario. I need some additional information from you: what is the mapping for your index, what is the index configuration (number of nodes, primary shards and replicas), how many documents do you have, and do you expect the failing query to return search hits? If yes, approximately how many?

I have tried the following scenario, and it works fine on my side:

  1. create index with knn vector field

    PUT /index-test
    {
      "settings": {
        "index": {
          "knn": true
        }
      },
      "mappings": {
        "properties": {
          "vector": {
            "type": "knn_vector",
            "dimension": 3,
            "method": {
              "name": "hnsw",
              "space_type": "l2",
              "engine": "lucene"
            }
          },
          "field1": {
            "type": "integer"
          },
          "name": {
            "type": "text"
          }
        }
      }
    }
  2. ingest several documents with vectors and text fields:

    POST /index-test/_bulk?refresh
    {"index":{}}
    {"field1": 2,"vector": [0.4, 0.5, 0.2],"title": "basic", "name": "A West Virginia university women 's basketball team , officials , and a small gathering of fans are in a West Virginia arena .", "category": "novel", "price": 20}
    {"index":{}}
    { "name": "I brought home the trophy", "category": "story", "price": 20, "field1": 10,"vector": [0.2, 0.2, 0.3],"title": "java"}
    {"index":{}}
    {"field1": 50,"vector": [4.2, 5.5, 8.9],"name": "Why would he go to all that effort for a free pack of ranch dressing?", "category": "story", "price": 10 }
    {"index":{}}
    {"vector": [0.3, 0.12, 3.3],"title": "python","name": "In the next 40-50 years I plan on opening up my own business.","category": "poem","price": 100}
    {"index":{}}
    {  "field1": 100,"vector": [0.2, 0.2, 0.3],"title": "java", "name": "Does he have a big family?", "category": "biography", "price": 70}
    {"index":{}}
    {"name": "She is my younger sister","category": "workbook","price": 25}
  3. run search with hybrid query

    GET /index-test/_search
    {
        "size": 50,
        "track_total_hits": true,
        "query": {
            "hybrid": {
                "queries": [
                    {
                        "knn": {
                            "vector": {
                                "vector": [
                                    0.15,
                                    0.3,
                                    1.1
                                ],
                                "min_score": 0.2
                            }
                        }
                    },
                    {
                        "query_string": {
                            "fields": [
                                "title^2",
                                "name^3"
                            ],
                            "query": "\"small\"",
                            "default_operator": "AND"
                        }
                    }
                ]
            }
        },
        "post_filter": {
            "bool": {
                "must": [
                    {
                        "exists": {
                            "field": "vector"
                        }
                    }
                ]
            }
        },
        "search_pipeline": {
            "description": "Inline post processor for hybrid search",
            "phase_results_processors": [
                {
                    "normalization-processor": {
                        "normalization": {
                            "technique": "min_max"
                        },
                        "combination": {
                            "technique": "arithmetic_mean",
                            "parameters": {
                                "weights": [
                                    0.5,
                                    0.5
                                ]
                            }
                        }
                    }
                }
            ]
        }
    }

I tried multiple search words and different values of min_score.
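
To make the effect of `min_score` in this test scenario concrete, here is a small Python sketch that scores the five test vectors against the query vector `[0.15, 0.3, 1.1]`. It assumes the k-NN score for the `l2` space type is `1 / (1 + d)`, where `d` is the squared L2 distance, and that radial search keeps documents whose score is at least `min_score`. Both are assumptions of this sketch, and the function names are hypothetical.

```python
from typing import List


def l2_score(query: List[float], doc: List[float]) -> float:
    """Assumed l2 score translation: 1 / (1 + squared L2 distance)."""
    squared_distance = sum((q - d) ** 2 for q, d in zip(query, doc))
    return 1.0 / (1.0 + squared_distance)


def passes_min_score(query: List[float], doc: List[float], min_score: float) -> bool:
    """Assumed radial-search rule: keep documents scoring at least min_score."""
    return l2_score(query, doc) >= min_score


QUERY = [0.15, 0.3, 1.1]

# Vectors of the five bulk-ingested documents that have a "vector" field.
DOC_VECTORS = [
    [0.4, 0.5, 0.2],
    [0.2, 0.2, 0.3],
    [4.2, 5.5, 8.9],
    [0.3, 0.12, 3.3],
    [0.2, 0.2, 0.3],
]

scores = [l2_score(QUERY, vector) for vector in DOC_VECTORS]
matching = [v for v in DOC_VECTORS if passes_min_score(QUERY, v, 0.2)]
```

Under these assumptions, three of the five vectors clear a `min_score` of 0.2, so the k-NN subquery in this test matches only a handful of documents.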

max-shyman commented 2 weeks ago

Same for my query after the 2.13 > 2.15 upgrade. It is exactly as described by @iyoung and also very similar to the description in https://github.com/opensearch-project/neural-search/issues/497

Especially this:

Honestly, it's very hard to reproduce the bug. It only happens for Hybrid search. I observe a pattern that queries with more than one word tend to be more likely to have this error than simple queries. Queries that failed are like "horror movies", "teen mom", "news radio". I also observed that when I changed the index data, some queries started working, and other queries started failing.

The issue happens randomly and can be reproduced only for several minutes or hours. Afterwards I cannot reproduce it with exactly the same query (index data changes probably affect this).

2 nodes, 1 primary shard, 1 replica shard, ~600k documents (~13GB), hnsw, faiss

The query is mostly the same as the topic starter's query but with three subqueries (text search + 2 knn). There is also no min_score on the knn queries (because it doesn't exist in 2.13); instead, the knn subqueries are wrapped in function_score with their own min_score.

Any ideas?

iyoung commented 2 weeks ago

Thank you, I am in contact with the OpenSearch managed service team within AWS about this issue and am looking to replicate it with a smaller index. The index we're using has around 2m documents in it. Once I have a more concrete way to reproduce this I will update. Thank you for replying.
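
As an aid for narrowing down a reproduction, the following Python sketch builds the request body from the original report for every combination of exact phrase and `min_score`, so the combinations can be sent to a test index one by one. It only constructs the JSON bodies; how they are sent is left open. The helper names `build_hybrid_body` and `sweep` are hypothetical, and the query vector must be supplied by the caller because it is elided in the report.

```python
import json
from itertools import product
from typing import Any, Dict, Iterator, List, Tuple

# Field list copied from the query in the original report.
TEXT_FIELDS = [
    "essential_words^2",
    "caption^3",
    "description^3",
    "main_words",
    "extra_words",
]


def build_hybrid_body(vector: List[float], phrase: str, min_score: float) -> Dict[str, Any]:
    """Build the hybrid search body from the report for one phrase and min_score."""
    return {
        "size": 50,
        "track_total_hits": True,
        "query": {
            "hybrid": {
                "queries": [
                    {"knn": {"vector": {"vector": vector, "min_score": min_score}}},
                    {
                        "query_string": {
                            "fields": TEXT_FIELDS,
                            "query": f'"{phrase}"',  # exact-phrase search
                            "default_operator": "AND",
                        }
                    },
                ]
            }
        },
        "post_filter": {"bool": {"must": [{"exists": {"field": "vector"}}]}},
        "search_pipeline": {
            "description": "Inline post processor for hybrid search",
            "phase_results_processors": [
                {
                    "normalization-processor": {
                        "normalization": {"technique": "min_max"},
                        "combination": {
                            "technique": "arithmetic_mean",
                            "parameters": {"weights": [0.5, 0.5]},
                        },
                    }
                }
            ],
        },
    }


def sweep(
    vector: List[float], phrases: List[str], min_scores: List[float]
) -> Iterator[Tuple[str, float, Dict[str, Any]]]:
    """Yield (phrase, min_score, body) for every combination to try."""
    for phrase, min_score in product(phrases, min_scores):
        yield phrase, min_score, build_hybrid_body(vector, phrase, min_score)


# Placeholder vector for illustration only; use a real query embedding.
cases = list(sweep([0.1, 0.2], ["cold weather", "group hiking"], [0.4, 0.5]))
first_body_json = json.dumps(cases[0][2], indent=2)
```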