opensearch-project / neural-search

Plugin that adds dense neural retrieval into the OpenSearch ecosystem
Apache License 2.0

[BUG] Hybrid query returns error when `request_cache` flag is set in search request #601

Closed martin-gaievski closed 2 months ago

martin-gaievski commented 8 months ago

What is the bug?

A hybrid search query returns the error below for any search request that sets the `request_cache` flag:

"type": "illegal_state_exception",
"reason": "Score normalization processor cannot produce final query result"

How can one reproduce the bug?

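Detailed steps are at the end of this issue. The most important pieces are an index with a single shard, a search pipeline with the normalization processor, and a hybrid query sent with `request_cache=true`.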

What is the expected behavior?

The query should return actual results instead of an error.

What is your host/environment?

Both 2.11 and the latest main have this issue.

Do you have any additional context?

The issue can be avoided by using 2 or more shards for the index; it is specific to the single-shard scenario.
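
A minimal sketch of that workaround: the same index definition as in the detailed steps at the end of this issue, with only `number_of_shards` changed. The value 2 is an assumption; per the description any value of 2 or more should avoid the error. The shard count cannot be changed on an existing index, so the index is recreated here.

```json
# Recreate the index; number_of_shards is fixed at creation time
DELETE example-index

PUT example-index
{
  "settings": {
    "index": {
      "knn": true,
      "number_of_shards": 2,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      },
      "my_vector": {
        "type": "knn_vector",
        "dimension": 1,
        "method": {
          "name": "hnsw",
          "space_type": "innerproduct",
          "engine": "faiss"
        }
      },
      "integer": {
        "type": "integer"
      }
    }
  }
}
```

The documents then need to be indexed again before rerunning the hybrid query.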

According to the analysis, when `request_cache` is set, the query phase searcher is skipped for the second and subsequent executions of the query; results are taken from the local shard cache and passed to the normalization processor. The logic for merging fetch and query results probably needs to change so that it handles that cached data properly.
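
One way to check whether the shard request cache is involved is sketched below, using the standard index stats and clear cache APIs. These calls are not part of the original report, and the idea that clearing the cache sends the next request through the query phase again is an assumption that follows from the analysis above, not something verified here.

```json
# Inspect request cache counters for the index
# (look at hit_count and miss_count under request_cache in the response)
GET example-index/_stats/request_cache

# Drop only the request cache entries for the index
POST example-index/_cache/clear?request=true
```

If the analysis holds, `hit_count` would be expected to grow on the requests that fail, while `miss_count` grows on the first request after the cache is cleared.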

Detailed steps to reproduce:

DELETE example-index

PUT example-index
{
  "settings": {
    "index": {
      "knn": true,
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      },
      "my_vector": {
        "type": "knn_vector",
        "dimension": 1,
        "method": {
          "name": "hnsw",
          "space_type": "innerproduct",
          "engine": "faiss"
        }
      },
      "integer": {
        "type": "integer"
      }
    }
  }
}

PUT example-index/_bulk?refresh
{"index":{"_id":"1"}}
{"text": "neural","my_vector": [1], "integer": 1 }
{"index":{"_id":"2"}}
{"text": "neural neural","my_vector": [2], "integer": 2 }
{"index":{"_id":"3"}}
{"text": "neural neural neural","my_vector": [3], "integer": 3 }
{"index":{"_id":"4"}}
{"text": "neural neural neural neural", "integer": 4 }
{"index":{"_id":"5"}}
{"my_vector": [0], "integer": 5 }

PUT /_search/pipeline/nlp-search-pipeline
{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        }
      }
    }
  ]
}

POST example-index/_search?search_pipeline=nlp-search-pipeline&request_cache=true&preference=_local
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "term": {
            "text": "neural"
          }
        },
        {
          "knn": {
            "my_vector": {
              "vector": [
                3
              ],
              "k": 3
            }
          }
        }
      ]
    }
  },
  "size": 3
}

Actual response:

{
  "error": {
    "root_cause": [],
    "type": "search_phase_execution_exception",
    "reason": "The phase has failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [],
    "caused_by": {
      "type": "illegal_state_exception",
      "reason": "Score normalization processor cannot produce final query result"
    }
  },
  "status": 500
}

jmazanec15 commented 5 months ago

@martin-gaievski I'm moving this one to 2.15 because I don't think a PR is up yet.

martin-gaievski commented 2 months ago

It's fixed in https://github.com/opensearch-project/neural-search/pull/663; closing this issue.