opensearch-project / neural-search

Plugin that adds dense neural retrieval into the OpenSearch ecosystem
Apache License 2.0
61 stars 65 forks

[BUG] bug with GET /your_index/_search and POST _reindex in OS 2.11 #626

Open LandryK opened 7 months ago

LandryK commented 7 months ago

What is the bug?

When an index is configured with a default search pipeline, as shown below, GET /your_index/_search returns a 500 error caused by a NullPointerException. The _reindex API appears to issue that same search, with no query body, against the source index by default. As a result, reindexing from a source index that has index.search.default_pipeline configured fails with the same NullPointerException.

POST _reindex
{
   "source":{
      "index":"source-index"
   },
   "dest":{
      "index":"destination-index",
      "pipeline": "your-pipeline"
   }
}

How can one reproduce the bug?

1- Create the search pipeline:

PUT /_search/pipeline/test-pipeline
{
  "request_processors": [
    {
      "neural_query_enricher" : {
        "tag": "tag1",
        "description": "your description",
        "default_model_id": "your_model_id"
      }
    }
  ]
}

2- Create the index with the pipeline and set it as the default search pipeline

PUT /source-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    },
    "default_pipeline": "test-pipeline",
    "index.search.default_pipeline" : "test-pipeline"
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      },
      "passage_embedding": {
        "type": "knn_vector",
        "dimension": 384,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "nmslib",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      }
    }
  }
}

3- Add documents

PUT /source-index/_doc/1
{
  "text": "The emergence of resistance of bacteria to antibiotics is a common phenomenon. Emergence of resistance often reflects evolutionary processes that take place during antibiotic therapy."
}

PUT /source-index/_doc/2
{
  "text": "The successful outcome of antimicrobial therapy with antibacterial compounds depends on several factors. These include host defense mechanisms, the location of infection, and the pharmacokinetic and pharmacodynamic properties of the antibacterial."
}

4- Search (this yields a 500 error caused by a NullPointerException)

GET /source-index/_search

The NPE is triggered by the "index.search.default_pipeline" : "test-pipeline" setting. If you remove this setting from the index settings, the query works just fine.
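
For comparison, here is a minimal Python sketch of the same search sent with an explicit match_all body instead of an empty one. It is only an illustration: the base URL http://localhost:9200 and the helper name build_search_request are assumptions, not part of the report, and whether the explicit query actually avoids the NPE on a given 2.11 cluster is something to verify rather than take from this sketch. It uses POST, which _search accepts as well as GET.

```python
import json
import urllib.request


def build_search_request(base_url, index, query=None):
    """Build (but do not send) a _search request that carries an explicit query.

    base_url is an assumed cluster address, e.g. http://localhost:9200.
    """
    # Fall back to match_all so the request never goes out without a query.
    body = {"query": query if query is not None else {"match_all": {}}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_search_request("http://localhost:9200", "source-index")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it against a running cluster:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```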

5- Create a destination index

PUT /destination-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    },
    "default_pipeline": "test-pipeline",
    "index.search.default_pipeline" : "test-pipeline"
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      },
      "passage_embedding": {
        "type": "knn_vector",
        "dimension": 384,
        "method": {
          "name": "hnsw",
          "space_type": "cosinesimil",
          "engine": "nmslib",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      }
    }
  }
}

6- Attempt to reindex (this yields a 500 error caused by a NullPointerException)

POST _reindex
{
   "source":{
      "index":"source-index"
   },
   "dest":{
      "index":"destination-index",
      "pipeline": "test-pipeline"
   }
}

The above will not work because GET /source-index/_search throws an NPE due to the "index.search.default_pipeline" : "test-pipeline" setting, as discussed in Step 4. However, if you remove "index.search.default_pipeline" : "test-pipeline" from the index settings, the query works.

7- If you reindex with an explicit query in the source, as below, it works

POST _reindex
{
   "source":{
      "index":"source-index",
      "query": {
          "match_all": {}
      }
   },
   "dest":{
      "index":"destination-index",
      "pipeline": "test-pipeline"
   }
}
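
The workaround in Step 7 can be wrapped in a small helper so that every reindex body carries an explicit source query. This is a sketch in Python; the function name build_reindex_body is hypothetical, and it only builds the JSON body shown above without sending it.

```python
import json


def build_reindex_body(source_index, dest_index, pipeline=None, query=None):
    """Build a _reindex body that always includes an explicit source query."""
    source = {
        "index": source_index,
        # Default to match_all, mirroring the body that works in Step 7.
        "query": query if query is not None else {"match_all": {}},
    }
    dest = {"index": dest_index}
    if pipeline is not None:
        dest["pipeline"] = pipeline
    return {"source": source, "dest": dest}


if __name__ == "__main__":
    body = build_reindex_body(
        "source-index", "destination-index", pipeline="test-pipeline"
    )
    print("POST _reindex")
    print(json.dumps(body, indent=2))
```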

Bugs:

1- By default, the _reindex API seems to run GET /source-index/_search with no query body rather than GET /source-index/_search with the body {"query":{"match_all":{}}}. Since the former throws a NullPointerException, _reindex throws the same exception because it cannot retrieve the list of documents in the source index.

2- GET /source-index/_search does not work if "index.search.default_pipeline" : "test-pipeline" is present in the index settings.
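
One plausible reading of both bugs, which nothing in this report confirms, is that a request processor in the default search pipeline dereferences the query of a search request that has none. The toy model below illustrates that failure mode in Python. It is not the plugin's code: all function names are hypothetical, requests are modelled as plain dicts, and Python raises AttributeError where Java would raise a NullPointerException.

```python
def set_default_model(query, default_model_id):
    """Fill in model_id on every neural clause that lacks one (toy model)."""
    for clause in query.get("neural", {}).values():
        clause.setdefault("model_id", default_model_id)


def enrich_unsafe(request, default_model_id):
    # A search with no body has no "query" key, so None is passed along
    # and set_default_model fails on query.get(...).
    set_default_model(request.get("query"), default_model_id)
    return request


def enrich_guarded(request, default_model_id):
    # Same enrichment, but a request without a query is passed through untouched.
    query = request.get("query")
    if query is not None:
        set_default_model(query, default_model_id)
    return request


if __name__ == "__main__":
    empty_search = {}  # models GET /source-index/_search with no body
    print(enrich_guarded(empty_search, "your_model_id"))
    try:
        enrich_unsafe(empty_search, "your_model_id")
    except AttributeError as err:
        print("unguarded processor failed:", err)
```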

OpenSearch Version

OS 2.11

navneet1v commented 6 months ago

@vibrantvarun can you take a look at this issue? It seems to be a problem with the NeuralQueryEnricher processor.

LandryK commented 5 months ago

@vamshin @vibrantvarun Any updates on this? Thanks