opensearch-project / neural-search

Plugin that adds dense neural retrieval into the OpenSearch ecosystem
Apache License 2.0

[BUG] Incorrect validation logic for map type in xxxProcessor #739

Closed (zane-neo closed 2 months ago)

zane-neo commented 4 months ago

What is the bug? When a user supplies a map-type configuration in several processors, validation can fail because it is also run on extra fields in that map that the configuration never references.

How can one reproduce the bug? Steps to reproduce the behavior:

PUT /_ingest/pipeline/neural-search-pipeline-v2
{
  "description": "An example neural search pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "WeliNowB6EaQJ_XFf05V",
        "field_map": {
          "category": {
            "name": {
              "en": "category_name_vector"
            }
          }
        }
      }
    }
  ]
}

And simulate the ingestion:

POST _ingest/pipeline/neural-search-pipeline-v2/_simulate
{
  "docs": [
    {
      "_index": "neural-search-index-v2",
      "_id": "1",
      "_source": {
        "category": {
          "id": 1,
          "name": {
            "en": "category 1"
          }
        }
      }
    }
  ]
}

The user then gets an error like the one below:

{
  "docs": [
    {
      "error": {
        "root_cause": [
          {
            "type": "illegal_argument_exception",
            "reason": "map type field [category] has non-string type, cannot process it"
          }
        ],
        "type": "illegal_argument_exception",
        "reason": "map type field [category] has non-string type, cannot process it"
      }
    }
  ]
}

What is the expected behavior? The correct embedding should be generated and inserted into the document.

Do you have any additional context? The root cause is that, when map-type data is validated, not only the expected field but also unrelated fields are validated. In the example above, "id" is validated because it sits under the category map, and its value is an integer, which text embedding does not support, hence the error.
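To make the root cause concrete, here is a minimal Python sketch of the two validation strategies. It is not the plugin's actual code (the plugin is written in Java); the function names `validate_all_values`, `validate_mapped_values`, and `naive_error` are hypothetical, and list handling is omitted for brevity. The first function mimics the reported behaviour by recursing into every entry under a mapped key; the second only follows the paths named in `field_map`, which is the behaviour the issue asks for.

```python
from typing import Any, Mapping


def validate_all_values(source_key: str, value: Any) -> None:
    """Naive check: recurse into every entry of a map value, mapped or not."""
    if isinstance(value, Mapping):
        for child in value.values():
            validate_all_values(source_key, child)
    elif not isinstance(value, str):
        raise ValueError(
            f"map type field [{source_key}] has non-string type, cannot process it"
        )


def validate_mapped_values(field_map: Mapping[str, Any], source: Mapping[str, Any]) -> None:
    """Scoped check: only follow the paths that field_map actually names."""
    for key, target in field_map.items():
        if key not in source:
            continue  # nothing to embed for this key
        value = source[key]
        if isinstance(target, Mapping):
            # The configuration expects a nested map here; descend along it.
            if not isinstance(value, Mapping):
                raise ValueError(f"field [{key}] is configured as a map but is not one")
            validate_mapped_values(target, value)
        elif not isinstance(value, str):
            raise ValueError(f"field [{key}] has non-string type, cannot process it")


# Configuration and document taken from the reproduction steps in this issue.
FIELD_MAP = {"category": {"name": {"en": "category_name_vector"}}}
SOURCE = {"category": {"id": 1, "name": {"en": "category 1"}}}


def naive_error() -> str:
    """Run the naive check and return the error message it raises, if any."""
    try:
        for key in FIELD_MAP:
            validate_all_values(key, SOURCE[key])
    except ValueError as exc:
        return str(exc)
    return ""
```

With the document from the reproduction steps, `naive_error()` trips on the integer `id`, while `validate_mapped_values(FIELD_MAP, SOURCE)` passes because `category.id` is never visited. The scoped check still rejects a non-string value at a path that the configuration does name, such as `category.name.en`.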

yuye-aws commented 4 months ago

How about resolving this bug in https://github.com/opensearch-project/neural-search/pull/687?

zane-neo commented 4 months ago

> How about resolving this bug in #687?

Yes, that's the PR to fix this.

zhichao-aws commented 2 months ago

Fixed in #687