opensearch-project / neural-search

Plugin that adds dense neural retrieval into the OpenSearch ecosystem
Apache License 2.0

[BUG] Deep nested map type configuration issue in text_embedding processor #686

Open zane-neo opened 3 months ago

zane-neo commented 3 months ago

What is the bug?

When the text_embedding processor is configured with a deeply nested map in field_map, the embedding result overwrites the original value of the document key instead of being added next to it. Pipeline configuration:

{
  "description": "An example neural search pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "qhO5xY4BYwgbtrHt7KDf",
        "field_map": {
          "category": {
            "name": {
              "en": "category_name_vector"
            }
          }
        }
      }
    }
  ]
}

Then simulate the pipeline with the following document:

{
    "docs": [
        {
            "_index": "neural-search-index-v2",
            "_id": "1",
            "_source": {
                "category": [
                    {
                        "name": {
                            "en": "this is a name"
                        }
                    },
                    {
                        "name": {
                            "en": "hello world"
                        }
                    }
                ]
            }
        }
    ]
}

Result:

{
    "docs": [
        {
            "doc": {
                "_index": "neural-search-index-v2",
                "_id": "1",
                "_source": {
                    "category": [
                        {
                            "name": [
                                -0.10758455,
                                0.07971476,
                                -0.04948872,
                                ...
                            ]
                        },
                        {
                            "name": [
                                -0.034477253,
                                0.031023245,
                                0.006734962,
                                ...
                            ]
                        }
                    ]
                },
                "_ingest": {
                    "timestamp": "2024-04-10T03:51:53.496385Z"
                }
            }
        }
    ]
}

Expected result:

{
    "docs": [
        {
            "doc": {
                "_index": "neural-search-index-v2",
                "_id": "1",
                "_source": {
                    "category": [
                        {
                            "name": {
                                "category_name_vector": [
                                    -0.10758455,
                                    0.07971476,
                                    -0.04948872,
                                    ...
                                ],
                                "en": "this is a name"
                            }
                        },
                        {
                            "name": {
                                "category_name_vector": [
                                    -0.034477253,
                                    0.031023245,
                                    0.006734962,
                                    ...
                                ],
                                "en": "hello world"
                            }
                        }
                    ]
                },
                "_ingest": {
                    "timestamp": "2024-04-10T03:51:53.496385Z"
                }
            }
        }
    ]
}

What is the expected behavior?

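The generated embedding should be written under the configured vector key (here category_name_vector) next to the source field (en) inside each category element, and the original value must not be overwritten.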
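To make the expected placement concrete, here is a minimal Python sketch of the behavior described above. It is not the plugin's implementation (the plugin is written in Java); `embed_in_place` and `fake_embed` are hypothetical names used only for illustration, and `fake_embed` stands in for the real model call.

```python
from typing import Any, Callable, List


def embed_in_place(source: Any, field_map: dict,
                   embed: Callable[[str], List[float]]) -> None:
    """Walk `source` along `field_map` and add each vector beside its text field."""
    if isinstance(source, list):
        # A list of maps: apply the same mapping to every element.
        for item in source:
            embed_in_place(item, field_map, embed)
        return
    if not isinstance(source, dict):
        return
    for key, target in field_map.items():
        if key not in source:
            continue
        value = source[key]
        if isinstance(target, dict):
            # Nested mapping: descend, with no limit on depth.
            embed_in_place(value, target, embed)
        elif isinstance(value, str):
            # Leaf: `target` is the vector field name. Add it as a sibling
            # of the text field so the original value is kept.
            source[target] = embed(value)


def fake_embed(text: str) -> List[float]:
    # Hypothetical stand-in for the model: a one-element "vector".
    return [float(len(text))]


doc = {
    "category": [
        {"name": {"en": "this is a name"}},
        {"name": {"en": "hello world"}},
    ]
}
field_map = {"category": {"name": {"en": "category_name_vector"}}}

embed_in_place(doc, field_map, fake_embed)
print(doc)
```

After the call, each `name` object still holds its `en` text and additionally holds `category_name_vector`, which matches the shape of the expected result.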

krishy91 commented 3 months ago

Hi, I'll look into this! Seems like an issue with nesting at a depth of 2.

zane-neo commented 2 months ago

@krishy91 Since we support lists of maps, we don't want any limit on nesting depth here, e.g. supporting only a depth of 2 or 3. We should support arbitrarily deep nesting if possible.

krishy91 commented 2 months ago

I could reproduce the issue. I will push the fix and an additional integration test for such deep nesting cases.
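One way such a test could check for this bug at any depth is to assert that every value in the original document is still present in the processed one. The Python helper below is only a sketch of that check, not the project's actual test code; `preserves_source` is a hypothetical name, and the short vectors are placeholders rather than real model output.

```python
from typing import Any


def preserves_source(original: Any, processed: Any) -> bool:
    """Return True if every value in `original` survives unchanged in `processed`.

    Extra keys in `processed` (the added vectors) are allowed.
    """
    if isinstance(original, dict):
        return isinstance(processed, dict) and all(
            key in processed and preserves_source(value, processed[key])
            for key, value in original.items()
        )
    if isinstance(original, list):
        return (
            isinstance(processed, list)
            and len(original) == len(processed)
            and all(preserves_source(o, p) for o, p in zip(original, processed))
        )
    return original == processed


source = {"category": [{"name": {"en": "this is a name"}},
                       {"name": {"en": "hello world"}}]}

# Shape reported in this issue: the vector replaces the whole "name" object.
buggy = {"category": [{"name": [0.1, 0.2]}, {"name": [0.3, 0.4]}]}

# Shape expected: the vector is added beside the original text.
fixed = {"category": [
    {"name": {"en": "this is a name", "category_name_vector": [0.1, 0.2]}},
    {"name": {"en": "hello world", "category_name_vector": [0.3, 0.4]}},
]}

print(preserves_source(source, buggy))
print(preserves_source(source, fixed))
```

The check returns False for the overwritten shape and True for the expected one, and because it recurses through dicts and lists it does not depend on how deep the field_map goes.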