opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

[BUG] Dynamic template for vectorized output fields #2247

Open juntezhang opened 1 year ago

juntezhang commented 1 year ago

What is the bug?

I want to use dynamic templates to create vectorized output fields. I want to configure each output field only once, in the ingest processor, without having to configure it again in the index mapping.

This does not work. Indexing fails with the following error:

"error": {
  "type": "mapper_parsing_exception",
  "reason": "failed to parse field [Body_vector] of type [knn_vector] in document with id '9'. Preview of field's value: '-0.083671086'",
  "caused_by": {
    "type": "illegal_argument_exception",
    "reason": "Vector dimension mismatch. Expected: 384, Given: 1"
  }
}

I have a text field called Body, and the neural ingest pipeline creates an output field called Body_vector. The dimension is set to 384 in the dynamic template, but the incoming value is apparently treated as a vector of dimension 1.
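For context, the ingest side of this setup can be sketched as below. This is a minimal sketch and not the exact pipeline from the tutorial: it assumes the Neural Search `text_embedding` processor with a `field_map` from Body to Body_vector, and the model ID is a placeholder. The helper name `build_pipeline` is hypothetical.

```python
import json


def build_pipeline(model_id: str, source_field: str = "Body", suffix: str = "_vector") -> dict:
    """Return an ingest pipeline body with one text_embedding processor.

    The output field name is derived from the source field, so the
    field is named only once, here in the pipeline.
    """
    return {
        "description": "Embed a text field into a vector field",
        "processors": [
            {
                "text_embedding": {
                    # Placeholder: use the ID of the model deployed in your cluster.
                    "model_id": model_id,
                    "field_map": {source_field: source_field + suffix},
                }
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_pipeline("<your-model-id>"), indent=2))
```

The printed JSON is the kind of body one would send when creating the pipeline. The intent described above is that the index mapping then picks up Body_vector through the dynamic template alone.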

How can one reproduce the bug?

Follow the Neural Search plugin tutorial created by Sease, but create an index with a dynamic template like this:

"dynamic_templates": [
  {
    "vectorized": {
      "match_mapping_type": "double",
      "match_pattern": "regex",
      "path_match": ".*_vector.*",
      "mapping": {
        "type": "knn_vector",
        "dimension": 384,
        "method": {
          "name": "hnsw",
          "engine": "lucene"
        }
      }
    }
  }
]
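To make the reproduction easier to script, here is a rough sketch that builds a full create-index body around the template above. The helper `build_index_body` is hypothetical, and the `index.knn` setting is an assumption about the rest of the index configuration, which the report does not show. The `use_template=False` branch is not part of the report; it builds an explicit Body_vector mapping purely as a comparison case, to help narrow down whether the dynamic template is what triggers the error.

```python
import json


def build_index_body(dimension: int = 384, use_template: bool = True) -> dict:
    """Build a create-index body for the reproduction.

    use_template=True  -> the dynamic template from the report
    use_template=False -> an explicit Body_vector mapping, for comparison only
    """
    knn_mapping = {
        "type": "knn_vector",
        "dimension": dimension,
        "method": {"name": "hnsw", "engine": "lucene"},
    }
    if use_template:
        mappings = {
            "dynamic_templates": [
                {
                    "vectorized": {
                        "match_mapping_type": "double",
                        "match_pattern": "regex",
                        "path_match": ".*_vector.*",
                        "mapping": knn_mapping,
                    }
                }
            ]
        }
    else:
        mappings = {"properties": {"Body_vector": knn_mapping}}
    # Assumption: k-NN is enabled for the index through this setting.
    return {"settings": {"index.knn": True}, "mappings": mappings}


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
```

Creating one index from each variant and indexing the same document into both would show whether the dimension mismatch appears only with the dynamic template.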

Index documents and observe that the exception above is thrown.

What is the expected behavior?

The expected behavior is that the dynamic template should create the vectorized output fields as configured in the mapping and index without errors.

What is your host/environment?

macOS 13.3, but running OpenSearch in Docker on Ubuntu.

Do you have any screenshots?

N/A

Do you have any additional context?

I am happy to contribute to a solution.

navneet1v commented 1 year ago

@juntezhang Thanks for reporting the issue. I will try to reproduce it and see what is happening.

martin-gaievski commented 2 weeks ago

Looks like the issue is related to the knn_vector field type. @navneet1v @jmazanec15 any objections if we move it to the k-NN repo?

heemin32 commented 2 weeks ago

Preview of field's value: '-0.083671086'

What is the actual field value generated by the ingest processor?
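One way to answer this is to capture the document as it leaves the pipeline (for example from the response of the ingest simulate API, `POST /_ingest/pipeline/<id>/_simulate`) and inspect the vector field. The sketch below is only an illustration of that check; `describe_vector_field` is a hypothetical helper, and the document is assumed to be available as a plain Python dict.

```python
def _is_number(x) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def describe_vector_field(doc: dict, field: str, expected_dim: int) -> str:
    """Classify the value stored under `field` in a processed document."""
    if field not in doc:
        return "missing"
    value = doc[field]
    if _is_number(value):
        # A single number instead of an array of numbers.
        return "scalar"
    if isinstance(value, list) and all(_is_number(x) for x in value):
        if len(value) == expected_dim:
            return "ok"
        return f"wrong length: {len(value)}"
    return "unexpected type"


if __name__ == "__main__":
    doc = {"Body": "some text", "Body_vector": [0.0] * 384}
    print(describe_vector_field(doc, "Body_vector", 384))
```

If the processed document reports "ok" for Body_vector, the processor output is a full 384-element array, and the single value in the error preview would then point at how the field is mapped or parsed rather than at the processor.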

dblock commented 1 week ago

[Catch All Triage - 1, 2, 3]