opensearch-project / neural-search

Plugin that adds dense neural retrieval into the OpenSearch ecosystem
Apache License 2.0
54 stars 58 forks

[FEATURE] Treat . in the field name as a nested field in the fields map of text embedding processor #110

Open navneet1v opened 1 year ago

navneet1v commented 1 year ago

Is your feature request related to a problem?

This is related to a customer-created GitHub issue: https://github.com/opensearch-project/neural-search/issues/109

With the following configuration, which uses a nested source field, embeddings are not computed. This should be supported:

PUT /_ingest/pipeline/neural_pipeline_nested
{
  "description": "Neural Search Pipeline for message content",
  "processors": [
    {
      "text_embedding": {
        "model_id": "SXXx8YUBR2ZWhVQIkghB",
        "field_map": {
          "message.text": "message_embedding"
        }
      }
    }
  ]
}

PUT /neural-test-index-nested
{
    "settings": {
        "index.knn": true,
        "default_pipeline": "neural_pipeline_nested"
    },
    "mappings": {
        "properties": {
            "message_embedding": {
                "type": "knn_vector",
                "dimension": 384,
                "method": {
                    "name": "hnsw",
                    "engine": "lucene"
                }
            },
            "message.text": { 
                "type": "text"            
            },
            "color": {
                "type": "text"
            }
        }
    }
}

POST /_bulk
{"create":{"_index":"neural-test-index-nested","_id":"0"}}
{"message":{"text":"Text 1"},"color":"red"}
{"create":{"_index":"neural-test-index-nested","_id":"1"}}
{"message":{"text":"Text 2"}, "color": "black"}

GET /neural-test-index-nested/_search

What solution would you like?

The field_map keys should support the . operator to define nested fields.
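
One way to read this request is as a preprocessing step that expands dotted keys into the nested map form. The Python sketch below illustrates that idea only; it makes no claim about how the plugin is or will be implemented, and `expand_dotted_keys` and `_merge` are hypothetical helper names.

```python
def expand_dotted_keys(field_map):
    """Expand keys such as "message.text" into nested dictionaries."""
    expanded = {}
    for key, value in field_map.items():
        # Recurse so dotted keys inside already-nested maps are expanded too.
        if isinstance(value, dict):
            value = expand_dotted_keys(value)
        # Wrap the value in one dict per path segment, innermost first.
        for part in reversed(key.split(".")):
            value = {part: value}
        _merge(expanded, value)
    return expanded


def _merge(target, source):
    """Merge source into target, rejecting entries that conflict."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _merge(target[key], value)
        elif key in target:
            raise ValueError(f"conflicting field map entries for '{key}'")
        else:
            target[key] = value


# The dotted form from the failing pipeline becomes the nested form.
nested = expand_dotted_keys({"message.text": "message_embedding"})
```

Rejecting conflicts (for example, `"message"` mapped both to a string and to a nested map) is a design choice made for this sketch; silently overwriting would be another option.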

What alternatives have you considered?

The customer can define a nested field map in the pipeline instead:

PUT /_ingest/pipeline/neural_pipeline_nested
{
    "description": "Neural Search Pipeline for message content",
    "processors": [
        {
            "text_embedding": {
                "model_id": "SXXx8YUBR2ZWhVQIkghB",
                "field_map": {
                    "message": {
                        "text": "message_embedding"
                    }
                }
            }
        }
    ]
}
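
To make the nested form concrete, the Python sketch below walks a nested field map alongside one of the sample documents from the bulk request and pairs each target field name with the text that would be embedded. It is only an illustration: `collect_texts` is a hypothetical helper, and where the processor actually writes the resulting vector is not covered here.

```python
def collect_texts(source, field_map):
    """Return {embedding_field: text} for each field_map leaf present in source."""
    found = {}
    for key, target in field_map.items():
        value = source.get(key)
        if value is None:
            continue  # field absent from this document
        if isinstance(target, dict):
            # Nested map: descend only if the document has an object here.
            if isinstance(value, dict):
                found.update(collect_texts(value, target))
        else:
            found[target] = value
    return found


doc = {"message": {"text": "Text 1"}, "color": "red"}
field_map = {"message": {"text": "message_embedding"}}
texts = collect_texts(doc, field_map)
```

With the sample document, `texts` pairs `message_embedding` with `"Text 1"`, and `color` is ignored because it does not appear in the field map.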

asfoorial commented 9 months ago

Is this going to also handle inner documents "nested" field types?

Sanjana679 commented 8 months ago

I'm going to tackle this issue!

samuel-oci commented 6 months ago

@navneet1v, what is the expected behavior for a nested field type, as opposed to the object field in the example above? How will the flattening to an array be handled? For context, nested field: https://opensearch.org/docs/latest/field-types/supported-field-types/nested/ and object field: https://opensearch.org/docs/latest/field-types/supported-field-types/object/

samuel-oci commented 6 months ago

FYI, there is this issue as well regarding chunking: https://github.com/opensearch-project/neural-search/issues/482

The question above might be related to it. If the scope of this ticket is only object fields (not nested fields), then we can continue the discussion on https://github.com/opensearch-project/neural-search/issues/482

ripineros commented 3 weeks ago

Is this still in progress?

Sanjana679 commented 3 weeks ago

This is still in progress. I hope to continue working on it in the next couple of weeks. Sorry for the delay.

Sincerely,
Sanjana Nandi

On Jun 4, 2024, at 4:13 PM, ripineros wrote: Is this still in progress?

navneet1v commented 2 weeks ago

@Sanjana679, are you still working on this? I am not seeing any updates on the PR.