opensearch-project / neural-search

Plugin that adds dense neural retrieval into the OpenSearch ecosystem
Apache License 2.0
57 stars, 58 forks

Enable '.' for nested field in text embedding processor #811

Closed martin-gaievski closed 1 day ago

martin-gaievski commented 1 week ago

Description

This PR adds support for dot notation for nested fields in the inference processor definition. The main purpose is to improve the user experience when an object has a complex hierarchical structure, so that users can write a simpler processor definition.

Examples of the new format:

"a.b.c.d": "field"

or

"a.b" : {
   "c.d": "field"
}

In both cases we look for a structure like this in the ingest document:

"a" : {
   "b": {
     "c" : {
       "d" : "field"
     }
   }
}
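
The equivalence between the dotted and the hierarchical forms can be sketched as follows. This is a minimal illustration in Python, not the plugin's actual (Java) implementation; the function name `expand_dotted_keys` is hypothetical, and the sketch assumes the keys do not conflict (for example, it does not handle both "a": "x" and "a.b": "y" in the same map).

```python
def expand_dotted_keys(field_map):
    """Expand keys such as "a.b.c" into nested dicts; leaf values are kept as-is."""
    expanded = {}
    for key, value in field_map.items():
        # Expand nested maps first, so "a.b": {"c.d": ...} is handled too.
        if isinstance(value, dict):
            value = expand_dotted_keys(value)
        *parents, leaf = key.split(".")
        node = expanded
        for part in parents:
            # Create each intermediate level on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return expanded


flat = {"a.b.c.d": "field"}
mixed = {"a.b": {"c.d": "field"}}
print(expand_dotted_keys(flat))
print(expand_dotted_keys(mixed))
```

Both inputs expand to the same fully nested map, which is the only form the description says is accepted before this change.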

Today we support only the hierarchical form of definition in the mapping. It must match the structure of the document exactly:

"a" : {
   "b": {
     "c" : {
       "d" : "field"
     }
   }
}

Note: this change affects only the source field (the left side of the mapping, which holds the value used to generate the embedding). The logic for the destination field (the right side of the mapping, which stores the generated embedding) is unchanged: the destination field is expected at the same level as the final source field. For example, with "a.b.c": "d" the embeddings are inserted into the following structure:

"a" : {
   "b" : {
     "d" : [0.1, 0.2 ....]
   }
}
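
The placement rule for the destination field can be illustrated with a short Python sketch. It is not the plugin's code: `insert_embedding` is a hypothetical helper, and the embedding values are made up. The only rule it encodes is the one stated above, that the destination is written next to the final source field.

```python
def insert_embedding(document, source_path, destination, embedding):
    """Store `embedding` under `destination`, as a sibling of the last source key."""
    *parents, _source_leaf = source_path.split(".")
    node = document
    for part in parents:
        # Raises KeyError if the document lacks the expected structure.
        node = node[part]
    node[destination] = embedding
    return document


doc = {"a": {"b": {"c": "some text"}}}
insert_embedding(doc, "a.b.c", "d", [0.1, 0.2])
print(doc)
```

After the call, "d" sits under "a.b" alongside the source field "c".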

Issues Resolved

https://github.com/opensearch-project/neural-search/issues/110

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

martin-gaievski commented 1 day ago

> Looks good. Curious how nested field is processed currently without this change?

Today we treat a field name separated with . as a single field name: if someone puts "a.b" : "c", we pick up the field only if the doc has a field literally named "a.b", not "a": { "b" }. There is a workaround: the user can write the mapping as structured JSON; I put an example in the PR description. With that workaround the UX isn't great when objects have a complex structure.
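
The difference between the two lookups can be sketched in Python. Both function names are hypothetical and neither is taken from the plugin. `lookup_literal` models the behavior described for today, and `lookup_path` models the dot-separated traversal this PR introduces.

```python
def lookup_literal(document, field_name):
    """Today's described behavior: the whole string is treated as one key."""
    return document.get(field_name)


def lookup_path(document, field_name):
    """Proposed behavior: each dot-separated part descends one level."""
    node = document
    for part in field_name.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


nested_doc = {"a": {"b": "text to embed"}}
print(lookup_literal(nested_doc, "a.b"))  # no literal "a.b" key in this document
print(lookup_path(nested_doc, "a.b"))     # walks a -> b
```

In this simplified model, a document with a top-level key literally named "a.b" is found only by the literal lookup. How the real processor treats such documents after the change is not stated in this thread.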

opensearch-trigger-bot[bot] commented 1 day ago

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-811-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 fb1f1fda2755676163935dcc278abede8e82bf87
# Push it to GitHub
git push --set-upstream origin backport/backport-811-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-811-to-2.x.