Closed martin-gaievski closed 1 day ago
Looks good. Curious how a nested field is processed currently, without this change?
Today we treat a field name separated with . as a single field name: if someone puts "a.b": "c", we pick up the field only if the doc has a field "a.b", not "a": { "b" }. There is a workaround: the user can write the mapping as structured JSON (I put an example in the PR description). The UX of that workaround isn't great when objects have a complex structure.
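To make the difference concrete, here is a minimal Python sketch of the two lookup strategies described above. It is only an illustration, not the plugin's code: the function names `lookup_literal` and `lookup_by_path` and the sample documents are hypothetical.

```python
from typing import Any, Optional


def lookup_literal(doc: dict, field: str) -> Optional[Any]:
    """Behavior described as current: the dotted name is one literal key."""
    return doc.get(field)


def lookup_by_path(doc: dict, field: str) -> Optional[Any]:
    """Behavior this change aims for: each dot-separated part descends one level."""
    current: Any = doc
    for part in field.split("."):
        # Stop as soon as the path leaves the nested-object structure.
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


# Hypothetical ingest documents, one flat and one nested.
flat_doc = {"a.b": "some text"}
nested_doc = {"a": {"b": "some text"}}
```

With the literal lookup, only `flat_doc` yields a value for `"a.b"`; with the path lookup, only `nested_doc` does.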
The backport to 2.x failed:
The process '/usr/bin/git' failed with exit code 1
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-811-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 fb1f1fda2755676163935dcc278abede8e82bf87
# Push it to GitHub
git push --set-upstream origin backport/backport-811-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x
Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-811-to-2.x.
Description
Adds support for complex structures in the inference processor definition. The main purpose is to improve the user experience when an object has a complex hierarchical structure, so that users can use a simpler syntax in the processor definition.
Example of such new format:
or
In this case we will look for a structure like this in the ingest document:
Today we support only the hierarchical form of definition in the mapping. It must mirror the structure of the document exactly:
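As a rough illustration of how the dotted syntax relates to the hierarchical one, the sketch below expands a dotted mapping into its nested equivalent. The helper name `expand_field_map` and the sample keys are hypothetical, and the shape of the hierarchical mapping is an assumption based on the description above, not the plugin's actual implementation.

```python
def expand_field_map(field_map: dict) -> dict:
    """Expand dotted source keys into the nested (hierarchical) mapping form.

    Assumes keys do not conflict (e.g. both "a" and "a.b" mapped);
    such conflicts are not handled in this sketch.
    """
    expanded: dict = {}
    for dotted_key, target in field_map.items():
        parts = dotted_key.split(".")
        node = expanded
        # Create or reuse one nested dict per intermediate path part.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # The last part carries the destination field name.
        node[parts[-1]] = target
    return expanded


# Hypothetical mapping: source path "a.b.c", destination field "d".
dotted = {"a.b.c": "d"}
hierarchical = expand_field_map(dotted)
```

Under these assumptions, `hierarchical` is the nested form that a user would otherwise have to write by hand.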
Note: this change affects only the source field (the left part of the mapping, which holds the value used as the basis for embedding generation). The logic for the destination field (the right part of the mapping, the field that stores the generated embeddings) is unchanged. Under today's logic, the destination field is expected at the same level as the final source field. Example:
"a.b.c": "d"
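The placement rule from the note can be sketched in Python as follows: the destination field is written into the parent object of the final source field. The function name `insert_embedding`, the sample document, and the embedding values are all hypothetical; this is an illustration of the rule as stated, not the plugin's code.

```python
import copy


def insert_embedding(doc: dict, source_path: str, dest_field: str, embedding: list) -> dict:
    """Return a copy of doc with the embedding stored next to the final source field."""
    result = copy.deepcopy(doc)
    parent = result
    # Walk to the object that contains the last part of the source path.
    for part in source_path.split(".")[:-1]:
        parent = parent[part]
    # The destination lives at the same level as the final source field.
    parent[dest_field] = embedding
    return result


# Hypothetical document and embedding for the mapping "a.b.c": "d".
doc = {"a": {"b": {"c": "some text"}}}
updated = insert_embedding(doc, "a.b.c", "d", [0.1, 0.2])
```

Here `updated` keeps `c` in place and gains a sibling field `d` inside `a.b`, while the original `doc` is left unmodified.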
In this case, embeddings are inserted in the following structure:
Issues Resolved
https://github.com/opensearch-project/neural-search/issues/110
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.