opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

Disallow invalid characters for physical file name to be included within vector field name. #1936

Closed. 0ctopus13prime closed 3 months ago

0ctopus13prime commented 3 months ago

Description

Issue : https://github.com/opensearch-project/k-NN/issues/1859.

Issue

OpenSearch allows a field name to contain a space, but it does not allow a space in a physical file name. KNNCodecUtil::buildEngineFileName uses the field name directly as part of the vector file name. As a result, when the field name contains a character that is disallowed in a physical file name, the resulting file name fails validation in BlobStoreIndexShardSnapshot. For example, with the field name 'my vector', the vector file is named _0_2011_my vector.hnswc, and BlobStoreIndexShardSnapshot throws an exception complaining that the file name is not valid.

Solution

Add validation logic that throws an exception when the provided vector field name contains any character that is invalid in a physical file name.

private void validateFullFieldName(BuilderContext context) {
    final String fullFieldName = buildFullName(context);
    for (char ch : fullFieldName.toCharArray()) {
        if (Strings.INVALID_FILENAME_CHARS.contains(ch)) {
            throw new IllegalArgumentException(...);
        }
    }
}

public abstract class KNNVectorFieldMapper extends ParametrizedFieldMapper {
    ...
    public static class Builder extends ParametrizedFieldMapper.Builder {
        @Override
        public KNNVectorFieldMapper build(BuilderContext context) {
            validateFullFieldName(context);
            ...
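
For readers who want to try the check outside the plugin, the following is a minimal, standalone sketch of the same idea. It is not the code merged in this PR: the class name, the helper validateFieldName, the exception message, and the contents of the character set are all assumptions made for illustration. In particular, the set below is only a stand-in for Strings.INVALID_FILENAME_CHARS, whose actual contents are defined in OpenSearch core and may differ.

```java
import java.util.Set;

// Illustrative sketch only; not the k-NN plugin's actual implementation.
final class FieldNameValidationSketch {

    // Assumption: a stand-in for Strings.INVALID_FILENAME_CHARS.
    // The real set is defined in OpenSearch core and may contain other characters.
    static final Set<Character> INVALID_FILENAME_CHARS =
        Set.of('\\', '/', '*', '?', '"', '<', '>', '|', ' ', ',');

    // Hypothetical helper mirroring validateFullFieldName: rejects a field name
    // as soon as it finds a character that could not appear in a file name.
    static void validateFieldName(String fullFieldName) {
        for (char ch : fullFieldName.toCharArray()) {
            if (INVALID_FILENAME_CHARS.contains(ch)) {
                throw new IllegalArgumentException(
                    "Vector field name [" + fullFieldName
                        + "] contains invalid character [" + ch + "]");
            }
        }
    }

    public static void main(String[] args) {
        // A name without disallowed characters passes silently.
        validateFieldName("my_vector");

        // The name from the issue contains a space and is rejected.
        try {
            validateFieldName("my vector");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Running the check when the mapper is built, as the PR does in Builder.build, means an invalid name is reported when the mapping is created rather than later, when a snapshot validates the generated file name.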

Related Issues

Issue : https://github.com/opensearch-project/k-NN/issues/1859.

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

opensearch-trigger-bot[bot] commented 3 months ago

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1936-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 f5ba77114ef662e91a8ce26838159f383931912c
# Push it to GitHub
git push --set-upstream origin backport/backport-1936-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1936-to-2.x.