opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

[BUG] Fail to upgrade to OpenSearch 2.17 due to KNN field name validation #2219

Closed (knx1029 closed 1 month ago)

knx1029 commented 1 month ago

What is the bug? We have an OpenSearch index in which the k-NN vector field is named "Text Chunk_vector". Our attempt to upgrade to 2.17 fails at node startup with the following error. It seems to be caused by https://github.com/opensearch-project/k-NN/commit/f5ba77114ef662e91a8ce26838159f383931912c.

[2024-10-17T03:41:40,343][ERROR][o.o.s.l.BuiltinLogTypeLoader] [os-b67de0b3-nodes-0] Failed loading builtin log types from disk!
java.nio.file.FileSystemNotFoundException: null
    at jdk.zipfs@21.0.4/jdk.nio.zipfs.ZipFileSystemProvider.getFileSystem(ZipFileSystemProvider.java:156) ~[?:?]
    at jdk.zipfs@21.0.4/jdk.nio.zipfs.ZipFileSystemProvider.getPath(ZipFileSystemProvider.java:142) ~[?:?]
    at java.base/java.nio.file.Path.of(Path.java:209) ~[?:?]
...
[2024-10-17T03:41:40,544][INFO ][o.o.t.TransportService   ] [os-b67de0b3-nodes-0] publish_address {os-b67de0b3-nodes-0/10.128.5.240:9300}, bound_addresses {[::]:9300}
[2024-10-17T03:41:40,785][ERROR][o.o.b.OpenSearchUncaughtExceptionHandler] [os-b67de0b3-nodes-0] uncaught exception in thread [main]
org.opensearch.bootstrap.StartupException: java.lang.IllegalStateException: unable to upgrade the mappings for the index [[index-44109978/DUah5H_ZSWGpv1c0u3WWwA]]
    at org.opensearch.bootstrap.OpenSearch.init(OpenSearch.java:185) ~[opensearch-2.17.0.jar:2.17.0]
    at org.opensearch.bootstrap.OpenSearch.execute(OpenSearch.java:172) ~[opensearch-2.17.0.jar:2.17.0]
...
Caused by: java.lang.IllegalStateException: unable to upgrade the mappings for the index [[index-44109978/DUah5H_ZSWGpv1c0u3WWwA]]
    at org.opensearch.cluster.metadata.MetadataIndexUpgradeService.checkMappingsCompatibility(MetadataIndexUpgradeService.java:252) ~[opensearch-2.17.0.jar:2.17.0]
    at org.opensearch.cluster.metadata.MetadataIndexUpgradeService.upgradeIndexMetadata(MetadataIndexUpgradeService.java:121) ~[opensearch-2.17.0.jar:2.17.0]
...
Caused by: org.opensearch.index.mapper.MapperParsingException: Failed to parse mapping [_doc]: Vector field name must not include invalid characters of [' ', '"', '*', '\', '<', '|', ',', '>', '/', '?']. Provided field name=[Text Chunk_vector] had a disallowed character [ ]
    at org.opensearch.index.mapper.MapperService.internalMerge(MapperService.java:479) ~[opensearch-2.17.0.jar:2.17.0]
    at org.opensearch.index.mapper.MapperService.internalMerge(MapperService.java:465) ~[opensearch-2.17.0.jar:2.17.0]
    at 
Caused by: java.lang.IllegalArgumentException: Vector field name must not include invalid characters of [' ', '"', '*', '\', '<', '|', ',', '>', '/', '?']. Provided field name=[Text Chunk_vector] had a disallowed character [ ]
    at org.opensearch.knn.index.mapper.KNNVectorFieldMapper$Builder.validateFullFieldName(KNNVectorFieldMapper.java:320) ~[?:?]
    at org.opensearch.knn.index.mapper.KNNVectorFieldMapper$Builder.build(KNNVectorFieldMapper.java:228) ~[?:?]
...
java.lang.IllegalStateException: unable to upgrade the mappings for the index [[index-44109978/DUah5H_ZSWGpv1c0u3WWwA]]
Likely root cause: java.lang.IllegalArgumentException: Vector field name must not include invalid characters of [' ', '"', '*', '\', '<', '|', ',', '>', '/', '?']. Provided field name=[Text Chunk_vector] had a disallowed character [ ]
    at org.opensearch.knn.index.mapper.KNNVectorFieldMapper$Builder.validateFullFieldName(KNNVectorFieldMapper.java:320)
    at org.opensearch.knn.index.mapper.KNNVectorFieldMapper$Builder.build(KNNVectorFieldMapper.java:228)
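
To check other field names before upgrading, the rule stated in the error message above can be approximated with a short script. This is a minimal sketch, not the plugin's actual `validateFullFieldName` code: the character set is copied from the log message, and the function names `first_disallowed_char` and `check_field_names` are hypothetical.

```python
# Characters listed in the 2.17.0 error message. Copied from the log above,
# not from the k-NN plugin source, so treat the set as an assumption.
DISALLOWED_CHARS = (' ', '"', '*', '\\', '<', '|', ',', '>', '/', '?')


def first_disallowed_char(field_name):
    """Return the first disallowed character in field_name, or None if there is none."""
    for ch in field_name:
        if ch in DISALLOWED_CHARS:
            return ch
    return None


def check_field_names(field_names):
    """Map each offending field name to the first disallowed character it contains."""
    offenders = {}
    for name in field_names:
        ch = first_disallowed_char(name)
        if ch is not None:
            offenders[name] = ch
    return offenders


if __name__ == "__main__":
    # "Text Chunk_vector" is the field from this report; the other name is illustrative.
    print(check_field_names(["Text Chunk_vector", "Text_Chunk_vector"]))
```

Running something like this over the k-NN vector field names in each index mapping on the old cluster would show which indices are likely to block the upgrade.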

How can one reproduce the bug? Steps to reproduce the behavior:

  1. On an older OpenSearch version (e.g., 2.13.0), create an index whose k-NN vector field is named "Text Chunk_vector".
  2. Upgrade the OpenSearch cluster to 2.17.0. The upgrade fails.
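
Step 1 could be sketched as the create-index request body below, to be sent with `PUT /<index-name>` to the older cluster. The `dimension` of 4 and the helper name `build_index_body` are assumptions for illustration; the report does not state the real dimension or any other mapping settings.

```python
import json


def build_index_body(field_name, dimension):
    """Build a create-index body with a single knn_vector field.

    The dimension is a placeholder; use the real vector size of your data.
    """
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field_name: {"type": "knn_vector", "dimension": dimension}
            }
        },
    }


# Field name taken from the report; it contains a space, which 2.17.0 rejects.
body = build_index_body("Text Chunk_vector", 4)

if __name__ == "__main__":
    print(json.dumps(body, indent=2))
```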

What is the expected behavior? The version upgrade should succeed.

What is your host/environment?

Do you have any screenshots? N/A

Do you have any additional context? It seems to be caused by https://github.com/opensearch-project/k-NN/commit/f5ba77114ef662e91a8ce26838159f383931912c.

heemin32 commented 1 month ago

@knx1029, you are right. Starting with 2.17.0, we restricted a few characters in k-NN field names, because those characters in a field name prevent subsequent snapshots from being taken: https://github.com/opensearch-project/k-NN/issues/1859. As a workaround, would you be able to reindex using a different field name?
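
The reindex workaround might look roughly like the request body below, to be sent with `POST /_reindex`. This is a sketch under several assumptions: the destination index name and the new field name `Text_Chunk_vector` are hypothetical, the destination index is assumed to already exist with a `knn_vector` mapping for the new field name, and the reindex is assumed to run on the pre-upgrade cluster, since the log above shows the 2.17.0 node failing to start with the old mapping.

```python
import json

# Painless script that moves the value from the old field to the new one.
# The containsKey guard skips documents that do not have the old field.
RENAME_SCRIPT = (
    "if (ctx._source.containsKey(params.old_field)) { "
    "ctx._source[params.new_field] = ctx._source.remove(params.old_field); "
    "}"
)


def build_reindex_body(source_index, dest_index, old_field, new_field):
    """Build a _reindex request body that renames one field while copying documents."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "script": {
            "lang": "painless",
            "source": RENAME_SCRIPT,
            "params": {"old_field": old_field, "new_field": new_field},
        },
    }


# Source index and old field name come from the log; the rest is hypothetical.
reindex_body = build_reindex_body(
    "index-44109978",
    "index-44109978-renamed",
    "Text Chunk_vector",
    "Text_Chunk_vector",
)

if __name__ == "__main__":
    print(json.dumps(reindex_body, indent=2))
```

Passing the field names through `params` keeps them out of the script source, so names containing quotes or spaces need no escaping.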

knx1029 commented 1 month ago

OK. Thanks!