opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0
156 stars 123 forks

[Backport 2.x] Remove FileWatcher from KNN (#2182) #2225

0ctopus13prime closed 1 month ago

0ctopus13prime commented 1 month ago

Signed-off-by: Dooyong Kim <kdooyong@amazon.com>

Backporting from https://github.com/opensearch-project/k-NN/pull/2182

0ctopus13prime commented 1 month ago

The last test failed because, on Linux, the index ends up with two segments, whereas the test expects three files to be created in a single segment. I will update the commit once the test is fixed to always generate a single segment, so that the number of files found by the file-name query matches the number of vector fields.

In testNativeEngineVectorFormat_whenMultipleVectorFieldIndexed_thenSuccess:

...

IndexSearcher searcher = new IndexSearcher(indexReader);
final LeafReader leafReader = searcher.getLeafContexts().get(0).reader();
SegmentReader segmentReader = Lucene.segmentReader(leafReader);
final List<String> hnswfiles = getFilesFromSegment(dir, FAISS_ENGINE_FILE_EXT);
assertEquals(3, hnswfiles.size()); // Fails here: unlike other platforms, Linux creates two segments.

On Mac, the files created in segment 0: [_0_165_byte_field.faiss, _0_165_float_binary_field.faiss, _0_165_float_field.faiss]
On Linux, two segments were created (e.g. 0 and 1): [_0_165_byte_field.faiss, _0_165_float_binary_field.faiss, _0_165_float_field.faiss, _1_165_float_binary_field]

Therefore, the test failure is not caused by this change. I will fix the test.
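
To make the mismatch concrete, here is a minimal sketch that groups the file names listed above by their segment prefix. It is not code from the k-NN test suite: the class SegmentFileGrouping and both helpers are hypothetical, and the parsing assumes every name starts with "_<segment>_" as in the listings above. The last Linux entry is copied exactly as reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of the k-NN code base.
class SegmentFileGrouping {

    // Returns the segment prefix (e.g. "_0") of a name like "_0_165_byte_field.faiss".
    // Assumption: the name begins with "_<segment>_"; otherwise the name is returned unchanged.
    static String segmentNameOf(String fileName) {
        int secondUnderscore = fileName.indexOf('_', 1);
        return secondUnderscore < 0 ? fileName : fileName.substring(0, secondUnderscore);
    }

    // Groups file names by segment prefix, keeping segments in sorted order.
    static Map<String, List<String>> groupBySegment(List<String> fileNames) {
        Map<String, List<String>> bySegment = new TreeMap<>();
        for (String name : fileNames) {
            bySegment.computeIfAbsent(segmentNameOf(name), k -> new ArrayList<>()).add(name);
        }
        return bySegment;
    }

    public static void main(String[] args) {
        // File names as reported for the Linux run.
        List<String> linuxFiles = List.of(
            "_0_165_byte_field.faiss",
            "_0_165_float_binary_field.faiss",
            "_0_165_float_field.faiss",
            "_1_165_float_binary_field");

        Map<String, List<String>> bySegment = groupBySegment(linuxFiles);
        // Print each segment with the number of files it contributed.
        bySegment.forEach((segment, files) ->
            System.out.println(segment + " -> " + files.size() + " file(s): " + files));
    }
}
```

Grouping this way shows that the fourth entry comes from a second segment, so a file count taken across the whole directory only equals the number of vector fields when the index holds a single segment.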