Currently, all the ITs and BWC tests in the k-NN plugin create indices with vector fields, and every document contains a vector field. In production, however, the documents in a k-NN index do not necessarily contain the vector field, or all of the vector fields. Because tests of this kind are missing, we were not able to catch the issues fixed in these PRs:
The feature that releases memory when an index is closed introduced a bug: if a segment has a knn_vector field but no docs with this field present, an index-out-of-bounds exception is thrown. This was fixed in https://github.com/opensearch-project/k-NN/pull/2182.
Caused by: NotSerializableExceptionWrapper[index_out_of_bounds_exception: Index 0 out of bounds for length 0]
    at jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:100)
    at jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:106)
    at jdk.internal.util.Preconditions.checkIndex(Preconditions.java:302)
    at java.util.Objects.checkIndex(Objects.java:385)
    at java.util.ArrayList.get(ArrayList.java:427)
    at org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesProducer.<init>(KNN80DocValuesProducer.java:78)
    at org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesFormat.fieldsProducer(KNN80DocValuesFormat.java:44)
    at org.apache.lucene.index.SegmentDocValues.newDocValuesProducer(SegmentDocValues.java:52)
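The trace bottoms out in `ArrayList.get` with index 0 on a list of length 0. The snippet below is a minimal, standalone sketch of that failure mode and of a defensive lookup. It is not the plugin's code and not necessarily how the linked PR fixed it; the class name, the method names, and the idea of a "list of per-field entries" are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; none of these names exist in the k-NN plugin.
final class EmptyFieldLookup {

    // Mirrors the failing pattern: assumes at least one entry exists for the field.
    static String firstEntryUnsafe(List<String> entries) {
        return entries.get(0); // throws IndexOutOfBoundsException when entries is empty
    }

    // Defensive variant: an empty list is treated as "no docs carry this field".
    static Optional<String> firstEntrySafe(List<String> entries) {
        return entries.isEmpty() ? Optional.empty() : Optional.of(entries.get(0));
    }

    public static void main(String[] args) {
        List<String> noEntries = new ArrayList<>();
        System.out.println(firstEntrySafe(noEntries).isPresent()); // prints false
        try {
            firstEntryUnsafe(noEntries);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```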
BWC tests for all versions in which an index has 10 docs, where 9 contain vector fields and 1 has no vector field. Ingestion should be arranged so that the document with no vector field gets its own segment. See the Proposal section.
Similar to the BWC tests, we should have ITs that cover these scenarios for an index created as in step 1.
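The ingestion order described above could be sketched as a plain list of REST calls, as below. This is only a sketch under assumptions: the index name, field names, and the dimension of 2 are made up for illustration, and it assumes that flushing between the two batches leaves the vector-less document in its own segment (that is, no merge combines the segments before the test reads them). Wiring these calls into the plugin's IT or BWC base classes is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; builds the request sequence without sending anything.
final class SparseVectorIndexPlan {

    /** One REST call: HTTP method, path, and JSON body (empty when there is none). */
    static final class Request {
        public final String method;
        public final String path;
        public final String body;

        Request(String method, String path, String body) {
            this.method = method;
            this.path = path;
            this.body = body;
        }
    }

    static List<Request> plan(String index, String vectorField, int docsWithVector) {
        List<Request> requests = new ArrayList<>();
        // Create a k-NN index with a single knn_vector field (dimension 2 is an arbitrary choice).
        requests.add(new Request("PUT", "/" + index, String.format(Locale.ROOT,
            "{\"settings\":{\"index.knn\":true},\"mappings\":{\"properties\":"
                + "{\"%s\":{\"type\":\"knn_vector\",\"dimension\":2}}}}", vectorField)));
        // First batch: documents that contain the vector field.
        for (int i = 0; i < docsWithVector; i++) {
            requests.add(new Request("PUT", "/" + index + "/_doc/" + i,
                String.format(Locale.ROOT, "{\"%s\":[%d.0,%d.0]}", vectorField, i, i + 1)));
        }
        // Flush so the first batch is committed before the next document arrives.
        requests.add(new Request("POST", "/" + index + "/_flush", ""));
        // Second batch: a single document without the vector field.
        requests.add(new Request("PUT", "/" + index + "/_doc/" + docsWithVector,
            "{\"title\":\"plain document\"}"));
        requests.add(new Request("POST", "/" + index + "/_flush", ""));
        return requests;
    }

    public static void main(String[] args) {
        for (Request r : plan("sparse-knn-index", "test_vector", 9)) {
            System.out.println(r.method + " " + r.path + " " + r.body);
        }
    }
}
```

With 9 vector documents the plan has 13 calls: one index creation, nine vector documents, a flush, one vector-less document, and a final flush. The same plan could be repeated per engine by adding a method definition to the mapping, which is omitted here.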
Description
Proposal
To catch the above issues during PRs, we should add tests (BWC and ITs) for all 3 engines and for disk-based vector search. For 1, I added integration tests along with the fix: https://github.com/opensearch-project/k-NN/blob/2d1a4080d5b1601bf3362fecd85384348af1f326/src/test/java/org/opensearch/knn/integ/ModeAndCompressionIT.java#L225 . We need to do a similar thing for BWC and the other engines.
Tests to be added
Please suggest more tests if there are any.