opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0
156 stars 123 forks source link

Refactor scoring to map leaf reader context with results #2271

Open VijayanB opened 1 week ago

VijayanB commented 1 week ago

Description

We map order of results to order of segments, and finally rely on that order to build top docs. Refactor method to use map.Entry to map leaf reader context with results from those leaves. This is required when we want to split segments based on approx search or exact search to reduce rescoring twice by exact search

Related Issues

Prerequisite for #2215

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

heemin32 commented 1 week ago

Do you have a follow up code showing why this is needed? Can't we always get the relation between leaf reader and the result based on their index in the list?

VijayanB commented 1 week ago

Do you have a follow up code showing why this is needed? Can't we always get the relation between leaf reader and the result based on their index in the list?

Sure. https://github.com/opensearch-project/k-NN/pull/2273 . This is draft PR. I want to test recall before making it ready.

heemin32 commented 1 week ago

Thanks. I think this refactoring should be accompanied with the final PR instead of pushing it separately.

VijayanB commented 1 week ago

Thanks. I think this refactoring should be accompanied with the final PR instead of pushing it separately.

Having multiple independent PR which doesn’t break existing feature helps to get reviewed faster. If I add refactoring and new implementation, it might be hard to find out bugs

heemin32 commented 1 week ago

The challenge is that it's difficult to justify this refactoring without the actual PR that requires it. There's always a possibility that the implementation in the subsequent PR may change, in which case this refactoring might end up providing no real benefit.

VijayanB commented 1 week ago

The challenge is that it's difficult to justify this refactoring without the actual PR that requires it. There's always a possibility that the implementation in the subsequent PR may change, in which case this refactoring might end up providing no real benefit.

Fair enough. How about this? I will backport to 2.x only if both PRs are merged. Will that work?

heemin32 commented 1 week ago

I am against merging it in main. Doesn't matter we backport it to 2.x or not. The change is not big so I think it should be okay to have this with following PR together.

VijayanB commented 1 week ago

I am against merging it in main. Doesn't matter we backport it to 2.x or not. The change is not big so I think it should be okay to have this with following PR together.

I don't see a reason why it is absolute necessary that this feature cannot be broken into two PR with predefined scope, where each does two different thing. Can you give reason why you are against breaking into two PRs by pointing out how it breaks either build or existing feature?

heemin32 commented 1 week ago

I am against merging it in main. Doesn't matter we backport it to 2.x or not. The change is not big so I think it should be okay to have this with following PR together.

I don't see a reason why it is absolute necessary that this feature cannot be broken into two PR with predefined scope, where each does two different thing. Can you give reason why you are against breaking into two PRs by pointing out how it breaks either build or existing feature?

Thank you for your effort in keeping the PR small to make the review process easier—I really appreciate that. My concern is that the value of this PR depends heavily on the approval of the subsequent PR, as it primarily supports those changes. If the following PR undergoes modifications, this one may also need to be adjusted accordingly. By the way, I didn't say it will breaks either build or existing feature.