Closed paul7Junior closed 6 months ago
Man, this is incredible, what a deep dive. This part of the code is probably most familiar to @santhnm2. Keshav, let me know if I should do the review.
LGTM, @okhat if you want to just do a quick check to make sure it works then we can merge
This PR addresses the bug "Duplicate search results when k is a high value #270"
Based on what I can observe, the issue arises when `global_approx_scores` is empty but `nfiltered_docs` still expects more pids than are available.
Original issue: in the `filter_pids_helper` function in `filter_pids.cpp`.
Proposed solution:
In the final run to filter pids, there is no condition to stop when the `global_approx_scores` priority queue is empty. If the previous filtering operations produced fewer pids than `nfiltered_docs`, you'd keep looking for pids up to `nfiltered_docs` even though there aren't enough pids to fill the final result.
A consequence of this is that the final size of `final_filtered_pids` isn't known beforehand; we only know its maximum, `nfiltered_docs`.
To address that, I removed the fixed-size array and used vectors that are returned from the `filter_pids_helper` function. It's a bit more functional in style, so it should be safer as well.
I welcome any feedback.
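For reviewers, here is a minimal sketch of the idea described above, not the actual code in `filter_pids.cpp`. It assumes a max-heap of (approximate score, pid) pairs; the function name `take_top_pids` and the element types are hypothetical and chosen only for illustration. The loop stops as soon as the queue is empty, and the result is returned as a vector sized to what was actually found.

```cpp
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for the final filtering pass (illustrative only).
// Each queue entry is (approximate score, pid); the highest score is on top.
std::vector<int> take_top_pids(
    std::priority_queue<std::pair<float, int>> global_approx_scores,
    std::size_t nfiltered_docs) {
    std::vector<int> final_filtered_pids;
    final_filtered_pids.reserve(nfiltered_docs);

    // Stop at nfiltered_docs results OR when the queue runs dry,
    // whichever comes first. Without the empty() check, top() and pop()
    // on an empty queue would be undefined behavior.
    while (final_filtered_pids.size() < nfiltered_docs &&
           !global_approx_scores.empty()) {
        final_filtered_pids.push_back(global_approx_scores.top().second);
        global_approx_scores.pop();
    }

    // The vector's size is the number of pids actually found,
    // which may be smaller than nfiltered_docs.
    return final_filtered_pids;
}
```

Because the caller receives a vector, it can read the real number of results from `size()` instead of assuming a fixed-size buffer was completely filled.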