milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Enhancement]: Sparse embedding: should upper_bound be computed from max_value * query_value on each dimension, instead of max_value alone? #36711

Open ldak4747 opened 1 week ago

ldak4747 commented 1 week ago

Is there an existing issue for this?

What would you like to be added?

In src/index/sparse/sparse_inverted_index.h, InvertedIndex::search_wand computes upper_bound while traversing the cursors with the line "upper_bound += cursors[pivot]->max_score();". Could this instead be "upper_bound += (cursors[pivot]->max_score() * query_value);" (pseudocode)?

The query value on a dimension may be very large, in which case upper_bound as currently computed may not be accurate.

Why is this needed?

No response

Anything else?

No response

ldak4747 commented 1 week ago

For example, suppose index_value on a dimension ranges from 0 to 1, but query_value on that dimension is 1e10. The inner product on that dimension can then be huge: the gap between 0.1 * 1e10 and 0.9 * 1e10 is very large, whereas the gap between 0.1 and 0.9 may be too small to prevent found_pivot = false.

zhengbuqian commented 1 week ago

Hi @ldak4747, I didn't get your point. Could you elaborate further?

query_value is multiplied onto max_score when the cursor is created.
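
To illustrate that point, here is a minimal sketch of folding the query weight into the bound at cursor creation. It is not the actual code in sparse_inverted_index.h: the names Cursor, make_cursor, and wand_upper_bound are hypothetical, and it assumes non-negative index values and query weights.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical posting-list cursor for one dimension (illustration only).
struct Cursor {
    float query_value;  // weight of the query on this dimension
    float max_score;    // already scaled: max stored value * query_value
};

// Build a cursor from the stored values of one dimension.
// The query weight is multiplied into max_score here, exactly once.
// Assumes values and query_value are non-negative, so the product is a
// valid upper bound on this dimension's contribution to the inner product.
inline Cursor make_cursor(const std::vector<float>& values, float query_value) {
    assert(!values.empty());
    const float max_value = *std::max_element(values.begin(), values.end());
    return Cursor{query_value, max_value * query_value};
}

// Sum the per-cursor bounds, mirroring "upper_bound += max_score()".
// No further multiplication by query_value is needed at this point.
inline float wand_upper_bound(const std::vector<Cursor>& cursors) {
    float bound = 0.0f;
    for (const Cursor& c : cursors) {
        bound += c.max_score;
    }
    return bound;
}
```

Under these assumptions, a dimension with index values in [0, 1] and a query weight of 1e10 yields a max_score on the order of 1e10, so the summed bound already reflects the query weight. Multiplying by query_value again in the pivot loop would apply it twice.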