milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
29.76k stars 2.85k forks source link

[Bug]: sparse index's Train #34753

Open ldak4747 opened 2 months ago

ldak4747 commented 2 months ago

Is there an existing issue for this?

Environment

https://github.com/zilliztech/knowhere.git,main分支

Current Behavior

在InvertedIndex::Train中,首先算出全部样本的,非零维的计数总和amount,然后"std::vector vals(amount);"构造vals数组,数组容量是amount,也就是数组前amount成员值都是0。 然后将全部样本的非零维的值,追加到数组vals,也就是vals长度为2*amount,前amount为0,后amount为非0值。 再通过std::nth_element寻找vals的全部成员中,第drop_ratio_build小的,这个很容易就是0。 请问这是符合预期的么?

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

yanliang567 commented 2 months ago

/assign @liliu-z please give a hand on this /unassign

ldak4747 commented 2 months ago

您这边还没有反馈~

zhengbuqian commented 2 months ago

Hi, this is indeed not the expected behavior. The intention was std::vector vals; vals.reserve(amount). The current impl will cause less than drop_ratio_build ratio small values being dropped, leading to an increased accuracy with lower performance.

Thanks for the feedback, I'll put up a fix soon https://github.com/zilliztech/knowhere/issues/720

stale[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

yanliang567 commented 1 month ago

@zhengbuqian was this fix in mivlus 2.4.9?

stale[bot] commented 5 days ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.