v6d-io / v6d

vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)
https://v6d.io
Apache License 2.0

Add hash functions for batched tokens in the LLM KV cache. #1836

Closed dashanji closed 3 months ago

dashanji commented 3 months ago

What do these changes do?

I tested the MurmurHash3 and CityHash hash functions on 100,000 distinct sequences (10 tokens each) over multiple runs. The results are as follows.

MurmurHash3Algorithm conflict count is 7 / 1600000
CityHashAlgorithm conflict count is 4 / 1600000

A separate test run gave:

MurmurHash3Algorithm conflict count is 1 / 1600000
CityHashAlgorithm conflict count is 3 / 1600000

In both cases each function produced fewer than 10 conflicts out of 1,600,000, so the conflict rates of the two hash functions are effectively the same.
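For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of a conflict-counting harness. It is not the test code from this PR. The helper names (`make_sequences`, `encode`, `count_conflicts`), the vocabulary size, and the byte encoding of the tokens are all assumptions made for illustration. The two hash functions are standard-library stand-ins (CRC32 and a 32-bit truncated BLAKE2b), not MurmurHash3 or CityHash, so the numbers it prints are not comparable to the ones above.

```python
import hashlib
import random
import struct
import zlib


def make_sequences(count, tokens_per_seq=10, vocab_size=32000, seed=0):
    """Generate `count` distinct token sequences (vocab_size is an assumption)."""
    rng = random.Random(seed)
    seqs = set()
    while len(seqs) < count:
        seqs.add(tuple(rng.randrange(vocab_size) for _ in range(tokens_per_seq)))
    return list(seqs)


def encode(seq):
    """Pack token ids as little-endian 32-bit ints (hypothetical encoding)."""
    return struct.pack(f"<{len(seq)}i", *seq)


def count_conflicts(seqs, hash_fn):
    """Count sequences whose hash was already produced by a different sequence."""
    seen = {}
    conflicts = 0
    for seq in seqs:
        h = hash_fn(encode(seq))
        if h in seen and seen[h] != seq:
            conflicts += 1
        else:
            seen[h] = seq
    return conflicts


def blake2b_32(data):
    """Stand-in 32-bit hash: BLAKE2b truncated to 4 bytes."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=4).digest(), "little")


# Stand-ins only; swap in real MurmurHash3 / CityHash bindings to compare those.
HASHES = {"crc32": zlib.crc32, "blake2b_32": blake2b_32}

if __name__ == "__main__":
    seqs = make_sequences(100_000)
    for name, fn in HASHES.items():
        print(f"{name} conflict count is {count_conflicts(seqs, fn)} / {len(seqs)}")
```

Any callable that maps `bytes` to a hashable value can be passed as `hash_fn`, so a MurmurHash3 or CityHash binding can be plugged into `HASHES` without changing the rest of the harness.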

Related issue number

Part of https://github.com/v6d-io/v6d/issues/1832