tensorchord / pgvecto.rs

Scalable, Low-latency and Hybrid-enabled Vector Search in Postgres. Revolutionize Vector Search, not Database.
https://docs.pgvecto.rs/getting-started/overview.html
Apache License 2.0
1.75k stars 71 forks

feat: add metrics dot and cos #566

cutecutecat closed 2 months ago

cutecutecat commented 2 months ago

Error bound Check

Check the probability that the inequality does not hold: $est - err \times 2.9 < real < est + err \times 2.9$
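A minimal sketch of how such a check could be counted, assuming the bound is `est ± 2.9 · err` and that each sample is an `(est, err, real)` triple. The function names and the sample layout are hypothetical and are not taken from this PR.

```python
def bound_violated(est, err, real, factor=2.9):
    """Return True when `real` lies outside the open interval
    (est - factor * err, est + factor * err)."""
    return not (est - factor * err < real < est + factor * err)


def violation_rate(samples, factor=2.9):
    """Count violations over (est, err, real) triples.

    Returns (number_of_violations, fraction_of_samples_violating).
    The triple layout is an assumption made for this sketch.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one sample")
    violations = sum(
        1 for est, err, real in samples if bound_violated(est, err, real, factor)
    )
    return violations, violations / len(samples)


# Hypothetical samples: only the second one falls outside its bound.
example = [
    (1.0, 0.1, 1.2),
    (1.0, 0.1, 1.3),
    (0.0, 1.0, -2.0),
    (0.5, 0.2, 0.4),
]
count, rate = violation_rate(example)
print(f"{count} = {rate:.1%}")
```

The inequality is strict, so a sample whose real value sits exactly on the bound counts as a violation in this sketch.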

Projection With Identity Matrix

| vector + dim | metric | errors (attempt 1) | errors (attempt 2) | errors (attempt 3) |
| --- | --- | --- | --- | --- |
| glove + 100 | dot | 19541 = 2.0% | 29535 = 2.9% | 19099 = 2.0% |
| glove + 100 | cos | 19656 = 2.0% | 27026 = 2.7% | 23352 = 2.3% |
| sift + 128 | dot | 100241 = 10.0% | 104076 = 10.4% | 97756 = 9.8% |
| sift + 128 | cos | 77400 = 7.7% | 74132 = 7.4% | 77400 = 7.8% |
| sift + 128 | l2 | 104613 = 10.5% | 88878 = 8.9% | 135172 = 13.5% |

Projection With Random Matrix

| vector + dim | metric | errors (attempt 1) | errors (attempt 2) | errors (attempt 3) |
| --- | --- | --- | --- | --- |
| glove + 100 | dot | 26702 = 2.7% | 64816 = 6.5% | 42559 = 4.3% |
| glove + 100 | cos | 25517 = 2.6% | 11341 = 1.1% | 54328 = 5.4% |
| sift + 128 | dot | 64626 = 6.5% | 49139 = 4.9% | 103020 = 10.3% |
| sift + 128 | cos | 88658 = 8.9% | 92813 = 9.2% | 89247 = 9.1% |
| sift + 128 | l2 | 66545 = 6.7% | 78286 = 7.8% | 86253 = 8.6% |
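The two tables differ only in the projection applied to the vectors. The sketch below shows what the two variants could look like, under the assumption that the "random matrix" is a random orthogonal matrix; the comment does not say how the matrix is built, and every name here is hypothetical. An orthogonal projection preserves dot products, so the true `dot`, `cos` and `l2` values are unchanged by it.

```python
import math
import random


def identity_matrix(dim):
    """dim x dim identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]


def random_orthogonal_matrix(dim, seed=0):
    """Random orthogonal matrix built by Gram-Schmidt on Gaussian rows.

    Assumption for illustration only: the PR may construct its matrix differently.
    """
    rng = random.Random(seed)
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Remove the components along the rows accepted so far.
        for r in rows:
            coeff = sum(a * b for a, b in zip(v, r))
            v = [a - coeff * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip a degenerate draw, which is very unlikely
            rows.append([a / norm for a in v])
    return rows


def project(matrix, vector):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


rng = random.Random(42)
x = [rng.uniform(-1.0, 1.0) for _ in range(8)]
y = [rng.uniform(-1.0, 1.0) for _ in range(8)]
p = random_orthogonal_matrix(8, seed=1)
# The dot product before and after projection agrees up to rounding error.
print(dot(x, y), dot(project(p, x), project(p, y)))
```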

Benchmark

[Figures: benchmark plots for CPP, Main, and With this PR]

(Updated 9/13) With this PR

patch: no-residual

| tag | top | QPS | recall |
| --- | --- | --- | --- |
| Glove-200-l2-main | 100 | 274 | 0.9002 |
| Glove-200-l2-main-patch | 100 | 331 | 0.9010 |
| Glove-200-l2-PR | 100 | 284 | 0.9019 |
| Glove-200-l2-PR-patch | 100 | 344 | 0.9010 |
| Glove-200-cos-PR | 100 | 285 | 0.9014 |
| Glove-200-cos-PR-patch | 100 | 342 | 0.9012 |

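To read the effect of the no-residual patch off these rows, each tag can be paired with its `-patch` counterpart. The snippet below is a small sketch of that arithmetic; the numbers are copied from the rows above, while the dictionary layout and function name are hypothetical.

```python
# (QPS, recall) per tag, copied from the benchmark rows above.
ROWS = {
    "Glove-200-l2-main": (274, 0.9002),
    "Glove-200-l2-main-patch": (331, 0.9010),
    "Glove-200-l2-PR": (284, 0.9019),
    "Glove-200-l2-PR-patch": (344, 0.9010),
    "Glove-200-cos-PR": (285, 0.9014),
    "Glove-200-cos-PR-patch": (342, 0.9012),
}


def patch_effect(rows, suffix="-patch"):
    """Map each unpatched tag to (QPS ratio, recall difference) against its patched run."""
    effect = {}
    for tag, (qps, recall) in rows.items():
        patched = rows.get(tag + suffix)
        if patched is None:
            continue  # the tag is itself a patched run, or has no counterpart
        effect[tag] = (patched[0] / qps, patched[1] - recall)
    return effect


for tag, (ratio, recall_delta) in patch_effect(ROWS).items():
    print(f"{tag}: QPS x{ratio:.3f}, recall {recall_delta:+.4f}")
```

On these rows the patch raises QPS by roughly 20 to 21%, while recall moves by less than 0.001 in either direction.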
cutecutecat commented 2 months ago

Implement

usamoi commented 2 months ago

Please remove code that is used for dot distance + residual vector quantization.

usamoi commented 2 months ago

Why is there centroids_squares? If it's meaningless, remove it.

usamoi commented 2 months ago

Please resolve conflicts.