Hi @keshusharmamrt! Can you please try the Cosine distance and share the dtype and shape of the embeddings?
Hi @ashvardanian, thanks for the prompt reply. I tried the Cosine distance too, but it also doesn't seem to work correctly; I was only getting the expected keys via MetricKind.L2sq. Also, using this approach, all the distances I get are nearly zero.
Sharing a screenshot of the same. The details of the embeddings are as follows:
I think the issue might be in the distribution that produces those embeddings. USearch has no learned quantization; all it does is linear scaling from [-1.0, 1.0] to integers in [-100, 100].
If you know that your values won't scale properly into that range, you may want to normalize them somehow, or just convert them to int8 directly and pass that to USearch. That should work.
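For illustration, one way the suggested pre-scaling could look is sketched below. This is a minimal example, not a USearch utility; the sizes, the symmetric max-abs scale factor, and the variable names are all assumptions:

```python
import numpy as np
from usearch.index import Index

embeddings = np.random.randn(10_000, 256).astype(np.float32)  # stand-in for real embeddings

# Scale symmetrically so every component lands in [-1, 1] before the
# library's linear mapping onto the integer range [-100, 100].
scale = np.abs(embeddings).max()
embeddings_scaled = embeddings / scale

index = Index(ndim=embeddings.shape[1], metric="cos", dtype="i8")
index.add(np.arange(len(embeddings_scaled)), embeddings_scaled)
matches = index.search(embeddings_scaled[:5], 10)
print(matches.keys)
```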
PS: I'd appreciate a PR for the main README, if you think it makes sense to list it there ๐ค
Hi, thanks for your suggestion, but it doesn't seem to work either.
Sharing an example code snippet that demonstrates the issue (using random uniform embeddings):
```python
import numpy as np
from usearch.index import Index, MetricKind

np.set_printoptions(formatter={"float": "{: 0.4f}".format})
np.random.seed(42)

num_centroids = 200
num_examples = 100
num_embeddings = num_examples * num_centroids
dim = 1024

centroids = np.random.uniform(low=-1, high=1, size=(num_centroids, dim))
offsets = np.random.uniform(low=-0.05, high=0.05, size=(num_embeddings, dim))
centroids = np.tile(centroids, (num_examples, 1))
embeddings = centroids + offsets
embeddings = embeddings / np.linalg.norm(embeddings, axis=1)[:, None]

for dtype in ["f32", "i8"]:
    print("Running test for dtype = ", dtype)
    for metric in [MetricKind.IP, MetricKind.Cosine, MetricKind.L2sq]:
        print("Running test ", dtype, metric)
        searcher = Index(
            ndim=dim,
            metric=metric,
            dtype=dtype,
            connectivity=32,
            expansion_add=128,
            expansion_search=64,
        )
        searcher.add(range(len(embeddings)), embeddings, log=True)
        matches = searcher.search(embeddings[:5], 4)
        print("Keys = \n", matches.keys)
        print("Distances = \n", matches.distances)

        exact_distances = 1 - (embeddings[:5] @ embeddings.T)
        exact_distances.sort(axis=1)
        if metric == MetricKind.L2sq:
            exact_distances = 2 * exact_distances
        print("Exact Distances = \n", exact_distances[:, :4])
```
I get the following observations from running the above code:
Is this expected when quantizing? If one is quantizing the embeddings, shouldn't one prefer the Hamming distance?

> Binarization is to compress a full-precision weight into a binary code that only occupy 1 bit. It is widely used in the embedding-based similarity search of recommender systems [28, 54], since the binary embeddings have two distinct advantages compared to the full-precision ones: (1) less memory or disk cost for storing embeddings; (2) higher inference efficiency as the similarity (i.e., inner product) between binary embeddings can be calculated more efficiently through the Hamming distance, which has been proved in [73]

Reference: Li, S., Guo, H., Tang, X., Tang, R., Hou, L., Li, R., & Zhang, R. (2024). Embedding Compression in Recommender Systems: A Survey. ACM Computing Surveys, 56(5), 1-21.
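To make the Hamming idea concrete, here is a small self-contained sketch (plain NumPy, not tied to any particular library) of how the distance between two bit-packed binary codes can be computed by XOR-ing the bytes and counting the set bits:

```python
import numpy as np

def hamming_packed(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between two bit-packed uint8 vectors."""
    xor = np.bitwise_xor(a, b)
    # Count the set bits per byte by unpacking; popcount-style tricks work too.
    return int(np.unpackbits(xor).sum())

# Binarize two float vectors by thresholding at zero, then pack 8 bits per byte.
x = np.random.randn(1024)
y = np.random.randn(1024)
x_bits = np.packbits(x > 0)
y_bits = np.packbits(y > 0)

print(hamming_packed(x_bits, y_bits))  # number of dimensions where the signs differ
```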
Expanding on @keshusharmamrt's example code snippet:
```python
from sentence_transformers.quantization import quantize_embeddings

embeddings_f32 = embeddings.astype(np.float32)
embeddings_i8 = quantize_embeddings(embeddings_f32, precision="binary")

searcher1 = Index(
    ndim=embeddings_f32.shape[1],
    metric="hamming",
    dtype="i8",
    connectivity=32,
    expansion_add=128,
    expansion_search=64,
)
print("dim: ", embeddings_f32.shape[1])
searcher1.add(range(len(embeddings_f32)), embeddings_f32)

searcher2 = Index(
    ndim=embeddings_i8.shape[1],
    metric="hamming",
    dtype="i8",
    connectivity=32,
    expansion_add=128,
    expansion_search=64,
)
print("dim: ", embeddings_i8.shape[1])
searcher2.add(range(len(embeddings_i8)), embeddings_i8)

query = embeddings_f32[:5]
query_i8 = quantize_embeddings(query, precision="binary")

matches1 = searcher1.search(query, 4)
print("int8 quantization:")
print("Keys = \n", matches1.keys)
print("Distances = \n", matches1.distances)

matches2 = searcher2.search(query_i8, 4)
print("binary quantization:")
print("Keys = \n", matches2.keys)
print("Distances = \n", matches2.distances)
```
Results:

```
dim:  1024
dim:  128
int8 quantization:
Keys =
 [[    0 16000 12400   400]
  [    1 11801 14801  4801]
  [    2  2402  7002 19002]
  [    3  1403  1003  8603]
  [    4 17604  6404 14004]]
Distances =
 [[ 0.0000  309.0000  329.0000  343.0000]
  [ 0.0000  346.0000  377.0000  382.0000]
  [ 0.0000  328.0000  332.0000  341.0000]
  [ 0.0000  362.0000  366.0000  371.0000]
  [ 0.0000  334.0000  340.0000  354.0000]]
binary quantization:
Keys =
 [[    0 18000  1800 16600]
  [    1 17601 10001 18201]
  [    2  3002  8002 12402]
  [    3  7603 13003 16403]
  [    4  8804 12604  2604]]
Distances =
 [[ 0.0000  9.0000  10.0000  12.0000]
  [ 0.0000  12.0000  12.0000  12.0000]
  [ 0.0000  13.0000  14.0000  14.0000]
  [ 0.0000  5.0000  7.0000  8.0000]
  [ 0.0000  7.0000  8.0000  8.0000]]
```
Note that each query element is included in the corresponding matches and that $d(x,x) = 0$.
@keshusharmamrt, your points are too close to each other to differentiate them with a simple scale-up. You'd need a custom quantization scheme. I'd recommend sticking to `f16`. I've cross-validated the output keys with a slightly extended script:
```python
import numpy as np
from usearch.index import Index, MetricKind

np.set_printoptions(formatter={"float": "{: 0.4f}".format})
np.random.seed(42)

num_centroids = 200
num_examples = 100
num_embeddings = num_examples * num_centroids
dim = 1024

centroids = np.random.uniform(low=-1, high=1, size=(num_centroids, dim))
offsets = np.random.uniform(low=-0.05, high=0.05, size=(num_embeddings, dim))
centroids = np.tile(centroids, (num_examples, 1))
embeddings = centroids + offsets
embeddings = embeddings / np.linalg.norm(embeddings, axis=1)[:, None]

for dtype in ["f32", "f16", "i8"]:
    print("Running test for dtype = ", dtype)
    for metric in [MetricKind.IP, MetricKind.Cosine, MetricKind.L2sq]:
        searcher = Index(
            ndim=dim,
            metric=metric,
            dtype=dtype,
            connectivity=32,
            expansion_add=128,
            expansion_search=64,
        )
        print("Running test ", dtype, metric, searcher.hardware_acceleration)
        searcher.add(range(len(embeddings)), embeddings, log=True)
        matches = searcher.search(embeddings[:5], 4)
        print("Keys = \n", matches.keys)
        print("Distances = \n", matches.distances)

        exact_distances = 1 - (embeddings[:5] @ embeddings.T)
        exact_keys = exact_distances.argsort(axis=1)
        exact_distances.sort(axis=1)
        if metric == MetricKind.L2sq:
            exact_distances = 2 * exact_distances
        print("Exact Keys = \n", exact_keys[:, :4])
        print("Exact Distances = \n", exact_distances[:, :4])
```
@adolfogc, sadly the functionality of `quantize_embeddings` isn't enough to account for such small perturbations. It would work very poorly.

> To quantize float32 embeddings to binary, we simply threshold normalized embeddings at 0: if the value is larger than 0, we make it 1, otherwise we convert it to 0.
I also believe you may have wanted to use a different `precision` argument for binary indexes. It's not mentioned on the main page, but it is listed in the docs: `b1`. Here is the updated snippet:
```python
import numpy as np
from usearch.index import Index, MetricKind

np.set_printoptions(formatter={"float": "{: 0.4f}".format})
np.random.seed(42)

num_centroids = 200  # assumed, as in the earlier snippets; needed for `num_embeddings`
num_examples = 100
num_embeddings = num_examples * num_centroids
dim = 1024
embeddings = np.random.uniform(low=-1, high=1, size=(num_embeddings, dim))

from sentence_transformers.quantization import quantize_embeddings

embeddings_f32 = embeddings.astype(np.float32)
embeddings_b1 = quantize_embeddings(embeddings_f32, precision="binary").astype(np.uint8)

searcher_f32 = Index(
    ndim=dim,
    metric="cos",
    dtype="f32",
)
print("dim: ", embeddings_f32.shape[1])
searcher_f32.add(range(len(embeddings_f32)), embeddings_f32)

searcher_b1 = Index(
    ndim=dim,
    metric="hamming",
    dtype="b1",
)
print("dim: ", embeddings_b1.shape[1])
searcher_b1.add(range(len(embeddings_b1)), embeddings_b1)

queries_f32 = embeddings_f32[:5]
queries_b1 = embeddings_b1[:5]

matches_f32 = searcher_f32.search(queries_f32, 4)
print("Original:")
print("Keys = \n", matches_f32.keys)
print("Distances = \n", matches_f32.distances)

matches_b1 = searcher_b1.search(queries_b1, 4)
print("Binary:")
print("Keys = \n", matches_b1.keys)
print("Distances = \n", matches_b1.distances)
```
I think I should update the main README snippet. Will mark you as a co-author.
> I also believe you may have wanted to use a different `precision` argument for binary indexes. It's not mentioned on the main page, but it is listed in the docs: `b1`. Here is the updated snippet:
@ashvardanian thank you for pointing that out! In that case, I think the snippet could use the `ubinary` precision provided by `sentence_transformers`:

`quantize_embeddings(embeddings_f32, precision="ubinary")`
Defined like so: https://github.com/UKPLab/sentence-transformers/blob/de3b29890344139d7b0e582cc7e00f18b98dc3d3/sentence_transformers/quantization.py#L380
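For reference, my understanding of the two precisions is roughly the following; this is a paraphrased sketch, so please double-check it against the linked source rather than treating it as the library's exact code:

```python
import numpy as np

def quantize_ubinary(embeddings: np.ndarray) -> np.ndarray:
    # Threshold at zero and pack 8 dimensions per byte -> unsigned uint8 codes.
    return np.packbits(embeddings > 0, axis=-1)

def quantize_binary(embeddings: np.ndarray) -> np.ndarray:
    # Same packed bits, shifted into the signed int8 range.
    return (np.packbits(embeddings > 0, axis=-1) - 128).astype(np.int8)
```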
I thought that since the `int8` dimension was "lower", the saved index would be smaller in size, but it turned out that using `b1` also compressed the serialized index.

Also, perhaps this (using `b1` and `ubinary`) could be suggested as an improvement to the `semantic_search_usearch` function of `sentence_transformers`, here:

What do you think?
BTW, superb library, thank you for making it available!
Hey @ashvardanian, @keshusharmamrt has prepared a very simple example which shows that, for the same data, we get unexpected results when using different metrics with int8 quantization. It looks like there is a bug in the int8 quantization (maybe only in the Python client). You can see that for the `dot` metric the order of keys is random, while for `l2sq` the order is fine but the distance values are too large. Everything works as expected with f16 and f32 quantization.
We also compared usearch with faiss and qdrant using the same HNSW parameters and the `dot` metric on normalized vectors.
Connectivity | Expansion Add | Expansion Search |
---|---|---|
32 | 128 | 64 |
Our results for 1M embeddings of size 1024:
Method | Recall@1 |
---|---|
usearch f16 | 0.98672 |
usearch int8 | 0.91227 |
qdrant int8 | 0.96034 |
qdrant int8 (with rescore) | 0.98555 |
faiss int8 | 0.98278 |
For faiss, we used the index setup from the usearch benchmarks repository.
As you can see, the results for usearch int8 are far from the faiss and qdrant equivalents.
What do you think?
@adolfogc, sure, that makes a lot of sense! I've proposed the patch.
@kmkolasinski and @keshusharmamrt thank you for the benchmarks and comparisons, those are very helpful! Can I also ask you to report the `index.hardware_acceleration` output, the USearch version, as well as the OS and the CPU model? Depending on those parameters, I'll know if one of the following kernels from SimSIMD is used.
On older versions, I had made this mistake:
```c
// Unfortunately we can't use the `_mm512_dpbusd_epi32` intrinsics here either,
// as it's asymmetric with respect to the sign of the input arguments:
//      Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
// So we have to use the `_mm512_dpwssd_epi32` intrinsics instead, upcasting
// to 16-bit beforehand.
ab_i32s_vec = _mm512_dpwssd_epi32(ab_i32s_vec, a_vec, b_vec);
a2_i32s_vec = _mm512_dpwssd_epi32(a2_i32s_vec, a_vec, a_vec);
b2_i32s_vec = _mm512_dpwssd_epi32(b2_i32s_vec, b_vec, b_vec);
```
Hi @ashvardanian, thanks for your support.
Sharing the details as you asked.

`index.hardware_acceleration` output:

- USearch version: 2.12.0
- OS: Ubuntu 22.04.4 LTS
- CPU model: 13th Gen Intel® Core™ i5-13600HX × 20
That means this kernel is used. I've just tried implementing a few tests for it on random data, but I can't yet see a noticeable difference in distance calculations between that and the serial code. Any chance you have an example of two arrays for which you get a big error when calling:

```python
from simsimd import cos

cos(a, b)
```
Hey, we will check it later. I don't know how it is used later in usearch, but we see the problem with the `dot` distance, not `cos`. However, I don't see the dot implementation there. The reason we use dot is that our vectors are already normalized, so cos and dot should give the same results.
On integer representations like `i8` the conventional notion of normalization doesn't apply, as there are no values between 0 and 1. So anything that computes a dot product of integer arrays would still need to renormalize them. USearch will swap `ip` with `cos` under the hood.
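To illustrate why the renormalization is needed (this is a plain NumPy sketch, not USearch's internal code path, and the scale factor of 100 is only the linear mapping discussed above):

```python
import numpy as np

# Two unit-norm float vectors: their dot product is already the cosine similarity.
a = np.random.randn(1024); a /= np.linalg.norm(a)
b = np.random.randn(1024); b /= np.linalg.norm(b)
print(float(a @ b))  # a value in [-1, 1]

# After linear scaling to integer buckets, the vectors are no longer unit-norm,
# so a raw integer dot product must be divided by the integer norms again.
a_i8 = np.round(a * 100).astype(np.int8).astype(np.int32)
b_i8 = np.round(b * 100).astype(np.int8).astype(np.int32)
dot_i8 = int(a_i8 @ b_i8)                                  # large integer, not a similarity yet
cos_i8 = dot_i8 / (np.linalg.norm(a_i8) * np.linalg.norm(b_i8))
print(dot_i8, float(cos_i8))                               # cos_i8 approximates the float cosine
```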
Hi, I also tried to use `cos` for benchmarking usearch with f16 and i8 (here too, the `i8` performance is bad).
HNSW config:
Connectivity | Expansion Add | Expansion Search |
---|---|---|
32 | 128 | 64 |
Sharing the entire comparison we did for 1M embeddings of size 1024:
Method | Recall@1 |
---|---|
usearch f16(ip) | 0.98672 |
usearch i8(l2sq) | 0.91227 |
usearch f16(cos) | 0.98244 |
usearch i8(cos) | 0.76728 |
qdrant int8 | 0.96034 |
qdrant int8 (with rescore) | 0.98555 |
faiss int8 | 0.98278 |
Clearly, with `i8` the usearch metrics are poor; only `l2sq` gives somewhat reasonable results, and even those are lower than the other approaches.
Hi @ashvardanian, if int8 requires some special treatment on the user's side, could you provide an example of how to do it, or point us to docs for it? All libraries we have tried so far do the quantization under the hood, so things like bucketing, scaling, etc. are handled internally by the library.
In other words, I wonder what we can do on our side to get results similar to other frameworks for int8. There are two options: either we are doing something wrong, or there is indeed some problem with the int8 quantization.
hi @ashvardanian, do you have any update regarding this?
A bit overwhelmed by other USearch-related projects, and haven't had the chance to investigate deeper, @keshusharmamrt.
> All libraries we have tried so far do the quantization under the hood, so things like bucketing, scaling, etc. are handled internally by the library.
It's true, most other libraries come with a lot of options for different quantization schemes. We don't currently aim to enforce any of them or implement them in the C++ layer. But maybe we could add a couple of simple quantization schemes as utilities in the Python layer for now. Any chance you can contribute, @kmkolasinski?
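As an illustration of what such a Python-layer utility might look like, here is a hypothetical sketch (not an existing USearch API; the function name, the percentile-based clipping, and the return type are all assumptions) of a symmetric rescaler that prepares floats for an `i8` index:

```python
import numpy as np

def rescale_for_i8(embeddings: np.ndarray, clip_percentile: float = 100.0) -> np.ndarray:
    """Scale float embeddings into [-1, 1], optionally clipping outliers first,
    so that a downstream i8 index can bucket them without overflow."""
    limit = np.percentile(np.abs(embeddings), clip_percentile)
    scaled = np.clip(embeddings, -limit, limit) / limit
    return scaled.astype(np.float32)  # hand the rescaled floats to the index
```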
Hi, I can't promise it. The results are weird and at this moment it looks like there is some bug.
We can check for bugs in the SimSIMD kernels, but they may already be sufficiently covered with tests.
OK, last question: let's say I would like to use int8 quantization and the dot product metric. Are there some requirements on the vectors to make it work properly? Do you have some example code which we could use and benchmark on our data for such a scenario?
Sure, the most obvious benchmark/example would be to generate random integers (instead of quantizing random floats), construct the index, and measure self-recall. That should work fine.
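For concreteness, a minimal self-recall check along those lines might look as follows; the sizes and metric here are arbitrary choices for the sketch, not an official benchmark:

```python
import numpy as np
from usearch.index import Index

ndim, count = 256, 100_000
vectors = np.random.randint(-100, 101, size=(count, ndim), dtype=np.int8)

index = Index(ndim=ndim, metric="cos", dtype="i8")
index.add(np.arange(count), vectors)

# Query the index with its own vectors: each should find itself as the top hit.
matches = index.search(vectors, 1)
self_recall = float((matches.keys[:, 0] == np.arange(count)).mean())
print(f"self-recall@1 = {self_recall:.5f}")
```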
Hi @ashvardanian @kimihailv, we tried one last thing. We took the benchmarking script from https://github.com/unum-cloud/usearch-benchmarks/tree/main and tried to compare the metrics on a smaller dataset, `base.10M.fbin`. Here is what we got:

We thought we could use `groundtruth.10K.ibin` to verify the metrics together with `query.10K.fbin` and `base.10M.fbin`, but it seems `groundtruth.10K.ibin` is not compatible with these query and train vectors.

The repository only points to the 1B dataset, which is too large for us to process. Unfortunately, we couldn't find the 100M dataset mentioned in the usearch benchmarking blog post.
The vectors in `query.10K.fbin` are different from the official `query.public.10K.fbin` from https://research.yandex.com/blog/benchmarks-for-billion-scale-similarity-search, which means the query vectors have been selected somehow, and that raises various questions in our minds. For example, how can we reproduce the metrics from this blog post (https://www.unum.cloud/blog/2023-11-07-scaling-vector-search-with-intel)?
Also, there is a bug in the `recall` computation in the benchmarking code. In run.py the `groundtruth` array has shape [10_000, 1] (https://github.com/unum-cloud/usearch-benchmarks/blob/main/run.py#L49):

```python
groundtruth = load_matrix(groundtruth_path)[:, :1]
```

Later you use these `groundtruth` indices to compute recall (https://github.com/unum-cloud/usearch-benchmarks/blob/main/run.py#L82):

```python
recall_at_one = recall(nn_ids, groundtruth, ats=(1,))
```

but this should be replaced with

```python
recall_at_one = recall(nn_ids, groundtruth[:, 0], ats=(1,))
```

or the `recall` function should be fixed, as it adds a new axis to the tensor, causing weird behaviour. A small example which demonstrates the issue:
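The original example isn't reproduced here, but the broadcasting pitfall can be sketched with plain NumPy, assuming the recall helper inserts a trailing axis on the ground-truth array before comparing it with the retrieved keys:

```python
import numpy as np

nn_ids = np.array([[0, 7], [1, 9], [5, 2]])  # top-2 retrieved keys for 3 queries
gt_1d = np.array([0, 1, 2])                  # expected ground-truth shape: (n_queries,)
gt_2d = gt_1d[:, None]                       # shape (n_queries, 1), as loaded in run.py

# With a 1-D ground truth, adding an axis gives (3, 1) and compares row-wise: correct.
print((nn_ids == gt_1d[:, None]).shape)      # (3, 2)

# With the 2-D ground truth, the same step gives (3, 1, 1), which broadcasts
# against (3, 2) into a (3, 3, 2) all-pairs comparison: every query's ground
# truth is matched against every other query's neighbors.
print((nn_ids == gt_2d[:, None]).shape)      # (3, 3, 2)
```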
We tried to reproduce the results using the `base.10M.fbin` vectors, but we reach the same conclusions as with our own data: int8 quantization is about 10% worse than f16. Here is the code you can use to verify it yourself:
```python
from tqdm import tqdm
import numpy as np
from usearch.index import Index
from usearch.io import load_matrix
from utils.metrics import recall

train_embeddings = load_matrix("data/base.10M.fbin")[:500_000]
query_embeddings = load_matrix("data/query.10K.fbin")


def best_match(query):
    return (train_embeddings @ query[:, None]).ravel().argmax()


groundtruths = []
for query in tqdm(query_embeddings, desc="Computing ground truths"):
    best_index = best_match(query)
    groundtruths.append(best_index)
groundtruths = np.array(groundtruths)

print("Benchmark f16 index")
i_half = Index(ndim=96, metric="cos", dtype="f16")
i_half.add(None, train_embeddings, log=True)
result = i_half.search(query_embeddings, 1, log=True)
recall_at_one = recall(result.keys, groundtruths, ats=(1,))
print(f" >> recall@1 = {recall_at_one[0]:.5f}")

print("Benchmark i8 index")
i_quarter = Index(ndim=96, metric="cos", dtype="i8")
i_quarter.add(None, train_embeddings, log=True)
result = i_quarter.search(query_embeddings, 1, log=True)
recall_at_one = recall(result.keys, groundtruths, ats=(1,))
print(f" >> recall@1 = {recall_at_one[0]:.5f}")
```
After running this code you will see:

```
Benchmark f16 index
Add: 100%|██████████| 500000/500000 [00:17<00:00, 28269.71vector/s]
Search: 100%|██████████| 10000/10000 [00:00<00:00, 33128.90vector/s]
 >> recall@1 = 0.97220
Benchmark i8 index
Add: 100%|██████████| 500000/500000 [00:11<00:00, 42828.76vector/s]
Search: 100%|██████████| 10000/10000 [00:00<00:00, 51415.34vector/s]
 >> recall@1 = 0.86640
```
I think this is clear evidence that there is a bug in the int8 index, and that we cannot simply solve it by adding some custom wrapper around the Index class. Hope this helps somehow.
@ashvardanian final remarks and conclusions after spending more time on this issue:
- Indeed, scaling the vector values to the range [-1, 1] improves recall for the test vectors of dim=96. Something that bothers me is that with this approach we are losing some potential information: the int8 range is (-128, 127) but usearch only uses (-100, 100), so with this truncation we drop about 20% of the potential resolution while bucketing. This 20% may be important in some cases.
- I think this [-1, 1] scaling should be explicitly mentioned in the docs or README; maybe usearch could also validate the input vectors to warn the user when they are not scaled optimally.
- There is also one potential problem with int8 quantization which can be surprising for unaware users: if vector values are larger than +/-1, the quantization will create overflow issues. A sketch of the effect is shown below; you can imagine that such a value flip can introduce dramatic changes in the distance values.
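A self-contained sketch of the wrap-around effect referenced in the last point (plain NumPy, not USearch's actual code path, assuming the linear mapping onto [-100, 100] described earlier in the thread):

```python
import numpy as np

def to_buckets(x, factor=100):
    # Mimic a linear mapping of [-1, 1] onto integer buckets in [-100, 100].
    return np.round(np.asarray(x) * factor).astype(np.int64)

in_range = to_buckets([0.99, -0.50, 0.25])
out_of_range = to_buckets([1.40, -1.30, 2.00])   # values the scheme does not expect

print(in_range)                       # [ 99 -50  25] -> fit into int8
print(out_of_range)                   # [ 140 -130  200] -> outside int8's [-128, 127]
print(out_of_range.astype(np.int8))   # [-116  126  -56] -> signs flip after the wrap
```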
It looks like there is indeed a bug in the "Inner Product" metric: according to the documentation and the code, IP should compute `1 - dot(a, b)`, but when testing it locally it returns different values; here is an example, and different results for f16. From this you can see that IP is actually computing `dot(a, b) / (||a|| * ||b||)`. This also explains why the IP metric produces terrible recall, as the code expects a distance, not a similarity measure. However, I don't see where this is defined in the code; I found this, which seems to be correct: https://github.com/unum-cloud/usearch/blob/5ea48c87c56a25ab57634a8f207f80ae675ed58e/include/usearch/index_plugins.hpp#L977. I use usearch version 2.12.0.
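The screenshots aren't reproduced here, but the difference between the two formulas mentioned above can be checked independently of USearch with a few lines of NumPy; this only contrasts `1 - dot(a, b)` with the normalized quantity on non-normalized inputs and makes no claim about the library's internals:

```python
import numpy as np

a = np.array([0.3, 0.4, 0.5], dtype=np.float32)
b = np.array([0.1, 0.9, 0.2], dtype=np.float32)

documented_ip = 1.0 - float(a @ b)                       # 1 - dot(a, b), as documented
normalized_sim = float(a @ b) / float(np.linalg.norm(a) * np.linalg.norm(b))

print(documented_ip, normalized_sim)  # the two only agree for specially scaled inputs
```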
@keshusharmamrt @adolfogc @kmkolasinski, I believe this issue is resolved now. I've extended the docs to highlight the improved behavior. Let me know if the issue persists.
Okay, sure, and thanks. Let's close it; in case of problems we will create a new ticket :+1:
Describe the bug
Hi, I am using USearch with the Integer 8 (i8) index dtype. I have 928K embeddings which I try to add to both an i8 and a float16 index with the Inner Product metric, and then I search the first 10 embeddings to get the top-10 matches. With float16, each query vector is recognized among its own top-10, but with i8 it is not. Also, when I change the metric from MetricKind.InnerProduct to MetricKind.L2sq, those vectors start appearing, but the distances are scaled very differently (distances for IP are on the order of 0.61365813 and for l2sq around 4517; maybe this is also because of i8), which I am not able to understand. Also, adding embeddings with MetricKind.InnerProduct and the i8 dtype is too slow.
Steps to reproduce
I use the following code to create and populate the index with the i8 dtype:
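(The original snippet was attached as an image. A rough reconstruction based on the parameters discussed later in the thread would look something like the following; the file name, dimensions, and variable names are assumptions, not the reporter's actual code.)

```python
import numpy as np
from usearch.index import Index, MetricKind

embeddings = np.load("embeddings.npy")  # assumed: ~928K float vectors

index = Index(
    ndim=embeddings.shape[1],
    metric=MetricKind.IP,
    dtype="i8",
    connectivity=32,
    expansion_add=128,
    expansion_search=64,
)
index.add(np.arange(len(embeddings)), embeddings, log=True)
matches = index.search(embeddings[:10], 10)
print(matches.keys)
print(matches.distances)
```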
I get the following results:
When I only change `metric=MetricKind.L2sq`, I start getting the expected embeddings (the same index as the first key). Note: with other dtypes like f16 and f32, MetricKind.InnerProduct works completely fine.

Expected behavior

Ideally, I should get something like this with i8 + MetricKind.InnerProduct. The following is the result I get with MetricKind.L2sq:
USearch version
2.11.7
Operating System
Ubuntu 22.04
Hardware architecture
x86
Which interface are you using?
Python bindings
Contact Details
keshusharmamrt@gmail.com
Are you open to being tagged as a contributor?

I am open to being tagged in the `.git` history as a contributor

Is there an existing issue for this?
Code of Conduct