⬆️ Bump sentence-transformers from 2.5.1 to 2.6.0

Bumps sentence-transformers from 2.5.1 to 2.6.0.

Release notes

Sourced from sentence-transformers's releases.

v2.6.0 - Embedding Quantization, GISTEmbedLoss

This release brings embedding quantization: a way to heavily speed up retrieval & other tasks, and a new powerful loss function: GISTEmbedLoss.

Install this version with:

pip install sentence-transformers==2.6.0

Embedding Quantization

Embeddings can be challenging to scale, which leads to expensive solutions and high latencies. Quantization is a new approach to counter this problem: it reduces the size of each individual value in the embedding. Experiments on quantization have shown that we can maintain a large amount of performance while significantly speeding up computation and saving on memory, storage, and costs.

To be specific, binary quantization may retain 96% of the retrieval performance while speeding up retrieval by 25x and reducing memory and disk usage by 32x. Do not underestimate this approach! Read more about Embedding Quantization in our extensive blogpost.
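The 32x figure follows from storing one bit per dimension instead of a 32-bit float. The sketch below illustrates that arithmetic in plain Python. It assumes values are thresholded at zero and packed eight to a byte; the helper names `binarize` and `hamming_distance` are hypothetical and are not part of sentence-transformers.

```python
import struct


def binarize(embedding):
    """Threshold each value at 0 and pack the bits, 8 per byte (MSB first)."""
    bits = [1 if value > 0 else 0 for value in embedding]
    bits += [0] * (-len(bits) % 8)  # pad to a whole number of bytes
    packed = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)


def hamming_distance(a, b):
    """Count differing bits between two packed embeddings of equal length."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# A made-up 1024-dimensional embedding (illustrative values only).
embedding = [0.12, -0.5, 0.33, 0.0, -0.01, 0.9, -0.7, 0.25] * 128

float32_bytes = len(struct.pack(f"{len(embedding)}f", *embedding))
binary = binarize(embedding)
ratio = float32_bytes / len(binary)

print(float32_bytes, len(binary), ratio)
```

Packed embeddings like these are typically compared with Hamming distance, which is one reason retrieval over them can be much faster than float32 dot products.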

Binary and Scalar Quantization

Two forms of quantization exist at this time: binary and scalar (int8). These quantize embedding values from float32 into binary and int8, respectively. For Binary quantization, you can use the following snippet:
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

# 1. Load an embedding model
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

# 2a. Encode some text using "binary" quantization
binary_embeddings = model.encode(
    ["I am driving to the lake.", "It is a beautiful day."],
    precision="binary",
)

# 2b. or, encode some text without quantization & apply quantization afterwards
embeddings = model.encode(["I am driving to the lake.", "It is a beautiful day."])
binary_embeddings = quantize_embeddings(embeddings, precision="binary")

References:

SentenceTransformer.encode

quantize_embeddings
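The release notes only show the binary case. For intuition on the scalar (int8) variant, here is a minimal plain-Python sketch. It assumes each dimension is mapped linearly onto 256 buckets using the minimum and maximum seen in a calibration set; the function `scalar_quantize` and its rounding and clamping choices are illustrative assumptions, not the library's implementation.

```python
def scalar_quantize(embeddings, calibration):
    """Map float values to int8 buckets per dimension (hypothetical helper)."""
    dims = len(calibration[0])
    mins = [min(row[d] for row in calibration) for d in range(dims)]
    maxs = [max(row[d] for row in calibration) for d in range(dims)]

    quantized = []
    for row in embeddings:
        q_row = []
        for value, lo, hi in zip(row, mins, maxs):
            step = (hi - lo) / 255
            if step == 0:
                step = 1.0  # constant dimension: avoid division by zero
            bucket = round((value - lo) / step) - 128
            q_row.append(max(-128, min(127, bucket)))  # clamp to the int8 range
        quantized.append(q_row)
    return quantized


# Toy calibration data: dimension 0 spans [-1, 1], dimension 1 spans [0, 2].
calibration = [[-1.0, 0.0], [1.0, 2.0]]
quantized = scalar_quantize([[-1.0, 2.0], [1.0, 0.0]], calibration)
print(quantized)
```

Each value then occupies one byte instead of four, so this form trades a 4x size reduction for a finer-grained representation than binary.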

GISTEmbedLoss

GISTEmbedLoss, as introduced in Solatorio (2024), is a guided variant of the more standard in-batch negatives loss (MultipleNegativesRankingLoss). Both loss functions are provided with a list of (anchor, positive) pairs. MultipleNegativesRankingLoss uses anchor_i and positive_i as a positive pair and every positive_j with i != j as a negative for anchor_i, whereas GISTEmbedLoss uses a second model to guide the in-batch negative sample selection.

This can be very useful, because it is plausible that anchor_i and positive_j are actually quite semantically similar. In this case, GISTEmbedLoss would not consider them a negative pair, while MultipleNegativesRankingLoss would. When finetuning MPNet-base on the AllNLI dataset, this is the Spearman correlation based on cosine similarity on the STS Benchmark dev set (higher is better):

The blue line is MultipleNegativesRankingLoss, whereas the grey line is GISTEmbedLoss with the small all-MiniLM-L6-v2 as the guide model. Note that all-MiniLM-L6-v2 by itself does not reach 88 Spearman correlation on this dataset, so this is really the effect of two models (mpnet-base and all-MiniLM-L6-v2) reaching a performance that they could not reach separately.
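To make the guided selection concrete, the sketch below builds a mask of which in-batch candidates remain negatives. It assumes one plausible rule: positive_j is dropped as a negative for anchor_i whenever the guide model scores it as more similar to anchor_i than the true positive_i. The function name `guided_negative_mask` and the toy 2-D "guide embeddings" are hypothetical, and the actual loss may differ in detail.

```python
import math


def cosine(u, v):
    """Cosine similarity between two plain-Python vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def guided_negative_mask(guide_anchors, guide_positives):
    """mask[i][j] is True when positive_j is kept as a negative for anchor_i."""
    n = len(guide_anchors)
    mask = []
    for i in range(n):
        positive_sim = cosine(guide_anchors[i], guide_positives[i])
        row = []
        for j in range(n):
            if i == j:
                row.append(False)  # the true positive is never a negative
            else:
                candidate_sim = cosine(guide_anchors[i], guide_positives[j])
                # Assumed rule: drop candidates the guide rates above the true positive.
                row.append(candidate_sim <= positive_sim)
        mask.append(row)
    return mask


# Toy guide-model embeddings for a batch of three (anchor, positive) pairs.
guide_anchors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
guide_positives = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.0]]

mask = guided_negative_mask(guide_anchors, guide_positives)
print(mask)
```

With MultipleNegativesRankingLoss every off-diagonal entry would be a negative. Here, anchors 0 and 2 are near-duplicates, so their cross pairs are excluded instead of being pushed apart.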

All changes

Add GISTEmbedLoss by @avsolatorio in UKPLab/sentence-transformers#2535

[feat] Add 'get_config_dict' method to GISTEmbedLoss for better model cards by @tomaarsen in UKPLab/sentence-transformers#2543

... (truncated)

Commits

a5f7749 Release v2.6.0
13a9f3f [feat] Add binary & scalar embedding quantization support to Sentence Trans...
e6af66f Also update return docstring of encode_multi_process (#2548)
caaa28d Fix SentenceTransformer encode documentation return type default (numpy vecto...
87f4180 [deprecation] Deprecate save_to_hub in favor of push_to_hub; add safe_s...
fc2a2d8 Enable saving modules as pytorch_model.bin (#2542)
b9255d9 Add 'get_config_dict' method to GISTEmbedLoss for better model cards (#2543)
465d4f0 Add GISTEmbedLoss (#2535)
See full diff in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

qnguyen3 / chat-with-mlx

⬆️ Bump sentence-transformers from 2.5.1 to 2.6.0 #75
