harvard-lil / warc-gpt

WARC + AI - Experimental Retrieval Augmented Generation Pipeline for Web Archive Collections.
https://lil.law.harvard.edu/blog/2024/02/12/warc-gpt-an-open-source-tool-for-exploring-web-archives-with-ai/
MIT License
218 stars, 20 forks

Bump sentence-transformers from 3.0.1 to 3.1.1 #128

Closed dependabot[bot] closed 5 days ago

dependabot[bot] commented 1 week ago

Bumps sentence-transformers from 3.0.1 to 3.1.1.

Release notes

Sourced from sentence-transformers's releases.

v3.1.1 - Patch hard negative mining & remove numpy<2 restriction

This patch release fixes hard negatives mining for models that don't automatically normalize their embeddings and it lifts the numpy<2 restriction that was previously required.

Install this version with

# Full installation:
pip install sentence-transformers[train]==3.1.1

# Inference only:
pip install sentence-transformers==3.1.1
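After installing, it can be useful to confirm that the environment actually picked up the bumped version. The snippet below is a minimal sketch using only the standard library; the helper names (`parse_release`, `is_at_least`, `installed_version`) are hypothetical and not part of sentence-transformers, and the version parsing is deliberately simple (it compares only the leading numeric components).

```python
from importlib import metadata


def parse_release(version: str) -> tuple:
    """Return the leading numeric components of a version string as a tuple of ints."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first non-numeric component, e.g. "dev0".
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(installed: str, required: str) -> bool:
    """True if `installed` is the same release as `required` or a newer one."""
    return parse_release(installed) >= parse_release(required)


def installed_version(package: str):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("sentence-transformers")
if found is None:
    print("sentence-transformers is not installed")
elif is_at_least(found, "3.1.1"):
    print(f"sentence-transformers {found} satisfies >= 3.1.1")
else:
    print(f"sentence-transformers {found} is older than 3.1.1")
```

For stricter comparisons (pre-releases, local versions), a dedicated version-parsing library would be a better fit than this hand-rolled helper.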

Hard Negatives Mining Patch (#2944)

The mine_hard_negatives utility introduced in the previous release would fail if use_faiss=True and the model does not automatically normalize its embeddings. This release patches that, allowing the utility to work with all Sentence Transformer models:

from sentence_transformers.util import mine_hard_negatives
from sentence_transformers import SentenceTransformer
from datasets import load_dataset

# Load a Sentence Transformer model
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1").bfloat16()

# Load a dataset to mine hard negatives from
dataset = load_dataset("sentence-transformers/natural-questions", split="train[:10000]")
print(dataset)
"""
Dataset({
    features: ['query', 'answer'],
    num_rows: 10000
})
"""

# Mine hard negatives
dataset = mine_hard_negatives(
    dataset=dataset,
    model=model,
    range_min=10,
    range_max=50,
    max_score=0.8,
    margin=0.1,
    num_negatives=5,
    sampling_strategy="random",
    batch_size=128,
    use_faiss=True,
)
'''
Batches: 100%|██████████| 75/75 [00:21<00:00, 3.51it/s]
Batches: 100%|██████████| 79/79 [00:03<00:00, 25.77it/s]
Querying FAISS index: 100%|██████████| 1/1 [00:00<00:00, 3.98it/s]
Metric       Positive       Negative     Difference

... (truncated)
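To illustrate why normalization can matter here, the sketch below compares raw inner-product scores with cosine similarity on toy vectors. It is not the library's implementation and makes no claim about how the patch works internally; the helper `l2_normalize` and the example vectors are hypothetical. It only shows the general point that, for unnormalized embeddings, ranking by inner product can disagree with ranking by cosine similarity.

```python
import numpy as np


def l2_normalize(embeddings: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm; all-zero rows are left unchanged."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.where(norms == 0.0, 1.0, norms)


# Hypothetical toy embeddings (not produced by any real model).
query = np.array([[1.0, 0.0]])
candidates = np.array([
    [10.0, 10.0],  # large norm, 45 degrees away from the query
    [0.9, 0.1],    # small norm, nearly parallel to the query
])

# Raw inner product rewards the large-norm vector.
raw_scores = (query @ candidates.T)[0]

# After normalization, the inner product equals cosine similarity.
cos_scores = (l2_normalize(query) @ l2_normalize(candidates).T)[0]

raw_best = int(np.argmax(raw_scores))
cos_best = int(np.argmax(cos_scores))

print("raw inner products:", raw_scores, "-> best index", raw_best)
print("cosine similarities:", cos_scores, "-> best index", cos_best)
```

With these vectors the two rankings differ: the raw inner product prefers the first candidate, while cosine similarity prefers the second. Raw scores are also not bounded by 1, so a threshold such as max_score=0.8 is only meaningful on the normalized scale.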

Commits
  • 73c8dc3 Merge branch 'master' into v3.1-release; Release v3.1.1
  • 7290448 [fix] Ensure that the embeddings from hard negative mining are normalized (...
  • a201c6d [metadata] Extend pyproject.toml metadata (#2943)
  • dafe2b6 [deps] Attempt to remove numpy restrictions (#2937)
  • d6e34ee Increment dev version to v3.2.0.dev0
  • 845dd54 Release v3.1.0
  • a3f2236 [feat] Update mine_hard_negatives to using a full corpus and multiple posit...
  • 8af7c5d [feat] Add column order warnings to the data collator (#2928)
  • bc9a666 [docs] Move losses up in the package reference; they're more important (#2929)
  • 597d5ed [fix] Add dtype cast for modules other than Transformer (#2889)
  • Additional commits viewable in compare view


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
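Since these commands are ordinary PR comments, they could in principle also be posted programmatically. The sketch below only builds the HTTP request for GitHub's REST endpoint for issue comments (pull requests share the issue comment API) and does not send it. The function name `build_dependabot_comment` and the placeholder token are hypothetical, and whether Dependabot acts on the comment depends on the commenter's permissions on the repository.

```python
import json
import urllib.request


def build_dependabot_comment(owner: str, repo: str, pr_number: int,
                             command: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that posts '@dependabot <command>' on a PR."""
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{pr_number}/comments"
    payload = json.dumps({"body": f"@dependabot {command}"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder token for illustration only; a real token would be needed to send this.
request = build_dependabot_comment(
    "harvard-lil", "warc-gpt", 128, "rebase", token="<your-token>"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# Sending is intentionally left out of this sketch:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```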