irthomasthomas / undecidability


MTEB: Massive Text Embedding Benchmark #627

Open irthomasthomas opened 4 months ago

irthomasthomas commented 4 months ago

Title: blog/mteb.md at main · huggingface/blog

Description: "--- title: "MTEB: Massive Text Embedding Benchmark" thumbnail: /blog/assets/110_mteb/thumbnail.png authors:

MTEB: Massive Text Embedding Benchmark

MTEB is a massive benchmark for measuring the performance of text embedding models on diverse embedding tasks.

The 🥇 leaderboard provides a holistic view of the best text embedding models out there on a variety of tasks.

The 📝 paper gives background on the tasks and datasets in MTEB and analyzes leaderboard results!

The 💻 Github repo contains the code for benchmarking and submitting any model of your choice to the leaderboard.

MTEB Leaderboard

Why Text Embeddings?

Text embeddings are vector representations of text that encode semantic information. As machines require numerical inputs to perform computations, text embeddings are a crucial component of many downstream NLP applications. For example, Google uses text embeddings to power its search engine. Text embeddings can also be used to find patterns in large amounts of text via clustering, or as inputs to text classification models, such as in our recent SetFit work. The quality of text embeddings, however, is highly dependent on the embedding model used. MTEB is designed to help you find the best embedding model out there for a variety of tasks!
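As a quick illustration, the sketch below embeds a few sentences with sentence-transformers and compares them with cosine similarity (the model name and sentences are only examples, not part of MTEB):

from sentence_transformers import SentenceTransformer, util

# Any embedding model from the Hub could be used here; all-MiniLM-L6-v2 is just an example.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "How do I reset my password?",
    "I forgot my login credentials.",
    "What is the capital of France?",
]
embeddings = model.encode(sentences)  # one vector per sentence

# Semantically related sentences end up close together in embedding space.
print(util.cos_sim(embeddings[0], embeddings[1]))  # expected: relatively high
print(util.cos_sim(embeddings[0], embeddings[2]))  # expected: relatively low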

MTEB

🐋 Massive: MTEB includes 56 datasets across 8 tasks and currently summarizes >2000 results on the leaderboard.

🌎 Multilingual: MTEB contains up to 112 different languages! We have benchmarked several multilingual models on Bitext Mining, Classification, and STS.

🦚 Extensible: Be it new tasks, datasets, metrics, or leaderboard additions, any contribution is very welcome. Check out the GitHub repository to submit to the leaderboard or solve open issues. We hope you join us on the journey of finding the best text embedding model!

MTEB Taxonomy

Overview of tasks and datasets in MTEB. Multilingual datasets are marked with a purple shade.

Models

For the initial benchmarking of MTEB, we focused on models claiming state-of-the-art results and popular models on the Hub. This led to a high representation of transformers. 🤖

MTEB Speed Benchmark

Models by average English MTEB score (y) vs speed (x) vs embedding size (circle size).

We grouped models into the following three categories to simplify finding the best model for your task:

🏎 Maximum speed Models like GloVe offer high speed, but suffer from a lack of context awareness, resulting in low average MTEB scores.

⚖️ Speed and performance Slightly slower, but significantly stronger, all-mpnet-base-v2 or all-MiniLM-L6-v2 provide a good balance between speed and performance.

💪 Maximum performance Multi-billion parameter models like ST5-XXL, GTR-XXL or SGPT-5.8B-msmarco dominate on MTEB. They also tend to produce larger embeddings; SGPT-5.8B-msmarco, for example, outputs 4096-dimensional embeddings, which require more storage (see the rough estimate below).
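For a rough sense of that storage cost, here is a back-of-the-envelope estimate assuming float32 embeddings (the corpus size and dimensions are illustrative):

num_documents = 1_000_000          # illustrative corpus size
bytes_per_float = 4                # float32

for dim in (384, 768, 4096):       # e.g. all-MiniLM-L6-v2, all-mpnet-base-v2, SGPT-5.8B-msmarco
    gigabytes = num_documents * dim * bytes_per_float / 1e9
    print(f"{dim}-dim embeddings for {num_documents:,} docs ≈ {gigabytes:.1f} GB")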

Model performance varies a lot depending on the task and dataset, so we recommend checking the various tabs of the leaderboard before deciding which model to use!

Benchmark your model

Using the MTEB library, you can benchmark any model that produces embeddings and add its results to the public leaderboard. Let's run through a quick example!
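The model does not have to come from sentence-transformers: MTEB only needs an object exposing an encode method that maps a list of texts to one embedding per text, so custom models can be wrapped as well. A minimal sketch of such a wrapper (the class name and its placeholder body are illustrative, not part of the MTEB API):

import numpy as np

class MyEmbeddingModel:
    """Minimal custom-model wrapper: MTEB expects an `encode` method that takes a
    list of texts and returns one embedding per text (placeholder implementation)."""

    def encode(self, sentences, batch_size=32, **kwargs):
        # Replace this with a real model; random vectors only keep the sketch runnable.
        return np.random.rand(len(sentences), 768)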

First, install the library:

pip install mteb

Next, benchmark a model on a dataset, for example the komninos word embeddings on Banking77.

from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Load the model to benchmark; any SentenceTransformer model name from the Hub works here.
model_name = "average_word_embeddings_komninos"
model = SentenceTransformer(model_name)

# Evaluate on a single task and write the scores to a per-model output folder.
evaluation = MTEB(tasks=["Banking77Classification"])
results = evaluation.run(model, output_folder=f"results/{model_name}")

This should produce a results/average_word_embeddings_komninos/Banking77Classification.json file!
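To inspect the scores programmatically instead of opening the file by hand, you can load the JSON directly (the exact key layout can vary between MTEB versions, so this sketch simply pretty-prints everything):

import json

with open("results/average_word_embeddings_komninos/Banking77Classification.json") as f:
    result = json.load(f)

# Pretty-print the full result; the test-split accuracy and f1 scores appear among the keys.
print(json.dumps(result, indent=2))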

Now you can submit the results to the leaderboard by adding them to the metadata of the README.md of any model on the Hub.

Run our automatic script to generate the metadata:

python mteb_meta.py results/average_word_embeddings_komninos

The script will produce a mteb_metadata.md file that looks like this:

---
tags:
- mteb
model-index:
- name: average_word_embeddings_komninos
  results:
  - task:
      type: Classification
    dataset:
      type: mteb/banking77
      name: MTEB Banking77Classification
      config: default
      split: test
      revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
    metrics:
    - type: accuracy
      value: 66.76623376623377
    - type: f1
      value: 66.59096432882667
---

Now add the metadata to the top of a README.md of any model on the Hub, like this SGPT-5.8B-msmarco model, and it will show up on the leaderboard after refreshing!
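If you prefer to prepare this locally before pushing, one simple approach is to prepend the generated file to a local clone of the model repository's README.md; a minimal sketch (paths are illustrative, and it assumes the README does not already contain a YAML metadata block — if it does, merge the fields instead):

from pathlib import Path

metadata = Path("mteb_metadata.md").read_text()
readme_path = Path("your-model-repo/README.md")  # illustrative local clone of the model repo

existing = readme_path.read_text() if readme_path.exists() else ""
# Place the MTEB metadata block at the very top so the Hub picks it up for the leaderboard.
readme_path.write_text(metadata + "\n" + existing)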

Next steps

Go out there and benchmark any model you like! Let us know if you have questions or feedback by opening an issue on our GitHub repo or the leaderboard community tab 🤗

Happy embedding!

URL: blog/mteb.md at main · huggingface/blog

Suggested labels

irthomasthomas commented 4 months ago

Related issues

455: Embeddings - OpenAI API

### DetailsSimilarity score: 0.86 - [ ] [Embeddings - OpenAI API](https://platform.openai.com/docs/guides/embeddings) Embeddings ============ Learn how to turn text into numbers, unlocking use cases like search. New embedding models -------------------- Our newest and most performant embedding models, `text-embedding-3-small` and `text-embedding-3-large`, are now available. These models offer lower costs, higher multilingual performance, and new parameters to control the overall size. #### Suggested labels #### null

552: LargeWorldModel/LWM-Text-Chat-1M · Hugging Face

### DetailsSimilarity score: 0.86 - [ ] [LargeWorldModel/LWM-Text-Chat-1M · Hugging Face](https://huggingface.co/LargeWorldModel/LWM-Text-Chat-1M) # LargeWorldModel/LWM-Text-Chat-1M · Hugging Face **DESCRIPTION:** LWM-Text-1M-Chat Model Card **Model details** Model type: LWM-Text-1M-Chat is an open-source model trained from LLaMA-2 on a subset of Books3 filtered data. It is an auto-regressive language model, based on the transformer architecture. Model date: LWM-Text-1M-Chat was trained in December 2023. Paper or resources for more information: [https://largeworldmodel.github.io/](https://largeworldmodel.github.io/) **URL:** [https://huggingface.co/LargeWorldModel/LWM-Text-Chat-1M](https://huggingface.co/LargeWorldModel/LWM-Text-Chat-1M) #### Suggested labels #### {'label-name': 'Open-source Models', 'label-description': 'Models that are publicly available and open-source for usage and exploration.', 'gh-repo': 'huggingfaceco/LargeWorldModel/LWM-Text-Chat-1M', 'confidence': 56.11}

625: unsloth/README.md at main · unslothai/unsloth

### DetailsSimilarity score: 0.86 - [ ] [unsloth/README.md at main · unslothai/unsloth](https://github.com/unslothai/unsloth/blob/main/README.md?plain=1) # unsloth/README.md at main · unslothai/unsloth
unsloth logo ### Finetune Mistral, Gemma, Llama 2-5x faster with 70% less memory! ![](https://i.ibb.co/sJ7RhGG/image-41.png)
## ✨ Finetune for Free All notebooks are **beginner friendly**! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face. | Unsloth supports | Free Notebooks | Performance | Memory use | |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------| | **Gemma 7b** | [▶️ Start on Colab](https://colab.research.google.com/drive/10NbwlsRChbma1v55m8LAPYG15uQv6HLo?usp=sharing) | 2.4x faster | 58% less | | **Mistral 7b** | [▶️ Start on Colab](https://colab.research.google.com/drive/1Dyauq4kTZoLewQ1cApceUQVNcnnNTzg_?usp=sharing) | 2.2x faster | 62% less | | **Llama-2 7b** | [▶️ Start on Colab](https://colab.research.google.com/drive/1lBzz5KeZJKXjvivbYvmGarix9Ao6Wxe5?usp=sharing) | 2.2x faster | 43% less | | **TinyLlama** | [▶️ Start on Colab](https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing) | 3.9x faster | 74% less | | **CodeLlama 34b** A100 | [▶️ Start on Colab](https://colab.research.google.com/drive/1y7A0AxE3y8gdj4AVkl2aZX47Xu3P1wJT?usp=sharing) | 1.9x faster | 27% less | | **Mistral 7b** 1xT4 | [▶️ Start on Kaggle](https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook) | 5x faster\* | 62% less | | **DPO - Zephyr** | [▶️ Start on Colab](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) | 1.9x faster | 19% less | - This [conversational notebook](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing) is useful for ShareGPT ChatML / Vicuna templates. - This [text completion notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing) is for raw text. This [DPO notebook](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) replicates Zephyr. - \* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster. ## 🦥 Unsloth.ai News - 📣 [Gemma 7b](https://colab.research.google.com/drive/10NbwlsRChbma1v55m8LAPYG15uQv6HLo?usp=sharing) on 6T tokens now works. And [Gemma 2b notebook](https://colab.research.google.com/drive/15gGm7x_jTm017_Ic8e317tdIpDG53Mtu?usp=sharing) - 📣 Added [conversational notebooks](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing) and [raw text notebooks](https://colab.research.google.com/drive/1bMOKOBzxQWUIGZBs_B0zm8pimuEnZdfM?usp=sharing) - 📣 [2x faster inference](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) added for all our models - 📣 [DPO support](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) is now included. [More info](#DPO) on DPO - 📣 We did a [blog](https://huggingface.co/blog/unsloth-trl) with 🤗Hugging Face and are in their official docs! Check out the [SFT docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth) and [DPO docs](https://huggingface.co/docs/trl/main/en/dpo_trainer#accelerate-dpo-fine-tuning-using-unsloth) - 📣 [Download models 4x faster](https://huggingface.co/collections/unsloth/) from 🤗Hugging Face. 
Eg: `unsloth/mistral-7b-bnb-4bit` ## 🔗 Links and Resources | Type | Links | | ------------------------------- | --------------------------------------- | | 📚 **Wiki & FAQ** | [Read Our Wiki](https://github.com/unslothai/unsloth/wiki) | | 📜 **Documentation** | [Read The Doc](https://github.com/unslothai/unsloth/tree/main#-documentation) | | 💾 **Installation** | [unsloth/README.md](https://github.com/unslothai/unsloth/tree/main#installation-instructions)| |   **Twitter (aka X)** | [Follow us on X](https://twitter.com/unslothai)| | 🥇 **Benchmarking** | [Performance Tables](https://github.com/unslothai/unsloth/tree/main#-performance-benchmarking) | 🌐 **Released Models** | [Unsloth Releases](https://huggingface.co/unsloth)| | ✍️ **Blog** | [Read our Blogs](https://unsloth.ai/blog)| ## ⭐ Key Features - All kernels written in [OpenAI's Triton](https://openai.com/research/triton) language. **Manual backprop engine**. - **0% loss in accuracy** - no approximation methods - all exact. - No change of hardware. Supports NVIDIA GPUs since 2018+. Minimum CUDA Capability 7.0 (V100, T4, Titan V, RTX 20, 30, 40x, A100, H100, L40 etc) [Check your GPU!](https://developer.nvidia.com/cuda-gpus) GTX 1070, 1080 works, but is slow. - Works on **Linux** and **Windows** via WSL. - Supports 4bit and 16bit QLoRA / LoRA finetuning via [bitsandbytes](https://github.com/TimDettmers/bitsandbytes). - Open source trains 5x faster - see [Unsloth Pro](https://unsloth.ai/) for **30x faster training**! - If you trained a model with 🦥Unsloth, you can use this cool sticker!   ## 🥇 Performance Benchmarking - For the full list of **reproducable** benchmarking tables, [go to our website](https://unsloth.ai/blog/mistral-benchmark#Benchmark%20tables) | 1 A100 40GB | 🤗Hugging Face | Flash Attention | 🦥Unsloth Open Source | 🦥[Unsloth Pro](https://unsloth.ai/pricing) | |--------------|--------------|-----------------|---------------------|-----------------| | Alpaca | 1x | 1.04x | 1.98x | **15.64x** | | LAION Chip2 | 1x | 0.92x | 1.61x | **20.73x** | | OASST | 1x | 1.19x | 2.17x | **14.83x** | | Slim Orca | 1x | 1.18x | 2.22x | **14.82x** | - Benchmarking table below was conducted by [🤗Hugging Face](https://huggingface.co/blog/unsloth-trl). | Free Colab T4 | Dataset | 🤗Hugging Face | Pytorch 2.1.1 | 🦥Unsloth | 🦥 VRAM reduction | | --- | --- | --- | --- | --- | --- | | Llama-2 7b | OASST | 1x | 1.19x | 1.95x | -43.3% | | Mistral 7b | Alpaca | 1x | 1.07x | 1.56x | -13.7% | | Tiny Llama 1.1b | Alpaca | 1x | 2.06x | 3.87x | -73.8% | | DPO with Zephyr | Ultra Chat | 1x | 1.09x | 1.55x | -18.6% | ![](https://i.ibb.co/sJ7RhGG/image-41.png) [View on GitHub](https://github.com/unslothai/unsloth/blob/main/README.md?plain=1) #### Suggested labels ####

386: SciPhi/AgentSearch-V1 · Datasets at Hugging Face

### DetailsSimilarity score: 0.85 - [ ] [SciPhi/AgentSearch-V1 · Datasets at Hugging Face](https://huggingface.co/datasets/SciPhi/AgentSearch-V1) #### Getting Started The AgentSearch-V1 dataset is a comprehensive collection of over one billion embeddings, produced using jina-v2-base. It includes more than 50 million high-quality documents and over 1 billion passages, covering a vast range of content from sources such as Arxiv, Wikipedia, Project Gutenberg, and includes carefully filtered Creative Commons (CC) data. Our team is dedicated to continuously expanding and enhancing this corpus to improve the search experience. We welcome your thoughts and suggestions – please feel free to reach out with your ideas! To access and utilize the AgentSearch-V1 dataset, you can stream it via HuggingFace with the following Python code: ```python from datasets import load_dataset import json import numpy as np # To stream the entire dataset: ds = load_dataset("SciPhi/AgentSearch-V1", data_files="**/*", split="train", streaming=True) # Optional, stream just the "arxiv" dataset # ds = load_dataset("SciPhi/AgentSearch-V1", data_files="**/*", split="train", data_files="arxiv/*", streaming=True) # To process the entries: for entry in ds: embeddings = np.frombuffer( entry['embeddings'], dtype=np.float32 ).reshape(-1, 768) text_chunks = json.loads(entry['text_chunks']) metadata = json.loads(entry['metadata']) print(f'Embeddings:\n{embeddings}\n\nChunks:\n{text_chunks}\n\nMetadata:\n{metadata}') break ``` A full set of scripts to recreate the dataset from scratch can be found [here](https://github.com/SciPhi-AI/agent-search/tree/main/scripts). Further, you may check the docs for details on how to perform RAG over AgentSearch. #### Languages English. #### Dataset Structure The raw dataset structure is as follows: ```json { "url": ..., "title": ..., "metadata": {"url": "...", "timestamp": "...", "source": "...", "language": "..."}, "text_chunks": ..., "embeddings": ..., "dataset": "book" | "arxiv" | "wikipedia" | "stack-exchange" | "open-math" | "RedPajama-Data-V2" } ``` #### Dataset Creation This dataset was created as a step towards making humanities most important knowledge openly searchable and LLM optimal. It was created by filtering, cleaning, and augmenting locally publicly available datasets. To cite our work, please use the following: @software{SciPhi2023AgentSearch, author = {SciPhi}, title = {AgentSearch [ΨΦ]: A Comprehensive Agent-First Framework and Dataset for Webscale Search}, year = {2023}, url = {https://github.com/SciPhi-AI/agent-search} } #### Source Data @ONLINE{wikidump, author = "Wikimedia Foundation", title = "Wikimedia Downloads", url = "https://dumps.wikimedia.org" } @misc{paster2023openwebmath, title={OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text}, author={Keiran Paster and Marco Dos Santos and Zhangir Azerbayev and Jimmy Ba}, year={2023}, eprint={2310.06786}, archivePrefix={arXiv}, primaryClass={cs.AI} } @software{together2023redpajama, author = {Together Computer}, title = {RedPajama: An Open Source Recipe to Reproduce LLaMA training dataset}, month = April, year = 2023, url = {https://github.com/togethercomputer/RedPajama-Data} } #### License Please refer to the licenses of the data subsets you use. 
- Open-Web ([Common Crawl Foundation Terms of Use](https://commoncrawl.org/terms-of-use/)) - Books: the\_pile\_books3 license and pg19 license - ArXiv Terms of Use - Wikipedia License - StackExchange license on the Internet Archive #### Suggested labels #### { "key": "knowledge-dataset", "value": "A dataset with one billion embeddings from various sources, such as Arxiv, Wikipedia, Project Gutenberg, and carefully filtered Creative Commons data" }

626: classifiers/README.md at main · blockentropy/classifiers

### DetailsSimilarity score: 0.85 - [ ] [classifiers/README.md at main · blockentropy/classifiers](https://github.com/blockentropy/classifiers/blob/main/README.md?plain=1) # classifiers/README.md ## Fast Classifiers for Prompt Routing Routing and controlling the information flow is a core component in optimizing machine learning tasks. While some architectures focus on internal routing of data within a model, we focus on the external routing of data between models. This enables the combination of open source, proprietary, API based, and software based approaches to work together behind a smart router. We investigate three different ways of externally routing the prompt - cosine similarity via embeddings, zero-shot classification, and small classifiers. ## Implementation of Fast Classifiers The `code-class.ipynb` Jupyter notebook walks through the process of creating a fast prompt classifier for smart routing. For the fast classifiers, we utilize the model [DistilBERT](https://huggingface.co/docs/transformers/en/model_doc/distilbert), a smaller language representation model designed for efficient on-the-edge operation and training under computational constraints. DistilBERT is not only less costly to pre-train but also well-suited for on-device computations, as demonstrated through experiments and comparative studies. We quantize the model using [Optimum](https://huggingface.co/docs/optimum/index), enabling the model to run extremely fast on a CPU router. Each classifier takes 5-8ms to run. An ensemble of 8 prompt classifiers takes about 50ms in total. Thus, each endpoint can route about 20 requests per second. In the example `code-class`, we are deciding between prompts of code and not code prompts. The two datasets used are the 52K [instruction-following data](https://arxiv.org/abs/2304.03277) generated by GPT-4 with prompts in Alpaca. And the 20K instruction-following data used for fine-tuning the [Code Alpaca](https://github.com/sahil280114/codealpaca) model. Train test split of 80/20 yields an accuracy of 95.49% and f1 score of 0.9227. ![Train Test](./traintest.png) ## Comparison vs other Routing methods The most popular alternative to routing is via embedding similarity. For example, if one were to try to route a programming question, one might set up the set of target classes as ["coding", "not coding"]. Each one of these strings is then transformed into an embedding and compared against a prompt query like, "write a bubble sort in python". Given the computed pair-wise cosine similarity between the query and class, we can then label the prompt as a coding question and route the prompt to a coding-specific model. These do not scale well with larger numbers of embeddings. Nor are they able to capture non-semantic type classes (like is the response likely to be more or less than 200 tokens). However, they are adaptable and comparably fast and thus provide a good alternative to the trained fast classifiers. ![Train Test](./graphs.png) Quantifying different methods of routing in terms of execution time. As the prompt size increases, the query time also increases as shown in (a). There is also a close to linear increase in the time as the number of classes increase as shown in (b). However, the small classifiers do not increase in time as the class examples increase in the number of tokens (c). This is due to the upfront cost of training the binary classifier, reducing cost at inference. ## Reproducibility The `timing_tests.js` and `complexity.js` files can be used for reproducibility. 
Note that only the code classifier is currently available in this repo. One will need to install the appropriate models from the [Transformers.js](https://huggingface.co/docs/transformers.js/en/index) repo. [View on GitHub](https://github.com/blockentropy/classifiers/blob/main/README.md?plain=1) #### Suggested labels #### {'label-name': 'Prompt-Routing', 'label-description': 'Focuses on external routing of data between models to optimize machine learning tasks.', 'confidence': 50.24}

494: Awesome-Efficient-LLM: A curated list for Efficient Large Language Models

### DetailsSimilarity score: 0.84 - [ ] [horseee/Awesome-Efficient-LLM: A curated list for Efficient Large Language Models](https://github.com/horseee/Awesome-Efficient-LLM#inference-acceleration) # Awesome-Efficient-LLM A curated list for [Efficient Large Language Models](https://github.com/horseee/Awesome-Efficient-LLM): - [Knowledge Distillation](#knowledge-distillation) - [Network Pruning](#network-pruning) - [Quantization](#quantization) - [Inference Acceleration](#inference-acceleration) - [Efficient MOE](#efficient-moe) - [Text Compression](#text-compression) - [Low-Rank Decomposition](#low-rank-decomposition) - [Hardware/System Tuning](#hardwareSystem-tuning) - [Survey](#survey) - [Leaderboard](#leaderboard) - [🚀 Updates](#updates) - [Contributing](#contributing) --- ## Inference Acceleration - … - [Add your paper here](https://github.com/horseee/Awesome-Efficient-LLM/blob/main/generate_item.py), [generate the required format](https://github.com/horseee/Awesome-Efficient-LLM#decontributing), and submit a pull request. --- ## Updates - **Sep 27, 2023:** Add tag for papers accepted at NeurIPS'23. - **Sep 6, 2023:** Add a new subdirectory `project/` to organize those projects designed for developing a lightweight LLM. - **July 11, 2023:** Create a new subdirectory `efficient_plm/` for papers applicable to PLMs (such as BERT, BART) but have yet to be verified for their effectiveness on LLMs. --- ## Contributing If you'd like to include your paper or need to update any details, please feel free to submit a pull request. You can generate the required markdown format for each paper by filling in the information in `generate_item.py` and execute `python generate_item.py`. We warmly appreciate your contributions to this list. Alternatively, you can email me with the links to your paper and code, and I would add your paper to the list at my earliest convenience. - URL: [https://github.com/horseee/Awesome-Efficient-LLM#inference-acceleration](https://github.com/horseee/Awesome-Efficient-LLM#inference-acceleration) #### Suggested labels #### { "label-name": "efficient-llm-acceleration", "description": "Inference acceleration techniques for efficient large language models.", "repo": "horseee/Awesome-Efficient-LLM", "confidence": 70.8 }