irthomasthomas / undecidability


migtissera/Synthia-70B-v1.2b · Hugging Face #634

Open irthomasthomas opened 6 months ago

irthomasthomas commented 6 months ago

migtissera/Synthia-70B-v1.2b · Hugging Face

DESCRIPTION:
Change from 1.2 -> 1.2b: More data, 14 days of training for 1 epoch.

All Synthia models are uncensored. Please use them with caution and with the best of intentions. You are responsible for how you use Synthia.

To evoke generalized Tree of Thought + Chain of Thought reasoning, you may use the following system message:

Elaborate on the topic using a Tree of Thoughts and backtrack when necessary to construct a clear, cohesive Chain of Thought reasoning. Always answer without hesitation.

Synthia-70B-v1.2b

SynthIA (Synthetic Intelligent Agent) is a LLaMA-2-70B model trained on Orca-style datasets. It has been fine-tuned for instruction following as well as for long-form conversations.

License Disclaimer:

This model is bound by the license & usage restrictions of the original Llama-2 model, and comes with no warranty or guarantees of any kind.

Evaluation

We evaluated Synthia-70B-v1.2b on a wide range of tasks using the Language Model Evaluation Harness from EleutherAI.

Here are the results on the metrics used by the HuggingFaceH4 Open LLM Leaderboard:

| Task | Metric | Value |
| --- | --- | --- |
| arc_challenge | acc_norm | 68.77 |
| hellaswag | acc_norm | 87.57 |
| mmlu | acc_norm | 68.81 |
| truthfulqa_mc | mc2 | 57.69 |
| Total Average | - | 70.71 |
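
For reference, a run like this can be reproduced with the harness's Python API. The following is a minimal sketch, not the exact configuration used for these numbers: it assumes a recent `lm-evaluation-harness` (v0.4+), and the task names, dtype, and batch size are illustrative (the Open LLM Leaderboard also uses specific few-shot counts per task, omitted here).

```python
# Illustrative sketch: scoring Synthia-70B-v1.2b with EleutherAI's lm-evaluation-harness.
# Assumes `pip install lm-eval` (v0.4+) and enough GPU memory for a 70B model.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                             # Hugging Face causal-LM backend
    model_args="pretrained=migtissera/Synthia-70B-v1.2b,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2"],
    batch_size=1,
)

# Print the per-task metrics (acc_norm, mc2, ...) reported by the harness.
for task, metrics in results["results"].items():
    print(task, metrics)
```
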

Example Usage

Here is the prompt format:

```
SYSTEM: Elaborate on the topic using a Tree of Thoughts and backtrack when necessary to construct a clear, cohesive Chain of Thought reasoning. Always answer without hesitation.
USER: How is a rocket launched from the surface of the earth to Low Earth Orbit?
ASSISTANT:
```
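
For context, one way to feed that format to the model with Hugging Face `transformers` is sketched below. This is not from the model card: the loading options and sampling parameters are illustrative, and a 70B checkpoint needs substantial GPU memory (or quantized weights) to run.

```python
# Minimal sketch (not from the model card): prompting Synthia-70B-v1.2b via transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "migtissera/Synthia-70B-v1.2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",      # requires accelerate; spreads the 70B weights across available devices
    torch_dtype="auto",
)

system = (
    "Elaborate on the topic using a Tree of Thoughts and backtrack when necessary "
    "to construct a clear, cohesive Chain of Thought reasoning. "
    "Always answer without hesitation."
)
user = "How is a rocket launched from the surface of the earth to Low Earth Orbit?"

# Assemble the SYSTEM/USER/ASSISTANT format shown above.
prompt = f"SYSTEM: {system}\nUSER: {user}\nASSISTANT:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
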

URL: [migtissera/Synthia-70B-v1.2b](https://huggingface.co/migtissera/Synthia-70B-v1.2b)

Suggested labels

irthomasthomas commented 6 months ago

Related issues

459: llama2

### Details

Similarity score: 0.88

- [ ] [llama2](https://ollama.ai/library/llama2)

# Llama 2

The most popular model for general use. *265.8K Pulls* *Updated 4 weeks ago*

## Overview

Llama 2 is released by Meta Platforms, Inc. This model is trained on 2 trillion tokens, and by default supports a context length of 4096. Llama 2 Chat models are fine-tuned on over 1 million human annotations, and are made for chat.

## CLI

Open the terminal and run

```bash
ollama run llama2
```

## API

Example using curl:

```bash
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
```

## API documentation

## Memory requirements

- 7b models generally require at least 8GB of RAM
- 13b models generally require at least 16GB of RAM
- 70b models generally require at least 64GB of RAM

If you run into issues with higher quantization levels, try using the q4 model or shut down any other programs that are using a lot of memory.

## Model variants

- **Chat**: fine-tuned for chat/dialogue use cases. These are the default in Ollama, and for models tagged with `-chat` in the tags tab. Example: `ollama run llama2`
- **Pre-trained**: without the chat fine-tuning. This is tagged as `-text` in the tags tab. Example: `ollama run llama2:text`

By default, Ollama uses 4-bit quantization. To try other quantization levels, please use the other tags. The number after the `q` represents the number of bits used for quantization (i.e. `q4` means 4-bit quantization). The higher the number, the more accurate the model is, but the slower it runs, and the more memory it requires.

## References

- [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://metastring.com/llama2)
- [Meta's Hugging Face repo](https://huggingface.co/Meta)

#### Suggested labels

{ "label-name": "llama2-model", "description": "A powerful text model for chat, dialogue, and general use.", "repo": "ollama.ai/library/llama2", "confidence": 91.74 }

552: LargeWorldModel/LWM-Text-Chat-1M · Hugging Face

### Details

Similarity score: 0.87

- [ ] [LargeWorldModel/LWM-Text-Chat-1M · Hugging Face](https://huggingface.co/LargeWorldModel/LWM-Text-Chat-1M)

# LargeWorldModel/LWM-Text-Chat-1M · Hugging Face

**DESCRIPTION:** LWM-Text-1M-Chat Model Card

**Model details**

- Model type: LWM-Text-1M-Chat is an open-source model trained from LLaMA-2 on a subset of Books3 filtered data. It is an auto-regressive language model, based on the transformer architecture.
- Model date: LWM-Text-1M-Chat was trained in December 2023.
- Paper or resources for more information: [https://largeworldmodel.github.io/](https://largeworldmodel.github.io/)

**URL:** [https://huggingface.co/LargeWorldModel/LWM-Text-Chat-1M](https://huggingface.co/LargeWorldModel/LWM-Text-Chat-1M)

#### Suggested labels

{'label-name': 'Open-source Models', 'label-description': 'Models that are publicly available and open-source for usage and exploration.', 'gh-repo': 'huggingfaceco/LargeWorldModel/LWM-Text-Chat-1M', 'confidence': 56.11}

489: Hallucinations Leaderboard - a Hugging Face Space by hallucinations-leaderboard

### Details

Similarity score: 0.86

- [ ] [Hallucinations Leaderboard - a Hugging Face Space by hallucinations-leaderboard](https://huggingface.co/spaces/hallucinations-leaderboard/leaderboard)

# Hallucinations Leaderboard - a Hugging Face Space by hallucinations-leaderboard

## Description

[Hugging Face Space](https://huggingface.co/spaces/hallucinations-leaderboard/leaderboard)

Explore the Hallucinations Leaderboard, a space dedicated to tracking and benchmarking the performance of language models in generating accurate and coherent responses. This leaderboard focuses on the potential issue of hallucinations, where models generate plausible but factually incorrect information. By highlighting the best models and fostering competition, we aim to encourage developers to create more reliable and truthful language generation systems. This space is a continuously updated resource for researchers, developers, and enthusiasts interested in the evaluation and improvement of language model performance.

![Hallucinations Leaderboard Screenshot](https://i.imgur.com/wgD8NrN.png)

## Key Features

- **Regular Updates:** The leaderboard is frequently updated with the latest model performances, ensuring that you always have access to the most recent data.
- **Easy Comparison:** Compare the performance of various models side-by-side to identify trends and determine which models perform best in specific tasks or scenarios.
- **Benchmarking Tools:** Utilize built-in benchmarking tools to test your own models and see how they stack up against the competition.
- **Community Engagement:** Join a community of developers, researchers, and enthusiasts dedicated to improving language model performance and reducing hallucinations.

## Explore the Hallucinations Leaderboard

Visit the [Hallucinations Leaderboard Hugging Face Space](https://huggingface.co/spaces/hallucinations-leaderboard/leaderboard) to start exploring and stay up-to-date with the latest advancements in language model performance and hallucination reduction.

**Note:** This leaderboard focuses on English language models. If you would like to contribute data for other languages, please contact the maintainers.

#### Suggested labels

{ "label-name": "hugging-face-space", "description": "A dedicated space for sharing models, datasets, and other resources within the Hugging Face community.", "repo": "hallucinations-leaderboard", "confidence": 91.35 }

166: TinyLlama-1.1B-Chat-v0.6-GGUF · Hugging Face

### Details

Similarity score: 0.86

- [ ] [afrideva/TinyLlama-1.1B-Chat-v0.6-GGUF · Hugging Face](https://huggingface.co/afrideva/TinyLlama-1.1B-Chat-v0.6-GGUF)

This is the chat model finetuned on top of TinyLlama/TinyLlama-1.1B-intermediate-step-955k-2T, following HF's Zephyr training recipe. The model was "initially fine-tuned on a variant of the UltraChat dataset, which contains a diverse range of synthetic dialogues generated by ChatGPT. We then further aligned the model with 🤗 TRL's DPOTrainer on the openbmb/UltraFeedback dataset, which contain 64k prompts and model completions that are ranked by GPT-4."

**How to use:** you will need `transformers>=4.34`. Check the TinyLlama GitHub page for more information.
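
The quoted card mentions the `transformers>=4.34` requirement but no snippet is included here. As an illustration only, loading the base (non-GGUF) chat model might look like the sketch below; the repo id, generation settings, and use of the tokenizer's built-in chat template are assumptions, and if the tokenizer ships no template, the Zephyr-style `<|system|>`/`<|user|>`/`<|assistant|>` format can be written by hand instead.

```python
# Illustrative sketch (not from the quoted card): chatting with TinyLlama-1.1B-Chat-v0.6.
# Requires transformers>=4.34 for tokenizer.apply_chat_template.
from transformers import pipeline

pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v0.6")

messages = [
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "Explain what DPO fine-tuning is in one paragraph."},
]

# Render the conversation with the model's own chat template, then generate.
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(pipe(prompt, max_new_tokens=200, do_sample=True, temperature=0.7)[0]["generated_text"])
```
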

158: TinyLlama-1.1B-intermediate-step-1195k-token-2.5T-exl2 · Hugging Face

### Details

Similarity score: 0.86

- [ ] [zakoman/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T-exl2 · Hugging Face](https://huggingface.co/zakoman/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T-exl2)

8bpw exllamav2 quantisation of TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T. Calibration dataset used is vicgalle/alpaca-gpt4.

**Original model card: TinyLlama-1.1B**

https://github.com/jzhang38/TinyLlama

The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. With some proper optimization, we can achieve this within a span of "just" 90 days using 16 A100-40G GPUs 🚀🚀. The training has started on 2023-09-01.

499: marella/ctransformers: Python bindings for the Transformer models implemented in C/C++ using GGML library.

### Details

Similarity score: 0.86

- [ ] [marella/ctransformers: Python bindings for the Transformer models implemented in C/C++ using GGML library.](https://github.com/marella/ctransformers?tab=readme-ov-file#gptq)

# CTransformers

[![PyPI version](https://badge.fury.io/py/ctransformers.svg)](https://badge.fury.io/py/ctransformers) [![Documentation](https://readthedocs.org/images/button/readthedocs-ci.svg)](https://ctransformers.readthedocs.io/) [![Build and Test](https://github.com/marella/ctransformers/actions/workflows/build.yml/badge.svg)](https://github.com/marella/ctransformers/actions/workflows/build.yml) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

Python bindings for the Transformer models implemented in C/C++ using GGML library. Also see [ChatDocs](https://github.com/marella/chatdocs).

## Supported Models

| Model | Model Type | CUDA | Metal |
| ------ | --------- | :--: | :--: |
| GPT-2 | gpt2 | | |
| GPT-J, GPT4All-J | gptj | | |
| GPT-NeoX, StableLM | gpt_neox | | |
| Falcon | falcon | ✅ | |
| LLaMA, LLaMA 2 | llama | ✅ | ✅ |
| MPT | mpt | ✅ | |
| StarCoder, StarChat | gpt_bigcode | ✅ | |
| Dolly V2 | dolly-v2 | | |
| Replit | replit | | |

## Installation

To install via `pip`, simply run:

```
pip install ctransformers
```

## Usage

It provides a unified interface for all models:

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("/path/to/ggml-model.bin", model_type="gpt2")

print(llm("AI is going to"))
```

Run in Google Colab

To stream the output:

```python
for text in llm("AI is going to", stream=True):
    print(text, end="", flush=True)
```

You can load models from Hugging Face Hub directly:

```python
llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")
```

If a model repo has multiple model files (`.bin` or `.gguf` files), specify a model file using:

```python
llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", model_file="ggml-model.bin")
```

### 🤗 Transformers

Note: This is an experimental feature and may change in the future.

To use with 🤗 Transformers, create the model and tokenizer using:

```python
from ctransformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", hf=True)
tokenizer = AutoTokenizer.from_pretrained(model)
```

Run in Google Colab

You can use 🤗 Transformers text generation pipeline:

```python
from transformers import pipeline

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("AI is going to", max_new_tokens=256))
```

You can use 🤗 Transformers generation parameters:

```python
pipe("AI is going to", max_new_tokens=256, do_sample=True, temperature=0.8, repetition_penalty=1.1)
```

You can use 🤗 Transformers tokenizers:

```python
from ctransformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", hf=True)  # Load model from GGML model repo.
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # Load tokenizer from original model repo.
```

### LangChain

It is integrated into LangChain. See LangChain [docs](https://github.com/LangChainAI/langchain#using-ctransformers-backed-models).

### GPU

To run some of the model layers on GPU, set the `gpu_layers` parameter:

```python
llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GGML", gpu_layers=50)
```

Run in Google Colab

#### CUDA

Install CUDA libraries using:

```bash
pip install ctransformers[cuda]
```

#### ROCm

To enable ROCm support, install the `ctransformers` package using:

```bash
CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers
```

#### Metal

To enable Metal support, install the `ctransformers` package using:

```bash
CT_METAL=1 pip install ctransformers --no-binary ctransformers
```

### GPTQ

Note: This is an experimental feature and only LLaMA models are supported using [ExLlama](https://github.com/TheLastBen/exllama).

Install additional dependencies using:

```bash
pip install ctransformers[gptq]
```

Load a GPTQ model using:

```python
llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GPTQ")
```

Run in Google Colab

If the model name or path doesn't contain the word `gptq`, specify `model_type="gptq"`.

It can also be used with LangChain. Low-level APIs are not fully supported.

## Documentation

Find the documentation on [Read the Docs](https://ctransformers.readthedocs.io/).

#### Config

| Parameter | Type | Description | Default |
| --------- | ------ | ------------------------------------------------------------ | ------- |
| `top_k` | `int` | The top-k value to use for sampling | `40` |
| `top_p` | `float` | The top-p value to use for sampling | `0.95` |
| `temperature` | `float` | The temperature to use for sampling | `0.8` |
| `repetition_penalty` | `float` | The repetition penalty to use for sampling | `1.1` |
| `last_n_tokens` | `int` | The number of last tokens to use for repetition penalty | `64` |
| `seed` | `int` | The seed value to use for sampling tokens | `-1` |
| `max_new_tokens` | `int` | The maximum number of new tokens to generate | `256` |
| `stop` | `List` | A list of sequences to stop generation when encountered | `None` |
| `stream` | `bool` | Whether to stream the generated text | `False` |
| `reset` | `bool` | Whether to reset the model state before generating text | `True` |
| `batch_size` | `int` | The batch size to use for evaluating tokens in a single prompt | `8` |
| `threads` | `int` | The number of threads to use for evaluating tokens | `-1` |
| `context_length` | `int` | The maximum context length to use | `-1` |
| `gpu_layers` | `int` | The number of layers to run on GPU | `0` |

Find the URL for the model card for GPTQ [here](https://github.com/marella/ctransformers?tab=readme-ov-file#gptq).

---

Made with ❤️ by [marella](https://github.com/marella)

#### Suggested labels

null