irthomasthomas / undecidability


TabbyML: Self-hosted AI coding assistant. #642

Open irthomasthomas opened 4 months ago

irthomasthomas commented 4 months ago

tabby/ at main · TabbyML/tabby

# 🐾 Tabby

Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot.



🔥 What's New

Archived

* **10/15/2023** RAG-based code completion is enabled by default in v0.3.0 🎉! Check out the blogpost explaining how Tabby utilizes repo-level context to get even smarter!
* **11/27/2023** v0.6.0 released!
* **11/09/2023** v0.5.5 released! With a redesign of UI + performance improvement.
* **10/04/2023** Check out the model directory for the latest models supported by Tabby.
* **09/18/2023** Apple's M1/M2 Metal inference support has landed in v0.1.1!
* **08/31/2023** Tabby's first stable release v0.0.1 🥳.
* **08/28/2023** Experimental support for the CodeLlama 7B.
* **08/24/2023** Tabby is now on JetBrains Marketplace!

👋 Getting Started

You can find our documentation here.

Run Tabby in 1 Minute

The easiest way to start a Tabby server is by using the following Docker command:

```bash
docker run -it \
  --gpus all -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby \
  serve --model TabbyML/StarCoder-1B --device cuda
```

For additional options (e.g. inference type, parallelism), please refer to the documentation page.
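Once the container is up and port 8080 is published, a script or editor plugin talks to the server over HTTP. The sketch below only builds such a request without sending it. The `/v1/completions` path and the `language` / `segments` payload fields are assumptions that this README does not confirm, and `build_completion_request` is a hypothetical helper name, so check both against the API documentation of the Tabby version you run.

```python
import json
import urllib.request

# Assumption: the server started by the Docker command above listens here.
TABBY_URL = "http://localhost:8080"


def build_completion_request(prefix: str, suffix: str = "",
                             language: str = "python") -> urllib.request.Request:
    """Build (but do not send) a code completion request.

    The endpoint path and JSON field names are assumptions, not taken
    from the README; verify them before relying on this.
    """
    payload = {
        "language": language,
        "segments": {"prefix": prefix, "suffix": suffix},
    }
    return urllib.request.Request(
        url=f"{TABBY_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_completion_request("def fib(n):\n    ")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out here because it needs a running server.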

🤝 Contributing

Full guide at;

Get the Code

```bash
git clone --recurse-submodules
cd tabby
```

If you have already cloned the repository, you can run `git submodule update --recursive --init` to fetch all submodules.


  1. Set up the Rust environment by following this tutorial.

  2. Install the required dependencies:

     ```bash
     # For macOS
     brew install protobuf

     # For Ubuntu / Debian
     apt-get install protobuf-compiler libopenblas-dev
     ```

  3. Now, you can build Tabby by running the command `cargo build`.

Start Hacking!

... and don't forget to submit a Pull Request

🌍 Community


URL: tabby/

Suggested labels

irthomasthomas commented 4 months ago

Related issues

625: unsloth/ at main · unslothai/unsloth

Details

Similarity score: 0.87

- [ ] unsloth/ at main · unslothai/unsloth

# unsloth/ at main · unslothai/unsloth

### Finetune Mistral, Gemma, Llama 2-5x faster with 70% less memory!

## ✨ Finetune for Free

All notebooks are **beginner friendly**! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.

| Unsloth supports | Free Notebooks | Performance | Memory use |
|------------------|----------------|-------------|------------|
| **Gemma 7b** | ▶️ Start on Colab | 2.4x faster | 58% less |
| **Mistral 7b** | ▶️ Start on Colab | 2.2x faster | 62% less |
| **Llama-2 7b** | ▶️ Start on Colab | 2.2x faster | 43% less |
| **TinyLlama** | ▶️ Start on Colab | 3.9x faster | 74% less |
| **CodeLlama 34b** A100 | ▶️ Start on Colab | 1.9x faster | 27% less |
| **Mistral 7b** 1xT4 | ▶️ Start on Kaggle | 5x faster\* | 62% less |
| **DPO - Zephyr** | ▶️ Start on Colab | 1.9x faster | 19% less |

- This conversational notebook is useful for ShareGPT ChatML / Vicuna templates.
- This text completion notebook is for raw text. This DPO notebook replicates Zephyr.
- \* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster.

## 🦥 News

- 📣 Gemma 7b on 6T tokens now works. And Gemma 2b notebook
- 📣 Added conversational notebooks and raw text notebooks
- 📣 2x faster inference added for all our models
- 📣 DPO support is now included. [More info](#DPO) on DPO
- 📣 We did a blog with 🤗Hugging Face and are in their official docs! Check out the SFT docs and DPO docs
- 📣 Download models 4x faster from 🤗Hugging Face. Eg: `unsloth/mistral-7b-bnb-4bit`

## 🔗 Links and Resources

| Type | Links |
| ---- | ----- |
| 📚 **Wiki & FAQ** | Read Our Wiki |
| 📜 **Documentation** | Read The Doc |
| 💾 **Installation** | unsloth/ |
| **Twitter (aka X)** | Follow us on X |
| 🥇 **Benchmarking** | Performance Tables |
| 🌐 **Released Models** | Unsloth Releases |
| ✍️ **Blog** | Read our Blogs |

## ⭐ Key Features

- All kernels written in OpenAI's Triton language. **Manual backprop engine**.
- **0% loss in accuracy** - no approximation methods - all exact.
- No change of hardware. Supports NVIDIA GPUs since 2018+. Minimum CUDA Capability 7.0 (V100, T4, Titan V, RTX 20, 30, 40x, A100, H100, L40 etc). Check your GPU! GTX 1070, 1080 works, but is slow.
- Works on **Linux** and **Windows** via WSL.
- Supports 4bit and 16bit QLoRA / LoRA finetuning via bitsandbytes.
- Open source trains 5x faster - see Unsloth Pro for **30x faster training**!
- If you trained a model with 🦥Unsloth, you can use this cool sticker!

## 🥇 Performance Benchmarking

- For the full list of **reproducible** benchmarking tables, go to our website.

| 1 A100 40GB | 🤗Hugging Face | Flash Attention | 🦥Unsloth Open Source | 🦥Unsloth Pro |
|-------------|----------------|-----------------|----------------------|---------------|
| Alpaca | 1x | 1.04x | 1.98x | **15.64x** |
| LAION Chip2 | 1x | 0.92x | 1.61x | **20.73x** |
| OASST | 1x | 1.19x | 2.17x | **14.83x** |
| Slim Orca | 1x | 1.18x | 2.22x | **14.82x** |

- Benchmarking table below was conducted by 🤗Hugging Face.

| Free Colab T4 | Dataset | 🤗Hugging Face | Pytorch 2.1.1 | 🦥Unsloth | 🦥 VRAM reduction |
| --- | --- | --- | --- | --- | --- |
| Llama-2 7b | OASST | 1x | 1.19x | 1.95x | -43.3% |
| Mistral 7b | Alpaca | 1x | 1.07x | 1.56x | -13.7% |
| Tiny Llama 1.1b | Alpaca | 1x | 2.06x | 3.87x | -73.8% |
| DPO with Zephyr | Ultra Chat | 1x | 1.09x | 1.55x | -18.6% |

#### Suggested labels

640: · defog/sqlcoder-7b-2 at main

Details

Similarity score: 0.85

- [ ] · defog/sqlcoder-7b-2 at main

# · defog/sqlcoder-7b-2 at main

**DESCRIPTION:**

```yaml
license: cc-by-sa-4.0
library_name: transformers
pipeline_tag: text-generation
```

## Update notice

The model weights were updated at 7 AM UTC on Feb 7, 2024. The new model weights lead to a much more performant model, particularly for joins. If you downloaded the model before that, please redownload the weights for best performance.

## Model Card for SQLCoder-7B-2

A capable large language model for natural language to SQL generation.

### Model Details

#### Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- **Developed by:** Defog, Inc
- **Model type:** [Text to SQL]
- **License:** [CC-by-SA-4.0]
- **Finetuned from model:** [CodeLlama-7B]

#### Model Sources [optional]

- **HuggingFace:**
- **GitHub:**
- **Demo:**

## Uses

This model is intended to be used by non-technical users to understand data inside their SQL databases. It is meant as an analytics tool, and not as a database admin tool.

This model has not been trained to reject malicious requests from users with write access to databases, and should only be used by users with read-only access.

## How to Get Started with the Model

Use the code here to get started with the model.

## Prompt

Please use the following prompt for optimal results. Please remember to use `do_sample=False` and `num_beams=4` for optimal results.

```
### Task
Generate a SQL query to answer [QUESTION]{user_question}[/QUESTION]

### Database Schema
The query will run on a database with the following schema:
{table_metadata_string_DDL_statements}

### Answer
Given the database schema, here is the SQL query that [QUESTION]{user_question}[/QUESTION]
[SQL]
```

## Evaluation

This model was evaluated on SQL-Eval, a PostgreSQL based evaluation framework developed by Defog for testing and alignment of model capabilities.

You can read more about the methodology behind SQLEval here.

### Results

We classified each generated question into one of 6 categories. The table displays the percentage of questions answered correctly by each model, broken down by category.

| | date | group_by | order_by | ratio | join | where |
| -------------- | ---- | -------- | -------- | ----- | ---- | ----- |
| sqlcoder-70b | 96 | 91.4 | 97.1 | 85.7 | 97.1 | 91.4 |
| sqlcoder-7b-2 | 96 | 91.4 | 94.3 | 91.4 | 94.3 | 77.1 |
| sqlcoder-34b | 80 | 94.3 | 85.7 | 77.1 | 85.7 | 80 |
| gpt-4 | 72 | 94.3 | 97.1 | 80 | 91.4 | 80 |
| gpt-4-turbo | 76 | 91.4 | 91.4 | 62.8 | 88.6 | 77.1 |
| natural-sql-7b | 56 | 88.6 | 85.7 | 60 | 88.6 | 80 |
| sqlcoder-7b | 64 | 82.9 | 74.3 | 54.3 | 74.3 | 74.3 |
| gpt-3.5 | 72 | 77.1 | 82.8 | 34.3 | 65.7 | 71.4 |
| claude-2 | 52 | 71.4 | 74.3 | 57.1 | 65.7 | 62.9 |

## Model Card Contact

Contact us on X at @defogdata, or by email.

#### Suggested labels
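The SQLCoder prompt in the model card above has two placeholders, `{user_question}` (used twice) and `{table_metadata_string_DDL_statements}`. A minimal sketch of filling it with plain string formatting follows. The line breaks inside the template are an assumption, since the card text here is flattened, and `build_prompt`, the example question, and the example schema are hypothetical. The `GENERATION_KWARGS` dictionary simply records the two settings the card recommends.

```python
# Template text copied from the model card; line breaks are assumed.
PROMPT_TEMPLATE = """### Task
Generate a SQL query to answer [QUESTION]{user_question}[/QUESTION]

### Database Schema
The query will run on a database with the following schema:
{table_metadata_string_DDL_statements}

### Answer
Given the database schema, here is the SQL query that [QUESTION]{user_question}[/QUESTION]
[SQL]
"""

# Decoding settings recommended by the model card.
GENERATION_KWARGS = {"do_sample": False, "num_beams": 4}


def build_prompt(user_question: str, ddl: str) -> str:
    """Fill both placeholders of the template (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(
        user_question=user_question,
        table_metadata_string_DDL_statements=ddl,
    )


# Hypothetical schema and question, for illustration only.
example_ddl = "CREATE TABLE users (id INTEGER PRIMARY KEY, signup_date DATE);"
prompt = build_prompt("How many users signed up in 2023?", example_ddl)
print(prompt)
```

The resulting string would then be tokenized and passed to the model's `generate` call together with `**GENERATION_KWARGS`.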

498: CodeGPTPlus/deepseek-coder-1.3b-typescript · Hugging Face

### Details

Similarity score: 0.85

- [ ] CodeGPTPlus/deepseek-coder-1.3b-typescript · Hugging Face

# CodeGPTPlus/deepseek-coder-1.3b-typescript

This is a fine-tuned model by the CodeGPT team, specifically crafted for generating expert code in TypeScript. It is fine-tuned from `deepseek-ai/deepseek-coder-1.3b-base` with a dataset of 0.5B tokens, making it an excellent choice for precise and efficient TypeScript code generation.

The model uses a 16K window size and an additional fill-in-the-middle task for project-level code completion.

## How to Use

This model is for completion purposes only. Here are some examples of how to use the model:

### Running the model on a GPU

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model, then move the model to the GPU.
tokenizer = AutoTokenizer.from_pretrained("CodeGPTPlus/deepseek-coder-1.3b-typescript", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("CodeGPTPlus/deepseek-coder-1.3b-typescript", trust_remote_code=True).cuda()

# Fill-in-the-middle prompt: the model generates the code at the hole marker.
input_text = """<|fim begin|>function quickSort(arr: number[]): number[] {
  if (arr.length <= 1) {
    return arr;
  }
  const pivot = arr[0];
  const left = [];
  const right = [];
<|fim hole|>
  return [...quickSort(left), pivot, ...quickSort(right)];
}<|fim end|>"""

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Running with Ollama

- Model:
- Command: `ollama run codegpt/deepseek-coder-1.3b-typescript`

### Running with Ollama and CodeGPT Autocomplete in VSCode

- Documentation: \_autocompletion
- Select "Ollama - codegpt/deepseek-coder-1.3b-typescript" in the autocomplete model selector.

### Fill In the Middle (FIM)

```python
<|fim begin|>function quickSort(arr: number[]): number[] {
  if (arr.length <= 1) {
    return arr;
  }
  const pivot = arr[0];
  const left = [];
  const right = [];
<|fim hole|>
  return [...quickSort(left), pivot, ...quickSort(right)];
}<|fim end|>
```

### Training Procedure

The model was trained using the following hyperparameters:

- learning_rate: 2e-05
- train_batch_size: 20
- eval_batch_size: 20
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 40
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 261
- num_epochs: 1

For more information, visit the model page.

#### Suggested labels

{ "label-name": "TypeScript-Code-Generation", "description": "Model for generating TypeScript code", "repo": "CodeGPTPlus/deepseek-coder-1.3b-typescript", "confidence": 70.59 }
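The fill-in-the-middle prompt in the model card above is just the code before the cursor, a hole marker, and the code after the cursor, wrapped in begin and end markers. A small sketch of assembling it from an editor's prefix and suffix follows. `build_fim_prompt` is a hypothetical helper, and the sentinel strings are copied exactly as they appear in the text above; the tokenizer's real special tokens may use different characters, so confirm them against the tokenizer's vocabulary before use.

```python
def build_fim_prompt(prefix: str, suffix: str,
                     begin: str = "<|fim begin|>",
                     hole: str = "<|fim hole|>",
                     end: str = "<|fim end|>") -> str:
    """Wrap prefix and suffix in FIM sentinels (hypothetical helper).

    The default sentinel strings are taken verbatim from the model card
    text and are an assumption about what the tokenizer expects.
    """
    return f"{begin}{prefix}{hole}{suffix}{end}"


# Code before and after the cursor, as an editor plugin might supply it.
prefix = (
    "function quickSort(arr: number[]): number[] {\n"
    "  if (arr.length <= 1) {\n"
    "    return arr;\n"
    "  }\n"
    "  const pivot = arr[0];\n"
    "  const left = [];\n"
    "  const right = [];\n"
)
suffix = "\n  return [...quickSort(left), pivot, ...quickSort(right)];\n}"

fim_prompt = build_fim_prompt(prefix, suffix)
print(fim_prompt)
```

Keeping the sentinels as parameters makes it easy to swap in the exact token strings once they have been checked.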

309: openai/human-eval: Code for the paper "Evaluating Large Language Models Trained on Code"

### Details

Similarity score: 0.85

- [ ] openai/human-eval: Code for the paper "Evaluating Large Language Models Trained on Code"

# HumanEval: Hand-Written Evaluation Set

This is an evaluation harness for the HumanEval problem solving dataset described in the paper "Evaluating Large Language Models Trained on Code".

## Installation

Make sure to use python 3.7 or later:

    $ conda create -n codex python=3.7
    $ conda activate codex

Check out and install this repository:

    $ git clone
    $ pip install -e human-eval

## Usage

This program exists to run untrusted model-generated code. Users are strongly encouraged not to do so outside of a robust security sandbox. The execution call in is deliberately commented out to ensure users read this disclaimer before running code in a potentially unsafe manner. See the comment in for more information and instructions.

After following the above instructions to enable execution, generate samples and save them in the following JSON Lines (jsonl) format, where each sample is formatted into a single line like so:

    {"task_id": "Corresponding HumanEval task ID", "completion": "Completion only without the prompt"}

We provide example_problem.jsonl and example_solutions.jsonl under data to illustrate the format and help with debugging.

Here is nearly functional example code (you just have to provide generate_one_completion to make it work) that saves generated completions to samples.jsonl.

    from import write_jsonl, read_problems

    problems = read_problems()

    num_samples_per_task = 200
    samples = [
        dict(task_id=task_id, completion=generate_one_completion(problems[task_id]["prompt"]))
        for task_id in problems
        for _ in range(num_samples_per_task)
    ]
    write_jsonl("samples.jsonl", samples)

To evaluate the samples, run

    $ evaluate_functional_correctness samples.jsonl
    Reading samples...
    32800it [00:01, 23787.50it/s]
    Running test suites...
    100%|...| 32800/32800 [16:11<00:00, 33.76it/s]
    Writing results to samples.jsonl_results.jsonl...
    100%|...| 32800/32800 [00:00<00:00, 42876.84it/s]
    {'pass@1': ..., 'pass@10': ..., 'pass@100': ...}

This script provides more fine-grained information in a new file ending in _results.jsonl. Each row now contains whether the completion passed along with the execution result which is one of "passed", "timed out", or "failed".

As a quick sanity-check, the example samples should yield 0.5 pass@1.

    $ evaluate_functional_correctness data/example_samples.jsonl --problem_file=data/example_problem.jsonl
    Reading samples...
    6it [00:00, 3397.11it/s]
    Running example suites...
    100%|...| 6/6 [00:03<00:00, 1.96it/s]
    Writing results to data/example_samples.jsonl_results.jsonl...
    100%|...| 6/6 [00:00<00:00, 6148.50it/s]
    {'pass@1': 0.4999999999999999}

Because there is no unbiased way of estimating pass@k when there are fewer samples than k, the script does not evaluate pass@k for these cases. To evaluate with other k values, pass --k=. For other options, see

    $ evaluate_functional_correctness --help

However, we recommend that you use the default values for the rest.

## Known Issues

While evaluation uses very little memory, you might see the following error message when the system is running out of RAM. Since this may cause some correct programs to fail, we recommend that you free some memory and try again.

    malloc: can't allocate region

## Citation

Please cite using the following bibtex entry:

    @article{chen2021codex,
      title={Evaluating Large Language Models Trained on Code},
      author={Mark Chen and Jerry Tworek and Heewoo Jun and Qiming Yuan and Henrique Ponde de Oliveira Pinto and Jared Kaplan and Harri Edwards and Yuri Burda and Nicholas Joseph and Greg Brockman and Alex Ray and Raul Puri and Gretchen Krueger and Michael Petrov and Heidy Khlaaf and Girish Sastry and Pamela Mishkin and Brooke Chan and Scott Gray and Nick Ryder and Mikhail Pavlov and Alethea Power and Lukasz Kaiser and Mohammad Bavarian and Clemens Winter and Philippe Tillet and Felipe Petroski Such and Dave Cummings and Matthias Plappert and Fotios Chantzis and Elizabeth Barnes and Ariel Herbert-Voss and William Hebgen Guss and Alex Nichol and Alex Paino and Nikolas Tezak and Jie Tang and Igor Babuschkin and Suchir Balaji and Shantanu Jain and William Saunders and Christopher Hesse and Andrew N. Carr and Jan Leike and Josh Achiam and Vedant Misra and Evan Morikawa and Alec Radford and Matthew Knight and Miles Brundage and Mira Murati and Katie Mayer and Peter Welinder and Bob McGrew and Dario Amodei and Sam McCandlish and Ilya Sutskever and Wojciech Zaremba},
      year={2021},
      eprint={2107.03374},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
    }

#### Suggested labels

{ "key": "llm-evaluation", "value": "Evaluating Large Language Models performance and behavior through human-written evaluation sets" }
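The HumanEval README above notes that pass@k cannot be estimated without bias when fewer than k samples exist per task. A short sketch of why: the combinatorial estimator described in the cited paper, for one task with n samples of which c pass, is 1 - C(n-c, k) / C(n, k), which is only defined for k ≤ n. The function below is an illustrative reimplementation rather than the harness's own code, and the sample counts in the example are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task from n samples, c of which passed.

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that
    a random size-k subset of the n samples contains no passing sample.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if k < 1 or k > n:
        # With fewer samples than k there is no size-k subset to draw.
        raise ValueError("k must satisfy 1 <= k <= n")
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical counts: 6 samples for a task, 3 of which passed.
print(pass_at_k(n=6, c=3, k=1))  # 0.5
```

For a benchmark score, this per-task value would be averaged over all tasks.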

628: LLaVA/ at main · haotian-liu/LLaVA

### Details

Similarity score: 0.85

- [ ] LLaVA/ at main · haotian-liu/LLaVA

# LLaVA/ at main · haotian-liu/LLaVA

## 🌋 LLaVA: Large Language and Vision Assistant

*Visual instruction tuning towards large language and vision models with GPT-4 level capabilities.*

📢 LLaVA-NeXT Blog · Project Page · Demo · Data · Model Zoo

🤝Community Contributions: llama.cpp · Colab · 🤗Space · Replicate · AutoGen · BakLLaVA

**Improved Baselines with Visual Instruction Tuning** (Paper, HF)
Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee

**Visual Instruction Tuning** (NeurIPS 2023, Oral) (Paper, HF)
Haotian Liu*, Chunyuan Li*, Qingyang Wu, Yong Jae Lee (*Equal Contribution)

## Release

- [1/30] 🔥 LLaVA-NeXT (LLaVA-1.6) is out! With additional scaling to LLaVA-1.5, LLaVA-NeXT-34B outperforms Gemini Pro on some benchmarks. It can now process 4x more pixels and perform more tasks/applications than before. Check out the blog post, and explore the demo! Models are available in Model Zoo. Training/eval data and scripts coming soon.
- [11/10] LLaVA-Plus is released: Learning to Use Tools for Creating Multimodal Agents, with LLaVA-Plus (LLaVA that Plug and Learn to Use Skills). (Project Page, Demo, Code, Paper)
- [11/2] LLaVA-Interactive is released: Experience the future of human-AI multimodal interaction with an all-in-one demo for Image Chat, Segmentation, Generation and Editing. (Project Page, Demo, Code, Paper)
- [10/26] 🔥 LLaVA-1.5 with LoRA achieves comparable performance as full-model finetuning, with a reduced GPU RAM requirement (ckpts) (script). We also provide a doc on how to finetune LLaVA-1.5 on your own dataset with LoRA.
- [10/12] Check out the Korean LLaVA (Ko-LLaVA), created by ETRI, who has generously supported our research! 🤗 Demo
- [10/5] 🔥 LLaVA-1.5 is out! Achieving SoTA on 11 benchmarks, with just simple modifications to the original LLaVA, utilizes all public data, completes training in ~1 day on a single 8-A100 node, and surpasses methods like Qwen-VL-Chat that use billion-scale data. Check out the technical report, and explore the demo! Models are available in Model Zoo. The training data and scripts of LLaVA-1.5 are released here, and evaluation scripts are released here.
- [9/26] LLaVA is improved with reinforcement learning from human feedback (RLHF) to improve fact grounding and reduce hallucination. Check out the new SFT and RLHF checkpoints at project LLaVA-RLHF.
- [9/22] LLaVA is accepted by NeurIPS 2023 as oral presentation, and LLaVA-Med is accepted by NeurIPS 2023 Datasets and Benchmarks Track as spotlight presentation.

More

- [11/6] Support Intel dGPU and CPU platforms. More details here.
- [10/12] LLaVA is now supported in llama.cpp with 4-bit / 5-bit quantization support!
- [10/11] The training data and scripts of LLaVA-1.5 are released here, and evaluation scripts are released here!
- [10/10] Roboflow Deep Dive: First Impressions with LLaVA-1.5.
- [9/20] We summarize our empirical study of training 33B and 65B LLaVA models in a note. Further, if you are interested in the comprehensive review, evolution and trend of multimodal foundation models, please check out our recent survey paper "Multimodal Foundation Models: From Specialists to General-Purpose Assistants".
- [7/19] We release a major upgrade, including support for LLaMA-2, LoRA training, 4-/8-bit inference, higher resolution (336x336), and a lot more. We release LLaVA Bench for benchmarking open-ended visual chat with results from Bard and Bing-Chat. We also support and verify training with RTX 3090 and RTX A6000. Check out LLaVA-from-LLaMA-2, and our model zoo!
- [6/26] CVPR 2023 Tutorial on Large Multimodal Models: Towards Building and Surpassing Multimodal GPT-4! Please check out Slides, Notes, YouTube, Bilibili.
- [6/11] We released the preview for the most requested feature: DeepSpeed and LoRA support! Please see the documentation here.
- [6/1] We released LLaVA-Med: Large Language and Vision Assistant for Biomedicine, a step towards building biomedical domain large language and vision models with GPT-4 level capabilities. Check out the paper and page.
- [5/6] We are releasing LLaVA-Lighting-MPT-7B-preview, based on MPT-7B-Chat! See here for more details.
- [5/2] We are releasing LLaVA-Lighting! Train a lite, multimodal GPT-4 with just $40 in 3 hours! See here for more details.
- [4/27] Thanks to the community effort, LLaVA-13B with 4-bit quantization allows you to run on a GPU with as few as 12GB VRAM! Try it out here.
- [4/17] We released LLaVA: Large Language and Vision Assistant. We propose visual instruction tuning, towards building large language and vision models with GPT-4 level capabilities. Check out the paper and demo.

**Usage and License Notices**: This project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses, including but not limited to the OpenAI Terms of Use for the dataset and the specific licenses for base language models for checkpoints trained using the dataset (e.g. Llama community license for LLaMA-2 and Vicuna-v1.5). This project does not impose any additional constraints beyond those stipulated in the original licenses. Furthermore, users are reminded to ensure that their use of the dataset and checkpoints is in compliance with all applicable laws and regulations.

## Contents

- [Install](#install)
- [LLaVA Weights](#llava-weights)
- [Demo](#Demo)
- Model Zoo
- Dataset
- [Train](#train)
- [Evaluation](#evaluation)

#### Suggested labels