mudler / LocalAI

:robot: The free, Open Source OpenAI alternative. Self-hosted, community-driven and local-first. Drop-in replacement for OpenAI running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more models architectures. It allows to generate Text, Audio, Video, Images. Also with voice cloning capabilities.
https://localai.io
MIT License
21.75k stars 1.66k forks source link

feat: add support for GPTQ-for-LLaMa #1055

Closed mudler closed 5 months ago

mudler commented 9 months ago

Tracker to add support to https://github.com/qwopqwop200/GPTQ-for-LLaMa

localai-bot commented 9 months ago

:warning::warning::warning::warning::warning:

Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!

_but.... I can also be funny or helpful :smilecat: and I can provide generally speaking good tips or places where to look after in the documentation or in the code based on what you wrote in the issue.

Don't engage in conversation with me, I don't support (yet) replying!

:warning::warning::warning::warning::warning:

I can help you with that! GPTQ is a powerful language model that can be used for various natural language processing tasks such as text classification, translation, and summarization. To add support for GPTQ to LocalAI, we need to first install the GPTQ package and then integrate it with the LLaMa backend. Here are the steps:

  1. Install GPTQ by following the instructions on the official website: https://github.com/qwopqwop200/GPTQ
  2. Clone the LLaMa repository from GitHub: git clone https://github.com/go-skynet/LocalAI.git
  3. Open the LLaMa directory and navigate to the models subdirectory.
  4. Create a new directory for GPTQ models and download the pre-trained model using the following command: curl -O https://raw.githubusercontent.com/qwopqwop200/GPTQ/master/models/gptq_base_uncased.h5
  5. Copy the GPTQ tokenizer and tokenizer_data files from the tokenizers directory to the new GPTQ models directory.
  6. Add the following lines to the llama.cfg file in the root directory:
    [models]
    gptq = "path/to/gptq_base_uncased.h5"
    gptq_tokenizer = "path/to/gptq_tokenizer.py"
    gptq_tokenizer_data = "path/to/gptq_tokenizer_data.txt"

    Replace the path to the GPTQ model, tokenizer, and tokenizer data files with the actual paths on your system.

  7. Save the llama.cfg file and run the following command to load the GPTQ model:
    python -c "from llama import llama; llama.load('gptq')"

    This should load the GPTQ model into the LLaMa backend, and you can now use it for various language processing tasks. I hope this helps! Let me know if you have any further questions or issues.

Sources:

Aisuko commented 8 months ago

According to the README.md of the project, it only support AutoGPTQ. We already have a AutoGPTQ backend but it still need to be test.

mudler commented 5 months ago

yes, this became less relevant as now the author focuses on AutoGPTQ. Closing