mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License

Mixtral support? #1421

Closed muka closed 9 months ago

muka commented 9 months ago

I would like to ask (if not already planned) for support for Mixtral from MistralAI

https://docs.mistral.ai/models/

Thank you

lunamidori5 commented 9 months ago

Already supported @muka https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GGUF

I am wrong, my bad.

muka commented 9 months ago

Awesome! Will check it out, thanks

lunamidori5 commented 9 months ago

Guess who was wrong... this girl. Hi, hello, I was confused; that was not the model I was thinking about. They seem to be using a newer version of Llama @muka / @mudler

jamiemoller commented 9 months ago

llama.cpp pull request for reference: https://github.com/ggerganov/llama.cpp/pull/4406

localai-bot commented 9 months ago

:warning::warning::warning::warning::warning:

Hi! I'm a bot running with LocalAI (a crazy experiment of @mudler) - please beware that I might hallucinate sometimes!

But... I can also be funny or helpful :smilecat:, and I can generally provide good tips or point to places to look in the documentation or in the code, based on what you wrote in the issue.

Don't engage in conversation with me, I don't support (yet) replying!

:warning::warning::warning::warning::warning:

Hello! We are happy to help with your request. LocalAI is a community-driven project and we always appreciate feedback and feature requests. We have not yet had the opportunity to test Mixtral from MistralAI, but it looks like an interesting architecture that could be compatible with LocalAI. We will add it to our list of potential backends to support in the future. In the meantime, you can try to use Mixtral with LocalAI by creating a custom backend. This involves writing a small wrapper script that connects LocalAI with the Mixtral API. You can find more information on how to do this in our documentation: https://docs.localai.org/en/latest/custom-backends.html#creating-a-custom-backend. If you have any further questions or need help with this, feel free to ask!

Sources:

sweetcard commented 9 months ago

> llamacpp pull req for reference ggerganov/llama.cpp#4406

The PR has been merged. We can update llama.cpp.

loversama commented 9 months ago

Could we also get params for: Keys.LLM.EXPERT_COUNT and Keys.LLM.EXPERT_USED_COUNT in the config when loading?
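
For illustration, something like the sketch below is what I mean; the two option names are hypothetical, just mirroring the GGUF metadata keys ({arch}.expert_count / {arch}.expert_used_count), and are not existing LocalAI config fields:

```yaml
# Hypothetical sketch only - expert_count / expert_used_count are NOT
# confirmed LocalAI options; the names mirror the GGUF metadata keys.
name: mixtral-8x7b
backend: llama-cpp
parameters:
  model: mixtral-8x7b-v0.1.Q4_K_M.gguf  # illustrative filename
expert_count: 8        # total experts in the MoE model (Mixtral 8x7B has 8)
expert_used_count: 2   # experts evaluated per token (Mixtral uses 2)
```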

muka commented 9 months ago

> this in our documentation: https://docs.localai.org/en/latest/custom-backends.html#creating-a-custom-backend. If you

Just to let you know, this link points to an unreachable (?) domain, .org instead of .io.

lunamidori5 commented 9 months ago

> this in our documentation: https://docs.localai.org/en/latest/custom-backends.html#creating-a-custom-backend. If you
>
> Just to let you know, this link points to an unreachable (?) domain, .org instead of .io.

That's an AI, it's wrong.

mudler commented 9 months ago

This PR: https://github.com/mudler/LocalAI/pull/1440 updated llama.cpp to point to the latest version, so master should now work with Mixtral.
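
A minimal model YAML to try it on master could look like the sketch below; the gguf filename is illustrative (use whatever quantization you downloaded, e.g. from TheBloke/Mixtral-8x7B-v0.1-GGUF) and gpu_layers depends on your VRAM:

```yaml
# Minimal sketch for the base Mixtral GGUF - the filename and gpu_layers
# value are assumptions; adjust them to your download and hardware.
name: mixtral-8x7b
backend: llama-cpp
context_size: 4096
f16: true
gpu_layers: 28
parameters:
  model: mixtral-8x7b-v0.1.Q4_K_M.gguf
  temperature: 0.2
```

Note that this is the base (non-instruct) model, so it ships no chat format of its own; for chat-style use you still need to provide a prompt template in the `template:` section.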

Taronyuu commented 9 months ago

Given that this is the only Mixtral-related issue in this repository, I will share the issues I ran into here. All thanks to @lunamidori5, who helped me on Discord.

I tried to set up TheBloke/Mixtral-8x7B-v0.1-GGUF within LocalAI and went down a complete rabbit hole trying to set up my chat template, which didn't want to work. An hour wasted, until Luna recommended using TheBloke/dolphin-2.5-mixtral-8x7b-GGUF instead. That one works with the Luna demo chat template, as found here: https://localai.io/howtos/easy-model/. After this it worked flawlessly!

I'll share my config just in case someone needs it in the future; this is running on an RTX 3090.

```yaml
context_size: 4000
name: dolphin-2.5-mixtral-8x7b.Q4_K_M.gguf
parameters:
  model: dolphin-2.5-mixtral-8x7b.Q4_K_M.gguf
  temperature: 0.2
  top_k: 80
  top_p: 0.7
template:
  chat: chat-block
  chat_message: chat
  completion: completion
backend: llama-cpp
f16: true
gpu_layers: 28
```

Haven't played with any numbers yet other than the gpu_layers.
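
The templates referenced above (chat-block, chat, completion) come from the linked howto; in case that page changes, here is a rough sketch of what the three files can look like, assuming the ChatML prompt format that the dolphin-2.5-mixtral models use and LocalAI's Go-template variables (.Input, .RoleName, .Content). The exact files in the howto may differ:

```
{{/* chat.tmpl - per-message template (template.chat_message), ChatML style */}}
<|im_start|>{{.RoleName}}
{{.Content}}<|im_end|>

{{/* chat-block.tmpl - wraps the rendered conversation (template.chat) */}}
{{.Input}}
<|im_start|>assistant

{{/* completion.tmpl - plain completion prompt (template.completion) */}}
{{.Input}}
```

As I understand it, LocalAI renders each message through the chat_message template, concatenates the results into .Input, and then applies the chat template before handing the prompt to llama.cpp.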