nomic-ai / gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
https://nomic.ai/gpt4all
MIT License

Add support for Mixtral 8x7B #1747

Open flowstate247 opened 11 months ago

flowstate247 commented 11 months ago

Feature request

Add support for Mixtral 8x7B: https://mistral.ai/news/mixtral-of-experts/

Motivation

Mixtral 8x7B is a high-quality sparse mixture of experts model (SMoE) with open weights. Licensed under Apache 2.0. Mixtral outperforms Llama 2 70B on most benchmarks with 6x faster inference. It is the strongest open-weight model with a permissive license and the best model overall regarding cost/performance trade-offs. In particular, it matches or outperforms GPT3.5 on most standard benchmarks.

Your contribution

.

RandomLegend commented 11 months ago

AFAIK the mixtral branch of llama.cpp was just updated to be able to use the new 8x7B model.

So we just have to hope for a quick llama.cpp update here, I guess.

bitsnaps commented 10 months ago

Just an opinion: people will then ask to support SOLAR, then X, then Y, etc. I think it's time to extend the architecture to support any future model with an expected architecture/format, starting with what's available today (GPTQ, GGUF, etc.); you'd then just need to provide the Hugging Face model ID or something...

brankoradovanovic-mcom commented 10 months ago

Just an opinion: people will then ask to support SOLAR, then X, then Y, etc. I think it's time to extend the architecture to support any future model with an expected architecture/format, starting with what's available today (GPTQ, GGUF, etc.); you'd then just need to provide the Hugging Face model ID or something...

Actually, SOLAR already works in GPT4All 2.5.4. Some other models don't, that's true (e.g. phi-2).

GPT4All is built on top of llama.cpp, so it is limited to what llama.cpp can work with. The list grows over time, and apparently 2.6.0 should be able to work with more architectures.

maninthemiddle01 commented 10 months ago

Mixtral 8x7B still does not work with version v2.6.1; am I doing something wrong, or is it not supported yet? As I understand it, the v2.6.1 update is based on the 23 November version of llama.cpp, and Mixtral will be supported with the December version?

J35ter commented 10 months ago

Mixtral 8x7B still does not work with version v2.6.1; am I doing something wrong, or is it not supported yet? As I understand it, the v2.6.1 update is based on the 23 November version of llama.cpp, and Mixtral will be supported with the December version?

Also no success with TheBloke's GGUF versions so far. Trying out different versions now.

I just get a generic error message in the client. Can anyone tell me how to get more detailed error messages, or are there some log files I missed?

FlorianHeigl commented 10 months ago

Just an opinion: people will then ask to support SOLAR, then X, then Y, etc. I think it's time to extend the architecture to support any future model with an expected architecture/format, starting with what's available today (GPTQ, GGUF, etc.); you'd then just need to provide the Hugging Face model ID or something...

All true, but timing the move to a more flexible architecture to coincide with the introduction of a milestone feature is suboptimal; all you get that way is damage to the user base. Shoehorn in the feature first, then refactor with the experience gained.

brankoradovanovic-mcom commented 10 months ago

Also no success with TheBloke's GGUF versions so far. Trying out different versions now.

I just get a generic error message in the client. Can anyone tell me how to get more detailed error messages, or are there some log files I missed?

If you can use the Python API, you'll get more detailed error messages. I had a situation where the chat crashed immediately upon loading the model (without even displaying the generic message), but when I tried to load the model using the Python API, I got a proper error message. It didn't help me much, though: for us end users, any sort of error while loading the model means "this model doesn't work in GPT4All, move on". :-)
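As an illustration, a minimal sketch of this approach, assuming the gpt4all Python bindings are installed (pip install gpt4all) and the GGUF file is already on disk; the file name and directory below are taken from the report further down and are otherwise assumptions:

```python
from gpt4all import GPT4All

try:
    # Loading through the bindings surfaces the underlying llama.cpp error
    # instead of the chat client's generic "could not load model" dialog.
    model = GPT4All(
        "mixtral-8x7b-instruct-v0.1.Q3_K_M.gguf",   # assumed local file name
        model_path="/home/user/gpt4all/models",     # assumed models directory
        allow_download=False,
    )
    print(model.generate("Hello", max_tokens=32))
except Exception as e:
    # The exception text typically includes the loader error,
    # e.g. "tensor 'blk.0.ffn_gate.weight' not found".
    print("Model failed to load:", e)
```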

maninthemiddle01 commented 10 months ago

Unfortunately I don't know how to use the Python API. If I start the program from the shell, I get these error messages:

error loading model: create_tensor: tensor 'blk.0.ffn_gate.weight' not found
llama_load_model_from_file_gpt4all: failed to load model
LLAMA ERROR: failed to load model from /home/user/gpt4all/models/mixtral-8x7b-instruct-v0.1.Q3_K_M.gguf
[Warning] (Mon Jan 15 xx:xx:xx 2024): ERROR: Could not load model due to invalid model file for mixtral-8x7b-instruct-v0.1.Q3_K_M.gguf id "3f67fb46-0c3b-4f89-8e6b-8d9747a4aaca"

Does this help?

cebtenzzre commented 10 months ago

Mixtral 8x7B still does not work with version v2.6.1; am I doing something wrong, or is it not supported yet? As I understand it, the v2.6.1 update is based on the 23 November version of llama.cpp, and Mixtral will be supported with the December version?

Support will be added in #1819. (edit: actually, probably not completely - I think we have to implement more ops in the Vulkan backend. CPU inference should work though. Somebody should actually test it, maybe myself if I get a chance.)

brankoradovanovic-mcom commented 9 months ago

So, #1819 has been merged and it landed in 2.6.2, but yesterday I tried phi-2 (Q5_K_M, specifically) and it still doesn't work. I suppose it doesn't have upstream support yet.

It would be nice if the 2.6.2 release notes explicitly listed at least some notable models that didn't work in 2.6.1 but work now (instead of leaving most users guessing). In particular, I was surprised the release notes did not mention Mixtral 8x7B, which I interpreted as "doesn't work just yet". :-)

woheller69 commented 9 months ago

For me it works. I just don't have enough RAM. Will change that next week :-)

cebtenzzre commented 9 months ago

So, #1819 has been merged and it landed in 2.6.2, but yesterday I tried phi-2 (Q5_K_M, specifically) and it still doesn't work. I suppose it doesn't have upstream support yet.

It would be nice if the 2.6.2 release notes explicitly listed at least some notable models that didn't work in 2.6.1 but work now (instead of leaving most users guessing). In particular, I was surprised the release notes did not mention Mixtral 8x7B, which I interpreted as "doesn't work just yet". :-)

The complete answer is that we neglected to add the new models to the whitelist - fix incoming. Mixtral works (on CPU) because it claims to simply be "llama", which we support.

edit: See #1914
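For illustration only, a hypothetical sketch (not the actual GPT4All source, which implements this in the C++ backend; see #1914) of how an architecture whitelist check like the one described above could behave:

```python
# Assumed, illustrative whitelist; the real list lives in the GPT4All backend.
SUPPORTED_ARCHITECTURES = {"llama", "falcon", "mpt", "gptj", "starcoder"}

def is_supported(architecture: str) -> bool:
    """Return True if the GGUF metadata architecture string is whitelisted."""
    return architecture.lower() in SUPPORTED_ARCHITECTURES

# Mixtral GGUF files report "llama" as their architecture, so they pass the
# check and load on CPU even before being whitelisted explicitly.
print(is_supported("llama"))  # True
print(is_supported("phi2"))   # False until added to the whitelist
```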

brankoradovanovic-mcom commented 9 months ago

Mixtral 8x7B indeed works in the chat, but it doesn't work with Python bindings - I guess that's one last bit missing for full support.

cebtenzzre commented 9 months ago

Mixtral 8x7B indeed works in the chat, but it doesn't work with Python bindings - I guess that's one last bit missing for full support.

Working on it: #1931

cebtenzzre commented 9 months ago

I just released version 2.2.0 of the Python bindings, which has support for all of the latest models mentioned in #1914, including Mixtral and Phi-2 (CPU only).
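A minimal usage sketch under those constraints, assuming gpt4all >= 2.2.0 Python bindings and a locally downloaded Mixtral GGUF (the exact file name and prompt are assumptions):

```python
from gpt4all import GPT4All

model = GPT4All(
    "mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",  # assumed quantization/file name
    device="cpu",            # CPU only, per the support status above
    allow_download=False,
)

with model.chat_session():
    print(model.generate("Explain sparse mixture-of-experts in two sentences.",
                         max_tokens=128))
```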

alexandre-leng commented 8 months ago

Hope this model can work with GPT4All

cebtenzzre commented 8 months ago

Hope this model can work with GPT4All

It does. Just no Windows/Linux GPU support yet, which is the only reason this issue is still open.