flowstate247 opened 11 months ago
AFAIK the Mixtral branch of llama.cpp was just updated to support the new 8x7B model.
So we just have to hope for a quick llama.cpp update here, I guess.
Just an opinion, but people will then ask to support SOLAR, then X, then Y, etc. I think it's time to expand the architecture to support any future model with an expected architecture/format, starting with what's available today (GPTQ, GGUF, etc.); you'd then just need to provide the Hugging Face model ID or something similar.
Actually, SOLAR already works in GPT4All 2.5.4. Some other models don't, that's true (e.g. phi-2).
GPT4All is built on top of llama.cpp, so it is limited to what llama.cpp can work with. The list grows over time, and apparently 2.6.0 should be able to work with more architectures.
Mixtral 8x7B still does not work with version v2.6.1. Am I doing something wrong, or is it not supported yet? As I understand it, the v2.6.1 update corresponds to the 23 November version of llama.cpp, and Mixtral will only be supported with the December version?
Also no success with TheBloke's GGUF versions so far. Trying out different versions now.
I just get a generic error message in the client. Can anyone tell me how to get more detailed error messages, or are there log files I missed?
> Just an opinion, but people will then ask to support SOLAR, then X, then Y, etc. I think it's time to expand the architecture to support any future model with an expected architecture/format, starting with what's available today (GPTQ, GGUF, etc.); you'd then just need to provide the Hugging Face model ID or something similar.
All true, but timing a more flexible architecture to coincide with the introduction of a milestone feature is suboptimal; all you get that way is damage to the user base. Shoehorn in the feature, then refactor with the experience gained.
> Also no success with TheBloke's GGUF versions so far. Trying out different versions now.
> I just get a generic error message in the client. Can anyone tell me how to get more detailed error messages, or are there log files I missed?
If you can use the Python API, you'll get more detailed error messages. I had a situation where the chat crashed immediately upon loading the model (without even displaying the generic message), but when I tried to load the model using the Python API, I got a proper error message. It didn't help me much, though; for us end users, any sort of error while loading the model means "this model doesn't work in GPT4All, move on". :-)
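For reference, a minimal sketch of what that looks like. The file name and `model_path` below are placeholders for your own local setup, and `allow_download=False` just stops the bindings from trying to fetch anything:

```python
from gpt4all import GPT4All

try:
    # Loading the model here surfaces the underlying llama.cpp error
    # instead of the chat client's generic failure dialog.
    # File name and model_path are placeholders for your own setup.
    model = GPT4All(
        "mixtral-8x7b-instruct-v0.1.Q3_K_M.gguf",
        model_path="/home/user/gpt4all/models",
        allow_download=False,  # only look at the local file
    )
    print(model.generate("Hello", max_tokens=32))
except Exception as e:
    print(f"Model failed to load or run: {e}")
```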
Unfortunately, I don't know how to use the Python API. If I start the program from a shell, I get these error messages:
```
error loading model: create_tensor: tensor 'blk.0.ffn_gate.weight' not found
llama_load_model_from_file_gpt4all: failed to load model
LLAMA ERROR: failed to load model from /home/user/gpt4all/models/mixtral-8x7b-instruct-v0.1.Q3_K_M.gguf
[Warning] (Mon Jan 15 xx:xx:xx 2024): ERROR: Could not load model due to invalid model file for mixtral-8x7b-instruct-v0.1.Q3_K_M.gguf id "3f67fb46-0c3b-4f89-8e6b-8d9747a4aaca"
```
Does this help?
> Mixtral 8x7B still does not work with version v2.6.1. Am I doing something wrong, or is it not supported yet? As I understand it, the v2.6.1 update corresponds to the 23 November version of llama.cpp, and Mixtral will only be supported with the December version?
Support will be added in #1819. (Edit: actually, probably not completely; I think we have to implement more ops in the Vulkan backend. CPU inference should work, though. Somebody should actually test it, maybe me if I get a chance.)
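If someone wants to try that CPU path, a minimal sketch via the Python bindings might look like this; the file name is a placeholder, and `device="cpu"` assumes a bindings version recent enough to expose that parameter:

```python
from gpt4all import GPT4All

# Force CPU inference so the missing Vulkan ops never come into play.
# File name is a placeholder for a locally downloaded Mixtral GGUF.
model = GPT4All(
    "mixtral-8x7b-instruct-v0.1.Q3_K_M.gguf",
    device="cpu",
    allow_download=False,
)
print(model.generate("Briefly explain a mixture of experts.", max_tokens=64))
```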
So, #1819 has been merged and landed in 2.6.2, but yesterday I tried phi-2 (Q5_K_M, specifically) and it still doesn't work. I suppose it doesn't have upstream support yet.
It would be nice if the 2.6.2 release notes explicitly listed at least some notable models that didn't work in 2.6.1 but work now (instead of leaving most users guessing). In particular, I was surprised the release notes did not mention Mixtral 8x7B, which I interpreted as "doesn't work just yet". :-)
For me it works. I just don't have enough RAM; that will change next week :-)
> So, #1819 has been merged and landed in 2.6.2, but yesterday I tried phi-2 (Q5_K_M, specifically) and it still doesn't work. I suppose it doesn't have upstream support yet.
> It would be nice if the 2.6.2 release notes explicitly listed at least some notable models that didn't work in 2.6.1 but work now (instead of leaving most users guessing). In particular, I was surprised the release notes did not mention Mixtral 8x7B, which I interpreted as "doesn't work just yet". :-)
The complete answer is that we neglected to add the new models to the whitelist - fix incoming. Mixtral works (on CPU) because it claims to simply be "llama", which we support.
edit: See #1914
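You can verify what a file claims to be by reading its GGUF metadata. Here's a sketch using the `gguf` package published from the llama.cpp repo (`pip install gguf`); the field-access details are what I'd expect from current versions and may differ in older ones:

```python
from gguf import GGUFReader

# Mixtral GGUF files report "llama" as their architecture, which is
# why they load even before the whitelist fix. File name is a placeholder.
reader = GGUFReader("mixtral-8x7b-instruct-v0.1.Q3_K_M.gguf")
field = reader.fields["general.architecture"]
arch = bytes(field.parts[field.data[0]]).decode("utf-8")
print(arch)  # expected: "llama"
```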
Mixtral 8x7B indeed works in the chat, but it doesn't work with the Python bindings. I guess that's the last bit missing for full support.
Working on it: #1931
I just released version 2.2.0 of the Python bindings, which has support for all of the latest models mentioned in #1914, including Mixtral and Phi-2 (CPU only).
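For anyone unsure which bindings they have installed, a quick standard-library check:

```python
from importlib.metadata import version

# Mixtral and Phi-2 support (CPU) landed in bindings release 2.2.0.
print(version("gpt4all"))  # expect 2.2.0 or newer
```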
Hope this model can work with GPT4All.
It does. Just no Windows/Linux GPU support yet, which is the only reason this issue is still open.
Feature request
Add support for Mixtral 8x7B: https://mistral.ai/news/mixtral-of-experts/
Motivation
Mixtral 8x7B is a high-quality sparse mixture of experts model (SMoE) with open weights. Licensed under Apache 2.0. Mixtral outperforms Llama 2 70B on most benchmarks with 6x faster inference. It is the strongest open-weight model with a permissive license and the best model overall regarding cost/performance trade-offs. In particular, it matches or outperforms GPT3.5 on most standard benchmarks.
Your contribution
.