turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Problems with quantization and qwen2 inference #498

Closed · nikitabalakin closed this issue 2 weeks ago

nikitabalakin commented 2 weeks ago

Hi! When I try to quantize the Qwen2-57B-A14B-Instruct model, I get the following error. Is the model not supported?

E:\exllamav2>python convert.py -i D:\text-generation-webui\models\Qwen_Qwen2-57B-A14B-Instruct -o working -nr -om Qwen_Qwen2-57B-A14B-Instruct.json -ss 2048 -hb 8
 !! Warning, unknown architecture: Qwen2MoeForCausalLM
 !! Loading as LlamaForCausalLM
Traceback (most recent call last):
  File "E:\exllamav2\convert.py", line 71, in <module>
    config.prepare()
  File "E:\exllamav2\exllamav2\config.py", line 331, in prepare
    raise ValueError(f" ## Could not find {prefix}.* in model")
ValueError:  ## Could not find model.layers.0.mlp.down_proj.* in model
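
For future readers, a minimal sketch of one way to see why the lookup fails: it prints the layer-0 MLP tensor names recorded in the checkpoint's safetensors index. The index file name and the expert-style key pattern mentioned in the comments are assumptions based on a standard sharded Hugging Face checkpoint, not exllamav2 behaviour.

```python
# Sketch (not part of convert.py): list the layer-0 MLP tensor names that the
# checkpoint actually contains. Assumes a standard sharded HF checkpoint with
# a model.safetensors.index.json file next to the weights.
import json
from pathlib import Path

model_dir = Path(r"D:\text-generation-webui\models\Qwen_Qwen2-57B-A14B-Instruct")
index = json.loads((model_dir / "model.safetensors.index.json").read_text())

# A Qwen2-MoE checkpoint typically stores per-expert weights under names like
# "model.layers.0.mlp.experts.<n>.down_proj.weight", rather than the dense
# "model.layers.0.mlp.down_proj.weight" that the converter looks for.
for name in sorted(index["weight_map"]):
    if name.startswith("model.layers.0.mlp."):
        print(name)
```

If the output shows `mlp.gate`, `mlp.shared_expert.*` and per-expert `mlp.experts.<n>.*` projections instead of a single `mlp.down_proj`, that mismatch is exactly what makes `config.prepare()` raise the error above under the LlamaForCausalLM fallback.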

turboderp commented 2 weeks ago

Qwen2MoeForCausalLM isn't a supported architecture at the moment, as the output indicates.

nikitabalakin commented 2 weeks ago

Can you please tell me if support is planned?

turboderp commented 2 weeks ago

It's not currently planned, but it may happen. I don't know, basically. I like to give new models at least a few days before deciding whether they're worth the effort. It's potentially weeks of work (this particular MoE architecture is a big departure from the rest in some ways), and a lot could happen in the meantime.
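
For context on why this is a bigger lift than supporting another dense model, here is a simplified, illustrative contrast between a dense Llama-style MLP and a routed MoE MLP of the kind Qwen2-MoE uses. This is not exllamav2 or Qwen code; the class names, expert count and top-k values are placeholders.

```python
# Illustrative sketch only: contrasts a dense Llama-style MLP block with a
# routed mixture-of-experts block. Shapes and hyperparameters are made up.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseMLP(nn.Module):
    # Llama-style: one gate/up/down projection per layer, giving tensor names
    # like model.layers.N.mlp.down_proj.weight, which the converter expects.
    def __init__(self, hidden, intermediate):
        super().__init__()
        self.gate_proj = nn.Linear(hidden, intermediate, bias=False)
        self.up_proj = nn.Linear(hidden, intermediate, bias=False)
        self.down_proj = nn.Linear(intermediate, hidden, bias=False)

    def forward(self, x):
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

class SparseMoeMLP(nn.Module):
    # MoE-style: a router scores experts per token and only the top-k expert
    # MLPs run for that token, giving tensor names like
    # model.layers.N.mlp.experts.E.down_proj.weight instead of a single
    # mlp.down_proj, and requiring per-token routing at inference time.
    def __init__(self, hidden, expert_intermediate, num_experts=8, top_k=2):
        super().__init__()
        self.gate = nn.Linear(hidden, num_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            [DenseMLP(hidden, expert_intermediate) for _ in range(num_experts)]
        )
        self.top_k = top_k

    def forward(self, x):  # x: [tokens, hidden]
        logits = self.gate(x)                                # [tokens, experts]
        weights, idx = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                 # per-token mix
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e                        # tokens routed to e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * self.experts[e](x[mask])
        return out
```

The routing step, the per-expert weight layout, and (in Qwen2-MoE's case) an additional shared expert are what make the quantization and inference paths diverge from the dense architectures the library already handles.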

nikitabalakin commented 2 weeks ago

Understood, thanks!