Open weiminw opened 4 months ago
GPTQ is not yet supported for Qwen MoE. We are working on it.
So, what kinds of quantized Qwen MoE models does vLLM support in 0.5.2? Could you recommend a quantized Qwen2 MoE model?
We currently support fp16 and fp8 for Qwen MoE.
fp8 requires Hopper GPUs.
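For reference, here is a minimal sketch of loading a Qwen2 MoE checkpoint with fp8 through the offline LLM API; the model id and tensor_parallel_size are placeholders, not a tested configuration:

```python
from vllm import LLM, SamplingParams

# Illustrative only: load a Qwen2 MoE checkpoint with fp8 weight quantization.
# fp8 needs a Hopper GPU (e.g. H100); on older GPUs keep the default fp16.
llm = LLM(
    model="Qwen/Qwen2-57B-A14B-Instruct",  # placeholder model id
    quantization="fp8",
    tensor_parallel_size=4,                # adjust to your GPU count
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```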
This PR may solve it: https://github.com/vllm-project/vllm/pull/6502. I also built a wheel from that branch for self-testing: https://github.com/akai-shuuichi/vllm/releases/download/v5/vllm-0.5.2-cp310-cp310-manylinux1_x86_64.whl
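If you install that wheel, loading a GPTQ-int4 Qwen MoE checkpoint should follow the usual vLLM flow. A hedged sketch; the checkpoint path is a placeholder, and GPTQ MoE support depends on that branch:

```python
from vllm import LLM

# Assumes the patched wheel above is installed.
# The path below is a placeholder for a GPTQ-int4 Qwen MoE checkpoint.
llm = LLM(
    model="/path/to/Qwen2-57B-A14B-Instruct-GPTQ-Int4",
    quantization="gptq",  # usually auto-detected from the checkpoint's quantize_config.json
)
print(llm.generate(["ping"])[0].outputs[0].text)
```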
@akai-shuuichi DeepSeek V2 Support?
I only tested Qwen; I don't have enough GPUs to run the DeepSeek V2 MoE :(
Thank you. You can also try this model.
Sorry, that model gives an error:
Traceback (most recent call last):
  File "/vllm-workspace/v1Server.py", line 50, in <module>
    generation_config, tokenizer, stop_word, engine = load_vllm()
  File "/vllm-workspace/v1Server.py", line 23, in load_vllm
    generation_config = GenerationConfig.from_pretrained(model_dir, trust_remote_code=True)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py", line 915, in from_pretrained
    resolved_config_file = cached_file(
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 373, in cached_file
    raise EnvironmentError(
OSError: /vllm-workspace/DeepSeek-V2-Lite-gptq-4bit does not appear to have a file named generation_config.json.
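That OSError only means the checkpoint ships without a generation_config.json. A hedged workaround for the load_vllm() in your v1Server.py is to fall back to defaults when the file is missing:

```python
from transformers import GenerationConfig

def load_generation_config(model_dir: str) -> GenerationConfig:
    """Use the checkpoint's generation_config.json if present, otherwise defaults."""
    try:
        return GenerationConfig.from_pretrained(model_dir, trust_remote_code=True)
    except OSError:
        # Some re-quantized checkpoints (e.g. DeepSeek-V2-Lite-gptq-4bit here)
        # omit generation_config.json; falling back to defaults avoids the crash.
        return GenerationConfig()
```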
@akai-shuuichi Hi, when I run inference with the qwen-moe-gptq-int4 model, it always fails with:
triton.runtime.errors.OutOfResources: out of resource: shared memory
How can I solve this error?
Hi, I also hit this error. Have you solved it yet?
This PR https://github.com/vllm-project/vllm/pull/8973/files should have fixed your issue, but it is still not sufficient to run the quantized DeepSeek V2 model. I got other errors when trying DeepSeek V2 AWQ-int4 with the latest vLLM.
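For anyone still hitting the OutOfResources error above: it typically means the fused-MoE Triton kernel picked tile sizes whose staged operands exceed the GPU's shared memory per block, and the linked fix effectively shrinks those tiles. A rough back-of-the-envelope sketch of the constraint; the 96 KB budget, fp16 element size, and 4 pipeline stages are assumptions for illustration, not values read from vLLM or Triton:

```python
# Estimate whether a Triton GEMM tile configuration fits in shared memory.
ELEM_BYTES = 2            # fp16 activations/weights (assumed)
NUM_STAGES = 4            # software-pipelining depth (assumed)
SMEM_BUDGET = 96 * 1024   # usable shared memory per thread block, bytes (assumed)

def tile_smem_bytes(block_m: int, block_n: int, block_k: int) -> int:
    """Bytes for the A tile (BLOCK_M x BLOCK_K) plus the B tile (BLOCK_K x BLOCK_N),
    buffered across NUM_STAGES pipeline stages."""
    return (block_m * block_k + block_k * block_n) * ELEM_BYTES * NUM_STAGES

for bm, bn, bk in [(128, 256, 64), (64, 128, 64), (32, 64, 64)]:
    need = tile_smem_bytes(bm, bn, bk)
    verdict = "fits" if need <= SMEM_BUDGET else "out of resource: shared memory"
    print(f"BLOCK_M={bm} BLOCK_N={bn} BLOCK_K={bk}: {need // 1024} KB -> {verdict}")
```

Dropping from the first configuration to one of the smaller ones is roughly what capping the block sizes does, which is why the fix lets the kernel run on GPUs with less shared memory.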
Your current environment
🐛 Describe the bug
I got the following error: