Open tutu329 opened 6 months ago
Hello, I got a similar KeyError, but as "KeyError: 'layers.0.attention.wk.weight'", when I tried to run Mixtral-8x22B.
I can run inference on mixtral-8x22b-instruct-awq through vLLM 0.3.3 or 0.4, but it is somewhat slow.
Your current environment
🐛 Describe the bug
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -m vllm.entrypoints.openai.api_server --served-model-name=8x22b --model=/home/jarrelscy/Mixtral-8x22B-Instruct-v0.1-GPTQ-4bit --gpu-memory-utilization=0.95 --max-model-len=60000 --max-num-seqs=2 --tensor-parallel-size=8 --trust-remote-code --host=0.0.0.0 --port=8001 --max-log-len=1000
KeyError: 'model.layers.45.block_sparse_moe.gate.g_idx'
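One plausible reading of this KeyError is that vLLM's GPTQ loader looks up a `g_idx` tensor for the quantized MoE router ("gate") module, but the checkpoint does not contain it (many GPTQ quantizers leave the gate unquantized). A quick way to investigate is to diff the checkpoint's actual key set against the tensor names the loader expects. The sketch below is a hypothetical helper, not part of vLLM; the expected suffix list and the toy key set are illustrative assumptions based on the key named in the traceback.

```python
# Hypothetical helper: given the set of tensor names in a checkpoint,
# report which expected GPTQ tensors are absent for each MoE gate.
# GPTQ-quantized linear layers normally ship qweight/qzeros/scales/g_idx.
EXPECTED_GPTQ_SUFFIXES = ("qweight", "qzeros", "scales", "g_idx")

def missing_gptq_keys(checkpoint_keys, layer_indices):
    """Return expected-but-absent GPTQ tensor names for the given layers."""
    keys = set(checkpoint_keys)
    missing = []
    for layer in layer_indices:
        prefix = f"model.layers.{layer}.block_sparse_moe.gate."
        for suffix in EXPECTED_GPTQ_SUFFIXES:
            name = prefix + suffix
            if name not in keys:
                missing.append(name)
    return missing

# Toy key set mimicking the failing checkpoint: layer 45's gate has
# qweight, qzeros and scales, but no g_idx -- the tensor the KeyError names.
toy_keys = [
    "model.layers.45.block_sparse_moe.gate." + s
    for s in ("qweight", "qzeros", "scales")
]
print(missing_gptq_keys(toy_keys, layer_indices=[45]))
```

In practice you would populate `checkpoint_keys` from the real files, e.g. by iterating over the keys in the model's safetensors or `pytorch_model.bin` shards, and compare the result against what the loader's traceback asks for.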