Closed KevinWu2017 closed 1 month ago
Hi, the issue appears to stem from vLLM being unable to run the Mixtral model itself, rather than from OpenCompass. I suggest creating a minimal reproducible script that excludes OpenCompass components: write a simple Python file that loads this model with vLLM directly and see whether it runs successfully.
Thank you for your reply. I created a minimal reproducible script, `vllm_mixtral.py`:

```python
from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Create an LLM.
llm = LLM(
    model="mistralai/Mixtral-8x7B-v0.1",
    tensor_parallel_size=8,
    download_dir="/home/data/huggingface",
    gpu_memory_utilization=0.9,
)

# Generate texts from the prompts. The output is a list of RequestOutput
# objects that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

I ran it with the command `HF_HUB_OFFLINE=1 python vllm_mixtral.py` and it executed the model successfully.
After trying more models, the issue appears to be related to tensor parallelism. When I adjusted the configuration file `configs/models/qwen/vllm_qwen1_5_moe_a2_7b.py` and set `tensor_parallel_size=2` and `num_gpus=2`, the same issue occurred.
Thank you for reporting the issue. To resolve this, try modifying the tensor parallel parameter in the configuration file `configs/models/mistral/vllm_mixtral_8x7b_v0_1.py` to `tensor_parallel_size=8`. This change may enable the model to run correctly.
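For reference, here is a minimal sketch of what such an edited config might contain. The field names (`model_kwargs`, `run_cfg`, etc.) follow the usual layout of OpenCompass VLLM model configs, but the values below are illustrative; the `type=VLLM` entry and its import are omitted so the fragment stands alone, so check it against the actual file in your OpenCompass installation:

```python
# Illustrative sketch of a tensor-parallel model entry in
# configs/models/mistral/vllm_mixtral_8x7b_v0_1.py, with parallelism
# raised to 8 GPUs. The `type=VLLM` field and its import are omitted;
# verify field names against your installed OpenCompass version.
models = [
    dict(
        abbr="mixtral-8x7b-v0.1-vllm",
        path="mistralai/Mixtral-8x7B-v0.1",
        # Keyword arguments forwarded to vllm.LLM(...)
        model_kwargs=dict(tensor_parallel_size=8),
        max_out_len=100,
        batch_size=32,
        # Resources requested per worker: should match tensor_parallel_size.
        run_cfg=dict(num_gpus=8, num_procs=1),
    )
]
```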
After modifying the `configs/models/mistral/vllm_mixtral_8x7b_v0_1.py` file with `tensor_parallel_size=8` and `num_gpus=8`, and running the command `python run.py --models vllm_mixtral_8x7b_v0_1 --datasets mmlu_gen -m infer --max-num-workers 1`, the log still shows the same problem.
Is there a specific environment setup that can successfully run with tensor parallelism? Are there any vLLM, torch, or OpenCompass version requirements?
After some searching, this appears to be caused by a behavior change in vLLM since vllm-0.5.1, as mentioned here: https://github.com/vllm-project/vllm/pull/5669#issuecomment-2181625739.
So an easy workaround is to set `VLLM_WORKER_MULTIPROC_METHOD=spawn` before the `python run.py` command.
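As a minimal sketch, the workaround can be applied either on the shell command line or from Python before vLLM is imported (the environment variable name is the real vLLM setting; the surrounding snippet is illustrative):

```python
import os

# Make vLLM start its multiprocessing workers with "spawn" instead of
# the default "fork". vLLM reads this environment variable when it sets
# up worker processes, so it must be set before vLLM is imported.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

# Equivalent shell form, prefixing the OpenCompass run command:
#   VLLM_WORKER_MULTIPROC_METHOD=spawn python run.py --models vllm_mixtral_8x7b_v0_1 ...
```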
Prerequisites
Issue type
I am evaluating with an officially supported task/model/dataset.
Environment
Reproduces the problem - code/configuration sample
Just the built-in `run.py` file.
Reproduces the problem - command or script
`CUDA_VISIBLE_DEVICES=4,5 python run.py --models vllm_mixtral_8x7b_v0_1 --datasets mmlu_gen -m infer --max-num-workers 1 --debug`
Reproduces the problem - error message
Other information
No response