hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

vLLM deployment: GPU memory usage does not match the vllm_gpu_util setting #4056

Closed. yecphaha closed this issue 4 months ago.

yecphaha commented 4 months ago

Reminder

System Info

CUDA_VISIBLE_DEVICES=0 API_PORT=9092 python src/api_demo.py \
    --model_name_or_path /save_model/qwen1_5_7b_pcb_merge \
    --template qwen \
    --infer_backend vllm \
    --max_new_tokens 32768 \
    --vllm_maxlen 32768 \
    --vllm_enforce_eager \
    --vllm_gpu_util 0.95
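For reference, a minimal sketch of how these flags presumably map onto a vLLM engine. The flag-to-argument mapping is an assumption on my part rather than taken from the LLaMA-Factory source; the model path and values are copied from the command above:

from vllm import LLM, SamplingParams

# Assumed mapping of the launch flags onto vLLM engine arguments:
#   --vllm_gpu_util 0.95   -> gpu_memory_utilization=0.95
#   --vllm_maxlen 32768    -> max_model_len=32768
#   --vllm_enforce_eager   -> enforce_eager=True
llm = LLM(
    model="/save_model/qwen1_5_7b_pcb_merge",
    gpu_memory_utilization=0.95,  # fraction of total GPU memory vLLM may budget
    max_model_len=32768,
    enforce_eager=True,           # disable CUDA graph capture
)
outputs = llm.generate(["你好"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)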

Reproduction

Inference environment: Python 3.10.14, CUDA 12.2, a single A100 80 GB GPU.

Expected behavior

No other processes were using the GPU at deployment time. With vllm_gpu_util set to 0.95 I expected roughly 76 GB of GPU memory to be occupied, but actual usage is only about 55 GB. What could be the reason?
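For context, a rough sketch of the arithmetic and one way to check device-wide usage. It assumes vllm_gpu_util is forwarded to vLLM's gpu_memory_utilization, which budgets weights plus KV cache as a fraction of total device memory; torch is only used here for cudaMemGetInfo:

import torch

# 0.95 * 80 GB on an A100-80G gives the expected ~76 GB budget.
free_bytes, total_bytes = torch.cuda.mem_get_info(0)  # device-wide, like nvidia-smi
budget_gib = 0.95 * total_bytes / 1024**3
used_gib = (total_bytes - free_bytes) / 1024**3
print(f"budget ≈ {budget_gib:.1f} GiB, currently in use ≈ {used_gib:.1f} GiB")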

Others

No response

hiyouga commented 4 months ago

This is not an issue with llamafactory.