vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: n_inner not divisible by number of GPUs #3772

Open aliozts opened 6 months ago

aliozts commented 6 months ago

Your current environment

I was using the latest Docker image (v0.4.0) with 4-8 L4 GPUs for the problem described below. I also tested this by installing from source in a custom Docker image.

🐛 Describe the bug

Hello, first of all, thank you for the great work!

I was trying to use the recently supported JAIS models. When I ran jais-30b-chat-v3 on 8x L4 GPUs, I got the error

... AssertionError: 19114 is not divisible by 8 [repeated 2x across cluster]

I then tested the jais-13b-chat model for the same purpose, to see if I could deploy it on 4x L4 GPUs, and got

... AssertionError: 13653 is not divisible by 4 [repeated 2x across cluster]
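
Both failures boil down to the same divisibility check; for example:

# The dimensions quoted in the two assertion errors are not divisible
# by the tensor-parallel size, which is what trips the assertion.
for value, tp_size in [(19114, 8), (13653, 4)]:
    print(f"{value} % {tp_size} = {value % tp_size}")  # 2 and 1, both non-zero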

The command I was using can be generalized as:

MODEL=core42/jais-30b-chat-v3
NUM_GPUS=8
docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model $MODEL \
    --tensor-parallel-size $NUM_GPUS \
    --trust-remote-code \
    --gpu-memory-utilization 0.95 \
    --load-format safetensors \
    --served-model-name jais-chat

After checking the config.json file for each model, I saw that the offending value is the n_inner parameter, which apparently must be divisible by the number of GPUs I want to parallelize across. Is this the intended behaviour, or can I simply modify the n_inner parameter as a hacky workaround?
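
For anyone else hitting this, here is a minimal sketch for checking ahead of time which tensor-parallel sizes a model supports (usable_tp_sizes is my own illustrative helper; it assumes the config exposes n_inner, as the JAIS configs do):

from transformers import AutoConfig

def usable_tp_sizes(model_id, candidates=(1, 2, 4, 8)):
    # Load the model config (JAIS needs trust_remote_code) and keep the
    # tensor-parallel sizes that evenly divide the FFN inner dimension.
    config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
    return [tp for tp in candidates if config.n_inner % tp == 0]

print(usable_tp_sizes("core42/jais-30b-chat-v3"))
# 19114 = 2 * 19 * 503, so only [1, 2] among the candidates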

grandiose-pizza commented 6 months ago

Hi, we trained the model with an n_inner value that is not divisible by common GPU counts because of the SwiGLU layer. We will experiment with padding and other workarounds, evaluate to confirm there is no drop in performance, and update you in due time. Thanks.
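
For illustration only (a sketch of the padding idea, not our actual fix; pad_swiglu_ffn is a hypothetical helper): zero-padding the FFN inner dimension up to the next multiple of the tensor-parallel size leaves the layer's output unchanged, since the padded rows produce silu(0) * 0 = 0 activations and the matching down-projection columns are zero.

import torch

def pad_swiglu_ffn(w_gate, w_up, w_down, tp_size):
    # Round the FFN inner dimension up to the next multiple of tp_size.
    n_inner = w_gate.shape[0]
    padded = (n_inner + tp_size - 1) // tp_size * tp_size
    extra = padded - n_inner
    zeros_in = torch.zeros(extra, w_gate.shape[1], dtype=w_gate.dtype)
    zeros_out = torch.zeros(w_down.shape[0], extra, dtype=w_down.dtype)
    # Extra rows yield silu(0) * 0 = 0, and the zero columns of w_down
    # drop them, so (silu(x @ w_gate.T) * (x @ w_up.T)) @ w_down.T is unchanged.
    return (torch.cat([w_gate, zeros_in], dim=0),
            torch.cat([w_up, zeros_in], dim=0),
            torch.cat([w_down, zeros_out], dim=1))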

vinaykyellow commented 5 months ago

Hi @grandiose-pizza, were you able to experiment with padding?

grandiose-pizza commented 4 months ago

Hi. Please watch for new models to be released soon on our Hugging Face page; we will be fixing this there.