sparsh35 opened 2 months ago
See https://docs.vllm.ai/en/stable/serving/distributed_serving.html#multi-node-inference-and-serving; you need to set up a Ray cluster first.
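In outline, the linked doc comes down to starting Ray on every node before launching vLLM. A minimal sketch of the commands (the head IP is a placeholder):

```bash
# On the head node: start Ray and listen for workers.
ray start --head --port=6379

# On each worker node: join the cluster via the head node's address.
# 10.0.0.1 is a placeholder for your head node's IP.
ray start --address=10.0.0.1:6379

# On any node: verify that all hosts and accelerators are registered.
ray status
```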
Getting this error @youkaichao: `docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]`. I am trying the Docker route, installing with Docker; my earlier attempt was running the Ray server directly with a placement group. I think this script is not configured for TPU. And thanks again for your help.
The Docker vllm-tpu image is running. So I changed the config in the run_cluster.sh file as follows: the change was removing the `--gpus all` flag from the `docker run` command and adding TPU resources to the `ray start` command:
```bash
#!/bin/bash

if [ $# -lt 4 ]; then
    echo "Usage: $0 docker_image head_node_address --head|--worker path_to_hf_home [additional_args...]"
    exit 1
fi

DOCKER_IMAGE="$1"
HEAD_NODE_ADDRESS="$2"
NODE_TYPE="$3"  # Should be --head or --worker
PATH_TO_HF_HOME="$4"
shift 4

ADDITIONAL_ARGS="$@"

if [ "${NODE_TYPE}" != "--head" ] && [ "${NODE_TYPE}" != "--worker" ]; then
    echo "Error: Node type must be --head or --worker"
    exit 1
fi

cleanup() {
    docker stop node
    docker rm node
}
trap cleanup EXIT

RAY_START_CMD="ray start --block --num-cpus=220 --resources='{\"tpu\": 4}'"
if [ "${NODE_TYPE}" == "--head" ]; then
    RAY_START_CMD+=" --head --port=6379"
else
    RAY_START_CMD+=" --address=${HEAD_NODE_ADDRESS}:6379"
fi

docker run \
    --entrypoint /bin/bash \
    --network host \
    --name node \
    --shm-size 10.24g \
    -v "${PATH_TO_HF_HOME}:/root/.cache/huggingface" \
    ${ADDITIONAL_ARGS} \
    "${DOCKER_IMAGE}" -c "${RAY_START_CMD}"
```
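For reference, the modified script would be invoked the same way as the stock run_cluster.sh from the vLLM docs; the image name, head IP, and HF cache path below are placeholders:

```bash
# On the head node (image, IP, and path are placeholders):
bash run_cluster.sh vllm-tpu 10.0.0.1 --head /path/to/hf_home

# On each worker node, pointing at the same head address:
bash run_cluster.sh vllm-tpu 10.0.0.1 --worker /path/to/hf_home
```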
And then `ray status` shows 16 TPUs, but 4 pipeline parallel x 4 tensor parallel won't work. I also can't use a tensor parallel size of 16, as the model's 28 attention heads are not divisible by it. Here is `ray status` on Docker:
So I have tried both methods in run_cluster.sh, adding them in the resources as well as deleting the resource file, but the issue persists, and when trying to serve it gives the following error:
Debugging with print statements shows it can't recognize the number of TPUs and fails with a placement group device assertion. Any help would be appreciated; it is urgent.
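One way to narrow this down (a debugging sketch, not from the thread): check whether the custom resource actually aggregates across hosts. Note that Ray's own TPU autodetection registers a capitalized `TPU` resource key, while the script above registers a lowercase `tpu`, so a mismatch between the two is worth ruling out.

```bash
# Inside the head-node container: does the cluster-wide total show 16?
ray status

# Print the aggregated resource dict; look for "tpu" (the custom key
# set in run_cluster.sh) vs. "TPU" (the key Ray autodetection uses).
python -c "import ray; ray.init(address='auto'); print(ray.cluster_resources())"
```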
I think it may be related to the TPU environment variables used in GKE or gcloud etc. for a pod, like done in
This is needed for libtpu and the TPU driver to know which TPU chips are actually visible. On GKE these need to be set; otherwise the TPU driver will fail to initialize because the number of devices would differ from the number of visible worker addresses.
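For concreteness, these are the kind of variables meant here. The names are the ones libtpu reads on GKE; the values below are only illustrative for a 4-host x 4-chip v4-32 slice, so check the GKE TPU docs for your topology:

```bash
# Hypothetical values for worker 0 of a 4-host v4-32 slice.
export TPU_WORKER_ID=0
export TPU_WORKER_HOSTNAMES=host-0,host-1,host-2,host-3
export TPU_CHIPS_PER_HOST_BOUNDS=2,2,1   # 4 chips per host (2x2x1)
export TPU_HOST_BOUNDS=1,1,4             # 4 hosts along one axis
```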
Any ideas @youkaichao?
@sparsh35 did you succeed eventually? I followed your leads and managed to run inference on a v4-16 with the following changes:
### Your current environment
### How would you like to use vllm
There is an example for offline inference on TPUs, but it does not utilize all 4 hosts of a v4-32. If I run the code on all hosts, Ray only detects each host's own TPU resources. The environment is correct and it works for a single host, but I don't know how to let vLLM detect and use all 4 hosts. I would like to do that for bigger models.
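The same engine arguments apply whether you use the offline `LLM` class or the API server; as a sketch (the model name is a placeholder, and TP=4 x PP=4 assumes the 16 chips of a v4-32 with a head count divisible by 4), launching from the head node once the Ray cluster spans all hosts might look like:

```bash
# Run on the head node; Ray places workers on all 4 hosts.
# The model name is a placeholder.
vllm serve your-org/your-model \
    --tensor-parallel-size 4 \
    --pipeline-parallel-size 4 \
    --distributed-executor-backend ray
```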