vgod-dbx opened this issue 6 months ago
Hi @vgod-dbx, please try again with the Dockerfile.rocm here: EDIT: The Dockerfile.rocm from top of tree should now work!
The change made is to install a ROCm fork of Triton. It also contains the numba upgrade we discussed in the other thread.
I've tested a Docker image generated from the above Dockerfile on 4x MI250X using the config you specified, and it appears to be working fine.
@mawong-amd I can confirm the new container worked! Thanks for the swift response!
It failed for me on MI250X. Is it possible for you to share your image?
Your current environment
vLLM (commit db2a6a41e206abecf4128aba25117fcaf7bebe12) + ROCm 6.0 Docker image built with the fix of Dockerfile.rocm

🐛 Describe the bug
Ran the vLLM Docker image with:
docker run --network=host --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --shm-size 32G --device /dev/kfd --device /dev/dri -v $model_dir:/app/model vllm-rocm:v0.4.0.post1 python -m vllm.entrypoints.openai.api_server --port 7860 --model /app/model/models--databricks--dbrx-instruct/snapshots/17365204e9cf13e2296ee984c1ab48071e861efa --trust-remote-code --tensor-parallel-size 8
The vLLM server crashed soon after loading the model.
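For anyone reproducing this, a quick way to tell whether the server came up before crashing is to hit the OpenAI-compatible completions endpoint the command above exposes. The sketch below only builds the request URL and JSON body (host, port, and the model path are taken from the docker run command in this issue; the prompt and token count are illustrative); sending it with `requests.post` is left to the reader since it needs a live server.

```python
import json

# Model path as passed to --model in the docker run command above.
MODEL = ("/app/model/models--databricks--dbrx-instruct/snapshots/"
         "17365204e9cf13e2296ee984c1ab48071e861efa")

def completion_request(host="localhost", port=7860):
    """Build (url, body) for a smoke-test call to vLLM's OpenAI-compatible
    /v1/completions endpoint. Send with requests.post(url, data=body,
    headers={"Content-Type": "application/json"}) once the server is up."""
    payload = {
        "model": MODEL,       # must match the --model path the server was given
        "prompt": "Hello",    # trivial prompt, just to confirm liveness
        "max_tokens": 16,
    }
    url = f"http://{host}:{port}/v1/completions"
    return url, json.dumps(payload)

url, body = completion_request()
print(url)
```

If this request times out or the connection is refused right after model load, the crash happened before the API server started serving, which matches the behavior described above.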