almersawi opened this issue 1 month ago (status: Open)
Thanks for reporting this and adding the PR 🙌
We're a bit low on bandwidth but can hopefully take a look at it asap 👍
@almersawi You can work around this by disabling the PYTORCH_TUNABLEOP feature at startup.
It still starts the server, just without the warmup.
Run: `docker run --device /dev/kfd --device /dev/dri -e PYTORCH_TUNABLEOP_ENABLED=0 ghcr.io/huggingface/text-generation-inference:2.2.0-rocm --model-id TinyLlama/TinyLlama-1.1B-Chat-v1.0`
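For reference, here is a minimal sketch of how a `0`/`1` environment flag such as `PYTORCH_TUNABLEOP_ENABLED` is typically interpreted at process startup. The exact parsing rules inside PyTorch/TGI are an assumption here; the helper name `tunableop_enabled` is hypothetical and only illustrates why passing `-e PYTORCH_TUNABLEOP_ENABLED=0` to `docker run` turns the feature off.

```python
import os

def tunableop_enabled(env=None):
    """Return True if the TunableOp flag is set to an enabling value.

    Assumption: "0", empty string, and "false" count as disabled,
    which matches common env-var flag conventions.
    """
    if env is None:
        env = os.environ
    value = env.get("PYTORCH_TUNABLEOP_ENABLED", "0")
    return value.lower() not in ("0", "", "false")

# With the docker flag -e PYTORCH_TUNABLEOP_ENABLED=0, the feature is off:
print(tunableop_enabled({"PYTORCH_TUNABLEOP_ENABLED": "0"}))  # False
print(tunableop_enabled({"PYTORCH_TUNABLEOP_ENABLED": "1"}))  # True
```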
System Info
Docker image: ghcr.io/huggingface/text-generation-inference:2.2.0-rocm
Hardware: AMD MI250
Information
Tasks
Reproduction
Expected behavior
The model should deploy, since it is officially supported, but instead I get: