Spurthi-Bhat-ScalersAI opened this issue 4 months ago
I have found a similar issue posted: #2942
I have found a workaround for the issue: enabling the --enforce-eager flag.
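For anyone else hitting this, a minimal sketch of where the flag goes when launching the OpenAI-compatible server (the model name and port are placeholders, not the exact setup from this issue):

```bash
# Launch the vLLM OpenAI-compatible server with graph capture disabled
# via --enforce-eager. Model name and port are placeholders.
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-chat-hf \
    --port 8000 \
    --enforce-eager
```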
Good to know about --enforce-eager as a workaround, @Spurthi-Bhat-ScalersAI.
Someone on the Discord server also pointed out that the vLLM docs state that only Mistral and Mixtral models are supported, yet you seem to be running Llama2, as I was attempting. I'm not sure whether Llama2 just happens to work or whether the docs are simply out of date. Regardless, thanks for following up on this with a solution!
Is the --enforce-eager flag recommended? I faced the same issue while running falcon-180B as well. Are there any other solutions?
Yes, currently --enforce-eager is recommended. We are working on enabling HIP graph mode to improve performance, but for now, please use the --enforce-eager flag. Thanks. This is also documented.
Thanks for your response, @hongxiayang!
Steps to reproduce:
Clone the vllm repo and switch to tag v0.3.1
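For example:

```bash
# Clone the vLLM repository and check out the v0.3.1 tag
git clone https://github.com/vllm-project/vllm.git
cd vllm
git checkout v0.3.1
```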
Build the Docker image from Dockerfile.rocm following the instructions in "Option 3: Build from source with docker" under Installation with ROCm.
Build command:
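The original command was not captured here; a sketch of the default build from the ROCm installation docs, with an assumed image tag:

```bash
# Build the ROCm-enabled vLLM image from the repository root
# (the vllm-rocm tag is an assumption)
docker build -f Dockerfile.rocm -t vllm-rocm .
```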
The vLLM serving command used:
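The exact command is not reproduced here; a representative sketch, where the model, tensor-parallel size, host, and port are assumptions rather than the values actually used:

```bash
# Start the OpenAI-compatible API server inside the ROCm container
# (model, tensor-parallel size, host, and port are placeholders)
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-70b-chat-hf \
    --tensor-parallel-size 8 \
    --host 0.0.0.0 \
    --port 8000
```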
Used Apache Bench for testing with 256 concurrent requests
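Roughly along these lines, where the total request count, payload file, and endpoint are assumptions:

```bash
# 256 concurrent requests against the completions endpoint;
# prompt.json is a hypothetical file holding the POST body
ab -n 1024 -c 256 -p prompt.json -T application/json \
   http://localhost:8000/v1/completions
```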
The error below:
Issues: