ArtificialEU opened this issue 1 month ago (status: Open)
You can follow https://docs.vllm.ai/en/latest/getting_started/debugging.html to get more information on what's going on.
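If the process hangs silently, that page suggests turning on verbose logging before starting vLLM. A minimal sketch (environment variable names are the ones documented on the linked page):

```python
import os

# Enable verbose debugging output, per the vLLM debugging guide.
os.environ["VLLM_LOGGING_LEVEL"] = "DEBUG"   # more detailed vLLM logs
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"     # surface the exact failing CUDA call
os.environ["NCCL_DEBUG"] = "TRACE"           # NCCL details (relevant for multi-GPU)
os.environ["VLLM_TRACE_FUNCTION"] = "1"      # trace every function call (very verbose)

# Import vllm only after the variables are set so they take effect.
from vllm import LLM
```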
@youkaichao
I also get this error on a Tesla T4 running the model gemma-2-2b-it:
```
INFO 08-12 14:54:48 selector.py:79] Using Flashinfer backend.
WARNING 08-12 14:54:48 selector.py:80] Flashinfer will be stuck on llama-2-7b, please avoid using Flashinfer as the backend when running on llama-2-7b.
INFO 08-12 14:54:49 selector.py:79] Using Flashinfer backend.
WARNING 08-12 14:54:49 selector.py:80] Flashinfer will be stuck on llama-2-7b, please avoid using Flashinfer as the backend when running on llama-2-7b.
```
It hangs after the above logs.
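One thing worth trying (a hedged sketch, not a confirmed fix): vLLM selects the attention backend via the `VLLM_ATTENTION_BACKEND` environment variable, so you can force a backend other than FlashInfer. Be aware that some vLLM versions require FlashInfer for gemma-2 because of its attention logit soft-capping, in which case this may trade the hang for a capability error:

```python
import os

# Force a specific attention backend before vllm is imported.
# "XFORMERS" is the usual fallback on older GPUs such as the T4,
# which do not support FlashAttention-2.
os.environ["VLLM_ATTENTION_BACKEND"] = "XFORMERS"

from vllm import LLM

llm = LLM(model="google/gemma-2-2b-it")
```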
Did you solve the problem? I'm experiencing the same issue.
I encountered the same problem when loading gemma-2; my GPU is a V100.
I had the same issue, and for me the problem was memory. If you are using a compute cluster, make sure you are also allocating enough physical memory (host RAM), not only GPU memory.
Thanks, you are correct. I reviewed my deployment environment and found that I had not allocated enough memory.
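For anyone hitting this later, a quick way to sanity-check host RAM from inside the job before loading the model (a sketch using `psutil`, which is a separate package, not part of vLLM):

```python
import psutil

# vLLM needs sufficient host RAM in addition to GPU memory
# (weights typically pass through CPU memory during loading);
# too little host memory can make the load hang.
mem = psutil.virtual_memory()
print(f"host RAM total:     {mem.total / 1e9:.1f} GB")
print(f"host RAM available: {mem.available / 1e9:.1f} GB")
```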
Your current environment
🐛 Describe the bug
On the Tesla T4 the model "hangs" after loading (the VRAM usage spikes normally and then stays constant), but nothing comes after.
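For reproduction, the load can be triggered with the standard vLLM offline API. A minimal sketch (the model name is taken from the report; everything else is vLLM defaults):

```python
from vllm import LLM, SamplingParams

# On the T4 this is where the process hangs: VRAM usage spikes
# during weight loading and then nothing further is printed.
llm = LLM(model="google/gemma-2-2b-it")

outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```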
Setup
```yaml
containers:
```
Log
On my 4090, here is the expected trace:
I don't understand why the VRAM usage is nowhere close; it's a simple model. Any help is appreciated.
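One note that may explain the VRAM numbers (hedged, since the launch flags in the setup above are truncated): vLLM preallocates a fraction of GPU memory for the KV cache, controlled by the `gpu_memory_utilization` parameter (default 0.9), so the VRAM spiking and then staying constant after loading is expected behavior rather than a sign of the hang itself:

```python
from vllm import LLM

# Lower the preallocated share of GPU memory; with the 0.9 default,
# even a 2B model claims most of the card for the KV cache.
llm = LLM(
    model="google/gemma-2-2b-it",
    gpu_memory_utilization=0.5,  # fraction of total VRAM vLLM may claim
)
```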