bugzyz opened 1 year ago
Have you tried with 1 GPU? It worked for me when serving llama-2 7b-chat-hf, but I ran into the same issue when trying to serve the 70B model on multiple GPUs.
Yes, 1 GPU works fine, but 2 or 4 GPUs will fail.
Right, I got a similar error using vLLM with multiple GPUs; I haven't had a chance to dig deeper yet.
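To separate the FastChat worker from the engine itself, here is a minimal vLLM-only sketch of the single- vs multi-GPU case (the model name and `tensor_parallel_size` value are assumptions; adjust them to your setup):

```python
# Minimal vLLM-only repro sketch (model name and GPU count are assumptions).
# tensor_parallel_size=1 is the single-GPU case that works; set it to 2 or 4
# to exercise the multi-GPU path that fails when going through the worker.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-7b-chat-hf",  # assumed model path
    tensor_parallel_size=2,                 # number of GPUs to shard across
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```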
I realized that vLLM takes more memory than the normal worker. It's much faster, but I need more GPUs, and the GPU count needs to be a power of two :-)
A LLaMA 30B model can run on 2 GPUs of 24 GB each in 8-bit, or on 3 GPUs without the 8-bit option, but it needs 4 GPUs for vLLM!
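Rough weights-only arithmetic behind that observation (ignoring the KV cache and activations, so actual usage is higher):

```python
# Back-of-the-envelope memory math for a 30B-parameter model (weights only;
# the KV cache and activations add more on top, and vLLM pre-allocates a
# large KV-cache block pool up front).
params = 30e9

fp16_gb = params * 2 / 1e9  # ~60 GB -> roughly 3 x 24 GB cards
int8_gb = params * 1 / 1e9  # ~30 GB -> roughly 2 x 24 GB cards

print(f"fp16 weights: ~{fp16_gb:.0f} GB, int8 weights: ~{int8_gb:.0f} GB")
```

On top of the memory itself, vLLM shards the model with tensor parallelism, and the tensor-parallel degree has to divide the model's attention-head count evenly, which is why GPU counts that are powers of two tend to be the ones that work.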
Tagging the vLLM folks for help: @WoosukKwon @zhuohan123
Hi there, I'm trying to run vllm_worker for codellama/CodeLlama-7b-Instruct-hf on 2 T4 GPUs, but I encountered a ray.exceptions.RayActorError failure. Could you please provide any suggestions on this? Thanks!
Version:
Command:
Error log:
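A quick first check for a RayActorError like this (a sketch, not part of the report above) is whether Ray, which vLLM uses to place its multi-GPU workers, actually sees both T4s:

```python
# Sanity-check that Ray detects both GPUs before launching vllm_worker.
# If this reports fewer than 2 GPUs, the RayActorError is more likely a
# driver / CUDA-visibility problem than a vLLM or FastChat one.
import ray

ray.init()  # starts a local Ray instance if one is not already running
print("GPUs visible to Ray:", ray.cluster_resources().get("GPU", 0.0))
ray.shutdown()
```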