s-dharmam opened this issue 10 months ago
me too
vLLM does not currently support RoPE scaling, which vicuna-13b-v1.5-16k uses to extend the context window to 16K; see issue #2151.
Here is the open issue on the vLLM side: https://github.com/vllm-project/vllm/issues/464
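For context, the 16K Vicuna models ship a `rope_scaling` entry in their config.json (a linear factor of 4, stretching the original 4K window to 16K). A minimal illustrative sketch of linear RoPE scaling in plain PyTorch, not vLLM's or FastChat's actual implementation:

```python
import torch

def linear_scaled_rope(positions, dim, base=10000.0, scaling_factor=4.0):
    # Linear RoPE scaling: divide the position ids by the scaling
    # factor before computing the rotary angles, so positions up to
    # 16K map into the angle range the model saw during 4K training.
    # Illustrative sketch only; the factor 4.0 matches the
    # rope_scaling config reported for vicuna-13b-v1.5-16k.
    scaled = positions.float() / scaling_factor
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    angles = torch.outer(scaled, inv_freq)       # (seq_len, dim // 2)
    return torch.cos(angles), torch.sin(angles)  # rotate q/k pairs with these

cos, sin = linear_scaled_rope(torch.arange(16384), dim=128)
```

A backend that ignores this config (as vLLM did at the time) computes angles for raw positions past 4K that the model never saw, which is consistent with the degenerate repeated-token output described below.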
I tried to use vicuna-13b-v1.5-16k with the vLLM worker (a feature of the FastChat library). In that case it repeats a single word in the output. To reproduce the error: `python3 -m fastchat.serve.vllm_worker --model-names "gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002" --model-path lmsys/vicuna-13b-v1.5-16k --num-gpus 2`
However, it works when I replace `vllm_worker` with `model_worker`.
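For anyone else hitting this, the working launch is the same command with the standard FastChat worker swapped in (per the substitution above):

```
python3 -m fastchat.serve.model_worker --model-names "gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002" --model-path lmsys/vicuna-13b-v1.5-16k --num-gpus 2
```

This sidesteps vLLM entirely, so generation is slower, but the default HF-based worker honors the model's `rope_scaling` config and produces coherent long-context output.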