s-dharmam opened this issue 10 months ago
me too
vLLM does not currently support RoPE scaling, which vicuna-13b-v1.5-16k uses to extend the context window to 16K; see issue #2151.
Here is the open issue on the vLLM side: https://github.com/vllm-project/vllm/issues/464
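For context, the 16K Vicuna models ship a `rope_scaling` entry in their config.json (a linear factor of 4, stretching the original 4K window to 16K). A minimal illustrative sketch of linear RoPE scaling in plain PyTorch, not vLLM's or FastChat's actual implementation:

```python
import torch

def linear_scaled_rope(positions, dim, base=10000.0, scaling_factor=4.0):
    # Linear RoPE scaling: divide the position ids by the scaling
    # factor before computing the rotary angles, so positions up to
    # 16K map into the angle range the model saw during 4K training.
    # Illustrative sketch only; the factor 4.0 matches the
    # rope_scaling config reported for vicuna-13b-v1.5-16k.
    scaled = positions.float() / scaling_factor
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    angles = torch.outer(scaled, inv_freq)       # (seq_len, dim // 2)
    return torch.cos(angles), torch.sin(angles)  # rotate q/k pairs with these

cos, sin = linear_scaled_rope(torch.arange(16384), dim=128)
```

A backend that ignores this config (as vLLM did at the time) computes angles for raw positions past 4K that the model never saw, which is consistent with the degenerate repeated-token output described below.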
I tried to use vicuna-13b-v1.5-16k with the vLLM worker (a feature of the FastChat library). In that case it repeats a single word in the output. To reproduce the error: `python3 -m fastchat.serve.vllm_worker --model-names "gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002" --model-path lmsys/vicuna-13b-v1.5-16k --num-gpus 2`
However, it works when I replace `vllm_worker` with `model_worker`.
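For anyone else hitting this, the working launch is the same command with the standard FastChat worker swapped in (per the substitution above):

```
python3 -m fastchat.serve.model_worker --model-names "gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002" --model-path lmsys/vicuna-13b-v1.5-16k --num-gpus 2
```

This sidesteps vLLM entirely, so generation is slower, but the default HF-based worker honors the model's `rope_scaling` config and produces coherent long-context output.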