Questions about "Run Streaming Llama Chatbot"

ChuanhongLi commented 9 months ago

First of all, thanks for releasing this excellent work! I have some questions about running the example you provided. I used the command:

# Llama-2-7b-hf has been downloaded to /data/model/Llama-2-7b-hf
CUDA_VISIBLE_DEVICES=0 python examples/run_streaming_llama.py --enable_streaming --model_name_or_path /data/model/Llama-2-7b-hf

And I get the following results:

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:41<00:00, 20.89s/it]
Loading data from data/mt_bench.jsonl ...
prompts length:  158
StartRecentKVCache: 4, 2000

USER: Compose an engaging travel blog post about a recent trip to Hawaii, highlighting cultural experiences and must-see attractions.

ASSISTANT: seq_len:  38

### 1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1. [the ".1" pattern repeats for the rest of the response]
Inference speed for this question: 23.53038809423589 token/sec

USER: Rewrite your previous response. Start every sentence with the letter A.

ASSISTANT: seq_len:  24
USER: Rewrite your previous response. Start every sentence with the letter B.

ASSISTANT:

USER: Rewrite your previous response. Start every sentence with the letter C.

ASSISTANT:

USER: Rewrite your previous response. Start every sentence with the letter D.

ASSISTANT:

USER: Rewrite your previous response. Start every sentence with the letter E.

ASSISTANT:

USER: Rewrite your previous response. Start every sentence with the letter F.

ASSISTANT:

USER: Rewrite your previous response. Start every sentence with the letter G.

ASSISTANT:

USER: Rewrite your previous response. Start every sentence with the letter H.

ASSISTANT:

USER: Rewrite your previous response. Start every sentence with the letter I.

ASSISTANT:

USER: Rewrite your previous response. Start every sentence with the letter J.

ASSISTANT:

USER: Rewrite your previous response. Start every sentence with the letter K.

It does not seem to work well. Is anything wrong with my test? Should I change something to get correct results?

When I use lmsys/vicuna-13b-v1.3 as the model, the results seem OK.

Thanks!

Guangxuan-Xiao commented 9 months ago

Hello, thank you for your interest in our work! Llama-2-7b-hf has not been instruction-tuned and isn't ideal for chatbot applications. We recommend using an instruction-tuned model such as Vicuna or Llama-2-7b-chat-hf for that purpose.

ChuanhongLi commented 9 months ago

Thank you for your reply. One more question: does Figure 10 in your paper also use the instruction-tuned Llama-2-7b (Llama-2-13b)?

Guangxuan-Xiao commented 9 months ago

Figure 10 reports efficiency results. For those measurements, instruction-tuned models (*-chat) and base models give identical results.
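
For readers following along: the `StartRecentKVCache: 4, 2000` line in the log above suggests the cache keeps the first 4 tokens plus the 2000 most recent ones. Below is a minimal Python sketch of that retention rule, not the repository's implementation. The function name `kept_positions` is hypothetical, and it assumes eviction simply drops the middle tokens once the sequence exceeds 2004 entries. Because such a rule depends only on sequence length and not on model weights, it is consistent with base and *-chat models showing the same efficiency.

```python
from typing import List


def kept_positions(seq_len: int, start_size: int = 4, recent_size: int = 2000) -> List[int]:
    """Return the token positions that stay in the KV cache.

    Assumed policy (illustrative only): keep the first `start_size`
    positions and the last `recent_size` positions, drop the rest.
    """
    cache_size = start_size + recent_size
    if seq_len <= cache_size:
        # Nothing to evict yet: every position is still cached.
        return list(range(seq_len))
    start = list(range(start_size))
    recent = list(range(seq_len - recent_size, seq_len))
    return start + recent


if __name__ == "__main__":
    kept = kept_positions(5000)
    print(len(kept), kept[:5], kept[-1])
```

With the defaults, a 5000-token sequence keeps 2004 positions: 0 to 3 and 3000 to 4999. A 38-token prompt such as the first one in the log fits entirely in the cache.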

mit-han-lab / streaming-llm