hao-ai-lab / LookaheadDecoding

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
https://arxiv.org/abs/2402.02057
Apache License 2.0

Can I get a MT-bench evaluation code for reproduction of acceleration? #36

Open je1lee opened 11 months ago

je1lee commented 11 months ago

Can I get a MT-bench evaluation code for reproduction of acceleration?

zhisbug commented 10 months ago

@Viol2000

Viol2000 commented 10 months ago

MT-bench scripts have been uploaded; see applications/run_mtbench.sh for examples. Note that we currently only support greedy search, so I set the temperature to 0.
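
For readers wondering why temperature must be 0: greedy search is the degenerate case of temperature sampling where the highest-logit token always wins. A minimal plain-Python sketch (illustrative only, not the repo's sampling code):

```python
import math
import random

def sample_next_token(logits, temperature):
    """Pick the next token id from raw logits (illustrative sketch).

    Temperature 0 degenerates to greedy argmax, which is the only mode
    lookahead decoding currently verifies against -- hence temperature=0
    in the MT-bench script.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    weights = [math.exp(l / temperature) for l in logits]
    return random.choices(range(len(logits)), weights=weights, k=1)[0]

# Greedy pick is deterministic: index 1 has the highest logit.
print(sample_next_token([0.1, 2.5, -1.0], temperature=0))  # → 1
```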

qspang commented 10 months ago

llama2-7b-10-24.json

I ran into some problems when trying to evaluate MT-bench. The model used is llama2-7b-chat, running on an RTX 3090, with the command shown in the screenshot below. The problem is that most of the "choices" recorded in the answer file are identical; each entry then just repeats its own question, with no actual answer. [screenshot of the command]

The question file used is the MT-bench question file, part of which is shown below. [screenshot of the question file] Here is a partial screenshot of the answer file generated by the code; every "choices" entry has the same content. [screenshot of the answer file]

At the end there is no answer, only the questions recorded, as shown below. [screenshot of the answer file]

Viol2000 commented 10 months ago

Hi @peoplekillerS, I just ran the script to try to reproduce your problem, but I did not observe your situation. Here is the answer file I just obtained: llama2-7b-10-24.json

I was wondering which version of the code you are using and whether you have modified it, as this is an uncommon situation. The question file should be obtained via https://github.com/hao-ai-lab/LookaheadDecoding/blob/main/applications/run_mtbench.sh#L2, and the chat model should be meta-llama/Llama-2-7b-chat-hf. Hope this helps.

qspang commented 10 months ago

Thanks to the author for the reply! I was testing with llama2-7b-chat-hf, which is why I was late in replying, sorry. In fact, I got the same results as you, but take a closer look at the JSON file you sent above: if you search for 'Provide a variety of craft', you will find that almost every line contains this answer. Is this normal?

I am using the code from the main branch. The MT-bench question file was downloaded from the link in your run_mtbench.sh file, and no code has been changed; the command is the one in the screenshot above. The differences are:

1. I use llama2-7b-chat instead of llama2-7b-chat-hf.
2. The --use-pp parameter is set to 1, because every time I set it to 0 I get an error: NotImplementedError: Cannot copy out of meta tensor; no data! (I am using an RTX 3090.)

Would the author consider creating an eval_mtbench version for llama2-7b-chat? And do you know how to solve the error that occurs when --use-pp is set to 0?

Viol2000 commented 10 months ago

It is a normal phenomenon to have the same line in every answer. We use FastChat to generate a conversation template, and this line https://github.com/lm-sys/FastChat/blob/6ff8505ec80fc4b04d668f65d229f4f58bc449e0/fastchat/conversation.py#L365 is included in every prompt.
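
To illustrate why the same sentence shows up in every record: a conversation template bakes a fixed system line into every prompt it builds. The sketch below is illustrative only (not FastChat's actual implementation); the system line and the wrapper format are stand-ins:

```python
# Stand-in for the fixed line that the conversation template injects.
SYSTEM_LINE = "You are a helpful assistant."

def build_prompt(user_message: str) -> str:
    # Llama-2-chat style wrapping: the system line is always included,
    # so every serialized prompt/answer record contains it.
    return f"[INST] <<SYS>>\n{SYSTEM_LINE}\n<</SYS>>\n\n{user_message} [/INST]"

questions = ["Compose a travel blog post.", "Draft a professional email."]
prompts = [build_prompt(q) for q in questions]

# Every prompt contains the template line exactly once.
print(all(p.count(SYSTEM_LINE) == 1 for p in prompts))  # → True
```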

Viol2000 commented 10 months ago

Could you provide a more detailed error report for what happens when you set --use-pp to 0? It may be a version mismatch; you can pull the latest code and install the latest dependencies. I also suspect the llama2-7b-chat weight format is not compatible with the Hugging Face format: since we use the transformers library, model weights compatible with transformers (i.e., llama2-7b-chat-hf) are needed.

qspang commented 10 months ago

Thank you very much again for your reply! Sorry, I got confused earlier. I just retested llama2-7b-chat-hf with the --use-pp parameter set to 0, and it works normally! The problem occurs when using the llama2-7b-chat model: with --use-pp set to 0, the error shown below appears. [screenshot of the error]

Viol2000 commented 10 months ago

Yeah, using llama2-7b-chat-hf should be correct. The llama2-7b-chat model is not compatible with transformers; I guess that is the problem.
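
A quick way to tell the two checkpoint layouts apart: transformers' from_pretrained() requires a config.json in the model directory, while Meta's original llama2-7b-chat release ships params.json plus consolidated.*.pth shards instead. The check below is a heuristic of my own, not code from this repo:

```python
import tempfile
from pathlib import Path

def looks_like_hf_checkpoint(model_dir: str) -> bool:
    # HF-format checkpoints (e.g. llama2-7b-chat-hf) carry a config.json;
    # the original Meta release does not, which is why transformers
    # cannot load it directly.
    return (Path(model_dir) / "config.json").exists()

# Demo: mimic the Meta layout in a temporary directory.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "params.json").touch()
    print(looks_like_hf_checkpoint(d))  # → False
```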

qspang commented 10 months ago

Will you consider an eval_mtbench version for llama2-7b-chat?

Viol2000 commented 10 months ago

Currently, I am not considering supporting llama2-7b-chat. From its model page, we can see that https://github.com/facebookresearch/llama is needed to run those model weights. Since I plan to minimize maintenance effort while supporting most models, sticking to Hugging Face's transformers is the simplest way.