Open je1lee opened 11 months ago
@Viol2000
MT-bench scripts are uploaded. See applications/run_mtbench.sh for examples. Note that we currently support only greedy search, so I set the temperature to 0.
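As background on why temperature 0 means greedy search: as the temperature approaches 0, the softmax over logits collapses onto the argmax token, so sampling degenerates into always picking the most likely token. A minimal, self-contained Python sketch of that effect (illustrative only, not code from this repo):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, scaled by a temperature.

    Lower temperatures sharpen the distribution toward the argmax.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)                        # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]

# At temperature 1.0 the probability mass is spread across tokens...
probs_warm = softmax_with_temperature(logits, 1.0)
# ...but as temperature -> 0, essentially all mass lands on the argmax,
# which is why "temperature = 0" is treated as greedy decoding.
probs_cold = softmax_with_temperature(logits, 0.01)

print(probs_warm)
print(probs_cold)
```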
llama2-7b-10-24.json

I encountered some problems when trying to evaluate MT-bench. The model is llama2-7b-chat, I am running on an RTX 3090, and the command I used is shown below. The problem is that most of the choices recorded in the answer file contain the same text; after that shared text, each entry only repeats its own question, with no actual answer!!! The question file is the MT-bench question file, part of which is shown in the screenshot below, along with a partial screenshot of the answer file generated by the code. Every choice contains the same content, as shown below.
There was no answer at the end; only the questions were recorded, as shown below.
Hi @peoplekillerS, I just ran the script to reproduce your problem, but I did not observe the behavior you describe. Here is the answer file I obtained: llama2-7b-10-24.json
I was wondering which version of the code you are using and whether you have modified it, as this is an uncommon situation. The question file should be obtained via https://github.com/hao-ai-lab/LookaheadDecoding/blob/main/applications/run_mtbench.sh#L2, and the chat model you should use is meta-llama/Llama-2-7b-chat-hf. Hope this helps.
Thanks to the author for the reply! Because I was re-testing with llama2-7b-chat-hf, I am a little late in replying, sorry! In fact, I got the same results as you, but take a closer look at the JSON file you posted above: if you search for 'Provide a variety of craft', you will find that almost every line contains this text. Is this a normal phenomenon?

---------- (dividing line) ----------

I am using the code from the main branch. The MT-bench questions were downloaded from the link in your run_mtbench.sh file, and no code has been changed. The command is the one in the screenshot above. The only differences are: 1. I use llama2-7b-chat instead of llama2-7b-chat-hf. 2. The --use-pp parameter is set to 1, because every time I set it to 0 I get the error: NotImplementedError: Cannot copy out of meta tensor; no data! (I am using an RTX 3090). Those are the only differences. Would the author consider creating an eval_mtbench version for llama2-7b-chat? And do you know how to fix the error that occurs when --use-pp is set to 0?
It is a normal phenomenon to have the same line in every answer. We use FastChat to generate the conversation template, and this line https://github.com/lm-sys/FastChat/blob/6ff8505ec80fc4b04d668f65d229f4f58bc449e0/fastchat/conversation.py#L365 is included in every prompt.
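To illustrate why the same text shows up in every record: a conversation template prepends a fixed system message to every prompt it builds, so every assembled prompt (and hence every answer-file entry that stores the prompt) starts with the identical line. A simplified, self-contained sketch of the idea (this is not FastChat's actual code; the class and message text are illustrative):

```python
class ConversationTemplate:
    """Minimal stand-in for a FastChat-style conversation template."""

    def __init__(self, system_message):
        self.system_message = system_message  # fixed text shared by all prompts
        self.messages = []

    def append_message(self, role, content):
        self.messages.append((role, content))

    def get_prompt(self):
        # The system message is prepended unconditionally, so every
        # prompt built from this template starts with the same line.
        parts = [self.system_message]
        for role, content in self.messages:
            parts.append(f"{role}: {content if content is not None else ''}")
        return "\n".join(parts)

SYSTEM = "A chat between a curious user and an artificial intelligence assistant."

prompts = []
for question in ["Compose a travel blog post.", "Explain superposition."]:
    conv = ConversationTemplate(SYSTEM)
    conv.append_message("USER", question)
    conv.append_message("ASSISTANT", None)  # empty slot for the model's reply
    prompts.append(conv.get_prompt())

# Every prompt shares the identical template line at the top.
print(all(p.startswith(SYSTEM) for p in prompts))
```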
Could you provide a more detailed report of the error you encountered when you set --use-pp to 0? It may be a version mismatch; you could pull the latest code and install the latest dependencies. I also suspect the llama2-7b-chat weight format is not compatible with the Hugging Face format: since we use the transformers library, model weights compatible with transformers (i.e., llama2-7b-chat-hf) are needed.
Thank you very much again for your reply!!! Sorry, I got confused. I just retested llama2-7b-chat-hf with --use-pp set to 0, and it works normally! The problem is that when using the llama2-7b-chat model with --use-pp set to 0, the error in the screenshot below appears.
Yeah, using llama2-7b-chat-hf should be correct. The llama2-7b-chat weights are not compatible with transformers; I guess that is the problem.
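One quick way to tell the two weight layouts apart locally: Hugging Face checkpoints (like llama2-7b-chat-hf) ship a config.json that transformers' from_pretrained() expects, while Meta's original llama2-7b-chat release instead ships params.json plus consolidated.*.pth shards. A small sketch of such a check (this helper is illustrative, not part of this repo):

```python
import tempfile
from pathlib import Path

def looks_like_hf_checkpoint(model_dir):
    """Heuristically classify a local llama checkpoint directory.

    Hugging Face-format checkpoints carry a config.json; Meta's
    original llama release ships params.json and consolidated.*.pth
    shards instead, which transformers cannot load directly.
    """
    d = Path(model_dir)
    if (d / "config.json").exists():
        return True
    if (d / "params.json").exists() or list(d.glob("consolidated.*.pth")):
        return False
    raise ValueError(f"{model_dir} does not look like a llama checkpoint")

# Tiny demo: a directory containing config.json is treated as HF-format.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").touch()
    is_hf = looks_like_hf_checkpoint(tmp)
print(is_hf)
```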
Will you consider a version of eval_mtbench for llama2-7b-chat?
Currently, I am not considering supporting llama2-7b-chat. From its website, we can see that https://github.com/facebookresearch/llama would be needed to support those model weights, while I plan to minimize maintenance effort while supporting most models, and supporting Hugging Face's transformers is the simplest way to do that.
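As a practical note for anyone holding the original llama2-7b-chat weights: the transformers repository ships a conversion script that turns Meta's release into the Hugging Face layout, which is likely simpler than adding native support here. A hedged sketch of the usual invocation (paths are placeholders; check the script's --help for the exact flags of your transformers version):

```shell
# Convert Meta's original llama-2 weights into Hugging Face format.
# Paths are placeholders; the script lives in the transformers repo.
python transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/llama2-7b-chat \
    --model_size 7B \
    --output_dir /path/to/llama2-7b-chat-hf
```

After conversion, the output directory can be passed anywhere a llama2-7b-chat-hf path is expected.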
Can I get the MT-bench evaluation code to reproduce the acceleration results?