lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Apache License 2.0
35.85k stars · 4.41k forks

What's the version of gpt-4 of repo-provided ref answers #3302

Open UbeCc opened 2 months ago

UbeCc commented 2 months ago

Hi!

I'm using the default configuration in the llm_judge repo, but when I call the OpenAI API through different mirrors, I get significantly different results. I'm evaluating llama3-8b, and the scores from the two API providers are 8.038760 and 6.827044.

This raises a question: which version of GPT-4 was used to generate the reference answers in reference_answer/gpt-4.jsonl at the time the repo was released?

endxxxx commented 2 months ago

Same problem. I also got different scores from two API providers on the same inference results generated by MiniCPM-2B-DPO-BF16: 7.090625 from one and 6.025 from the other. Did you find the reason?

UbeCc commented 2 months ago

> Same problem. I also got different scores from two API providers on the same inference results generated by MiniCPM-2B-DPO-BF16: 7.090625 from one and 6.025 from the other. Did you find the reason?

No, but my guess is that the different API providers serve different versions (snapshots) of the model, so the score gap has nothing to do with FastChat itself.