hist0613 opened this issue 3 months ago
Arena currently provides only the Elo rating. For better visibility, it would be helpful to also present the results in the format of Figure 3 of the LMSYS Chatbot Arena Leaderboard.
We could provide the multi-agent evaluation results in this format, which would let us compare AI evaluation with human evaluation.
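One pairwise view that could complement a single Elo score is a table of head-to-head win fractions between models. The sketch below is only an illustration. It assumes a hypothetical battle-record format `(model_a, model_b, winner)` with `winner` in `{"model_a", "model_b", "tie"}`, and it uses hypothetical helper names (`pairwise_win_fraction`, `to_matrix`) and placeholder model names. It is not Arena's or LMSYS's code and may not match the exact layout of the referenced figure.

```python
from collections import defaultdict

# Hypothetical record format: (model_a, model_b, winner),
# where winner is one of "model_a", "model_b", or "tie".
VALID_WINNERS = {"model_a", "model_b", "tie"}


def pairwise_win_fraction(battles):
    """Return {(x, y): fraction of non-tied x-vs-y battles won by x}."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for a, b, winner in battles:
        if winner not in VALID_WINNERS:
            raise ValueError(f"unexpected winner label: {winner!r}")
        if winner == "tie":
            continue  # ties are excluded from the win fraction
        # Count the battle for both orderings so the table is symmetric in coverage.
        games[(a, b)] += 1
        games[(b, a)] += 1
        if winner == "model_a":
            wins[(a, b)] += 1
        else:
            wins[(b, a)] += 1
    return {pair: wins[pair] / n for pair, n in games.items()}


def to_matrix(fractions):
    """Arrange fractions as rows (model x) by columns (model y); None where no data."""
    models = sorted({m for pair in fractions for m in pair})
    matrix = [[fractions.get((row, col)) for col in models] for row in models]
    return models, matrix


# Usage with placeholder data (not real results).
battles = [
    ("model-x", "model-y", "model_a"),
    ("model-y", "model-x", "model_a"),
    ("model-x", "model-y", "tie"),
    ("model-x", "model-z", "model_a"),
]
fractions = pairwise_win_fraction(battles)
models, matrix = to_matrix(fractions)
for name, row in zip(models, matrix):
    print(name, row)
```

The same function could be run once on human votes and once on the multi-agent judgments, and the two tables compared cell by cell.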