Closed tangbinh closed 6 months ago
Hi! The winner is specified correctly. We run two judgment games per prompt: in the first game the baseline model's answer comes first, and in the second game the baseline comes after the model's answer. So in the second game, "A>B" means the model answer is preferred over the baseline. Thanks!
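To make the position swap concrete, here is a minimal sketch of how a verdict could be mapped to a winner in each game. It is not the code from `show_result.py`: the helper `winner_from_label`, the `baseline_first` flag, and every label other than "A>B" are assumptions made for illustration.

```python
def winner_from_label(label: str, baseline_first: bool) -> str:
    """Map a judge verdict to 'model', 'baseline', or 'tie'.

    Hypothetical helper: the label set below is an assumption,
    only "A>B" is mentioned in this thread.
    """
    if label == "A=B":
        return "tie"
    if label in ("A>B", "A>>B"):
        a_wins = True
    elif label in ("B>A", "B>>A"):
        a_wins = False
    else:
        raise ValueError(f"unrecognized verdict: {label!r}")

    # Game 1: A is the baseline. Game 2: A is the model answer.
    if a_wins:
        return "baseline" if baseline_first else "model"
    return "model" if baseline_first else "baseline"


# The same verdict string means opposite things in the two games.
game_1 = winner_from_label("A>B", baseline_first=True)   # baseline is A
game_2 = winner_from_label("A>B", baseline_first=False)  # model is A
print(game_1, game_2)
```

Read this way, a branch that looks inverted when viewed on its own can still be correct once the answer order for that game is taken into account.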
It looks like the winners are not correctly specified in the `get_battles_from_judgment` function. Can you take a look and fix it? https://github.com/lm-sys/arena-hard/blob/5eb649883765107f42b171962683829fd064a63f/show_result.py#L160-L168