lmarena / arena-hard-auto

Arena-Hard-Auto: An automatic LLM benchmark.
Apache License 2.0
606 stars 71 forks source link

configurable parameters #21

Closed dmitrysarov closed 4 months ago

dmitrysarov commented 5 months ago

slightly formatting configurable number_of_judgment_attempts configurable baseline_model in show_result.py

CodingWithTim commented 4 months ago

Thanks for you contribution. The code looks good to me. I like that you allow the baseline to be configured in show_result.py when generating all the battles. If we configure a model as the baseline we should also configure the model as the anchor when compute the bradley terry coefficients and win-rates. Could you add the code to configure that as well? I can merge it once you do. If not, I can do it instead.

dmitrysarov commented 4 months ago

@CodingWithTim Hope I understood you correctly. I've added this part

CodingWithTim commented 4 months ago

Thanks you got the right idea. This is great! I think one last thing is get_bootstrap_result also need to be able to support configurable baseline as well. Inside get_bootstrap_result it also calls compute_mle_elo. Could you add support for this as well? Thanks!

dmitrysarov commented 4 months ago

@CodingWithTim yeah, overlooked that part, sorry. Now it's there

CodingWithTim commented 4 months ago

@dmitrysarov Sorry about the late review, was busy with upcoming releases. This code works wonderfully for me. We really appreciate your contributions!