openai / evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Logged spec now includes overridden args #1460

Closed ojaffe closed 8 months ago

ojaffe commented 8 months ago

Using --extra_eval_params overrides args of the same name specified in the eval .yaml, but the updated values are not logged in the spec; the original values are logged instead. This PR fixes the problem by updating eval_spec.args with the new values.

For example, running oaieval dummy make-me-pay --extra_eval_params turn_cap=1 previously led to "turn_cap": 5 being logged in the spec, since 5 is the default value. On this branch, the same command leads to "turn_cap": 1 being logged in the spec.
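The idea behind the fix can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the EvalSpec dataclass, parse_overrides, and apply_overrides are hypothetical stand-ins, the key=value,key=value override format is assumed, and unrelated_arg is an invented placeholder. Only turn_cap and its default of 5 come from the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Union

Value = Union[int, float, str]


@dataclass
class EvalSpec:
    # Hypothetical stand-in for the spec object that gets logged.
    cls: str
    args: Dict[str, Any] = field(default_factory=dict)


def parse_overrides(raw: str) -> Dict[str, Value]:
    """Parse 'key=value,key2=value2' into a dict (assumed format)."""
    overrides: Dict[str, Value] = {}
    if not raw:
        return overrides
    for pair in raw.split(","):
        key, value = pair.split("=", 1)
        key, value = key.strip(), value.strip()
        # Try numeric types first; fall back to the raw string.
        for cast in (int, float):
            try:
                overrides[key] = cast(value)
                break
            except ValueError:
                continue
        else:
            overrides[key] = value
    return overrides


def apply_overrides(spec: EvalSpec, overrides: Dict[str, Value]) -> EvalSpec:
    """Write the overrides into spec.args so the logged spec matches the run."""
    spec.args.update(overrides)
    return spec


# Defaults as they might appear in the eval .yaml (unrelated_arg is illustrative).
spec = EvalSpec(cls="make_me_pay", args={"turn_cap": 5, "unrelated_arg": "kept"})
apply_overrides(spec, parse_overrides("turn_cap=1"))
print(spec.args)  # {'turn_cap': 1, 'unrelated_arg': 'kept'}
```

Updating the spec in place before it is logged means the recorded args reflect what was actually run, while args that were not overridden keep their .yaml values.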