tatsu-lab / alpaca_eval

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
https://tatsu-lab.github.io/alpaca_eval/
Apache License 2.0
1.51k stars, 240 forks

Possibility of Re-running Only Failed Queries After Rate Limit Reached #419

Open hank0316 opened 11 hours ago

hank0316 commented 11 hours ago

Hi,

Thanks for the brilliant evaluation framework! I recently ran into an issue where around 40 queries failed after hitting rate limits and exhausting the maximum number of retries, while approximately 760 queries were processed successfully. It would save a lot of cost if there were an option to re-run only the failed queries instead of the entire batch.

Is there a way to achieve this?

Thanks!

hank0316 commented 10 hours ago

I found a potential solution! It seems we can manually remove the entries whose raw_completion is null from the cache file, so that only those queries are annotated again on the next run.

I'm using weighted_alpaca_eval_gpt4_turbo as my evaluator, so the cache file is located at evaluators_configs/weighted_alpaca_eval_gpt4_turbo/annotations_seed0_configs.json.
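
Here is a minimal sketch of that cleanup. It assumes the cache file is a JSON array of objects, each carrying a raw_completion field; that layout is my assumption and not something confirmed by the maintainers. The function name drop_failed_annotations is hypothetical, and the script keeps a .bak copy of the original file before rewriting it.

```python
import json
import shutil
from pathlib import Path


def drop_failed_annotations(cache_path, key="raw_completion"):
    """Drop cached entries whose `key` is null or missing.

    Assumption (not confirmed by the library): the cache is a JSON array
    of objects. Returns a tuple (n_kept, n_dropped).
    """
    cache_path = Path(cache_path)
    with cache_path.open(encoding="utf-8") as f:
        entries = json.load(f)

    # Keep only entries that actually hold a completion.
    kept = [entry for entry in entries if entry.get(key) is not None]
    n_dropped = len(entries) - len(kept)

    if n_dropped:
        # Back up the original cache next to it before overwriting.
        backup_path = cache_path.with_name(cache_path.name + ".bak")
        shutil.copy2(cache_path, backup_path)
        with cache_path.open("w", encoding="utf-8") as f:
            json.dump(kept, f, indent=2, ensure_ascii=False)

    return len(kept), n_dropped
```

Calling drop_failed_annotations("evaluators_configs/weighted_alpaca_eval_gpt4_turbo/annotations_seed0_configs.json") and then re-running the evaluation should, if the assumption about the cache holds, leave only the removed entries to be annotated again.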

However, I'm not entirely sure if this approach is correct. Am I on the right track?