huggingface / lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
MIT License

[FT] Is it possible to save the predictions to prevent rerunning expensive inference #396

Open JoelNiklaus opened 1 week ago

JoelNiklaus commented 1 week ago

Issue encountered

When evaluating large models, inference can incur significant cost and delay, especially on larger datasets. For example, I may want to re-evaluate the same predictions using different metrics.

Solution/Feature

I want the predictions to be saved in an inspectable cache which can be used when the evaluation is run again.

clefourrier commented 1 week ago

Hi, thanks for the issue!

If you use the different saving parameters (as indicated in the doc), your predictions (results and/or details) are saved and can be inspected later on. The quickest way to get what you need is therefore to load the details files and recompute the metrics on them by hand.
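As an illustration of recomputing a metric from saved details by hand, here is a minimal sketch. The file path, column names (`predictions`, `gold`), and exact-match metric are assumptions for illustration, not lighteval's exact schema; a toy DataFrame stands in for a loaded details file:

```python
import pandas as pd

# Toy stand-in for a saved details file. In practice you would load one, e.g.:
#   df = pd.read_parquet("outputs/details/.../details_task.parquet")
# (path layout and column names here are assumptions, not lighteval's exact schema)
df = pd.DataFrame({
    "predictions": ["Paris", "Berlin", "Madrid"],
    "gold": ["Paris", "Rome", "Madrid"],
})

def exact_match(pred: str, gold: str) -> float:
    """1.0 if the prediction matches the reference exactly, else 0.0."""
    return float(pred.strip() == gold.strip())

# Recompute the metric over the saved predictions without rerunning inference.
df["exact_match"] = [exact_match(p, g) for p, g in zip(df["predictions"], df["gold"])]
print(f"exact_match: {df['exact_match'].mean():.3f}")
```

The same pattern applies to any metric that only needs the stored predictions and references: swap `exact_match` for your own scoring function and aggregate over the rows.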

Since not all metrics use the same generation methods, we have not prioritized a cache at the moment (to prevent risks such as running a greedy eval, then a sampling one, and accidentally reusing the same results for metric computations), but we'll add your suggestion to our todo!

JoelNiklaus commented 1 day ago

Great, thanks so much!