huggingface / lighteval

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.
MIT License

Feature: Checkpointing on task level. #161

Closed PhilipMay closed 2 months ago

PhilipMay commented 2 months ago

I would like to request / suggest the following new feature:

Background

When I use cheap Azure low-priority instances or AWS spot instances, they might be preempted. If this happens, the evaluation must restart from the beginning.

New Feature

It would be cool to write a "checkpoint" for every task. Then, if multiple tasks are evaluated, as with open_llm_leaderboard_tasks, the results for tasks that have already been evaluated could be loaded, and we would not have to restart from the beginning.

clefourrier commented 2 months ago

Hi! This would be very hard to do. For efficiency, we run inference for all requests of the same type in one batch, and only then compute the metrics. A per-task checkpointing system would require us to rewrite the entire code base while losing overall speed. I don't think we will consider it. If you need this, I suggest you launch evaluations on one task at a time.
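A minimal sketch of what "one task at a time" with resume could look like on the user side, outside of lighteval. Everything here is an assumption for illustration: `run_tasks_with_resume`, the `evaluate_task` callable (a stand-in for however you launch a single-task evaluation), and the one-JSON-file-per-task layout are hypothetical and not part of the lighteval API.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable


def run_tasks_with_resume(
    tasks: Iterable[str],
    evaluate_task: Callable[[str], dict],  # hypothetical: runs ONE task, returns its results
    results_dir: str,
) -> Dict[str, dict]:
    """Evaluate tasks one by one, skipping tasks whose results are already on disk."""
    out_dir = Path(results_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    results: Dict[str, dict] = {}

    for task in tasks:
        # Hypothetical naming scheme: one JSON file per task name.
        safe_name = task.replace("/", "_").replace("|", "_")
        path = out_dir / f"{safe_name}.json"

        if path.exists():
            # Finished before a preemption: load the result instead of re-running.
            results[task] = json.loads(path.read_text())
            continue

        result = evaluate_task(task)

        # Write to a temporary file and rename it, so a preemption in the
        # middle of a write never leaves a half-written result file behind.
        fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump(result, f)
        os.replace(tmp_path, path)

        results[task] = result

    return results
```

After a preemption, calling the function again with the same `results_dir` re-runs only the tasks that have no result file yet. The trade-off is the one described above: each task is evaluated separately, so requests are no longer batched across tasks.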

PhilipMay commented 2 months ago

OK. So let's close this again?