huggingface / lighteval

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.
MIT License

Version of a task should be configurable. #172

Closed: PhilipMay closed this issue 1 month ago

PhilipMay commented 2 months ago

The `LightevalTask` class has a `VERSION` attribute, but it is not configurable: it is hard-coded to 0. See:

https://github.com/huggingface/lighteval/blob/11b48333b46ecd464cc3979de66038c87717e8d6/src/lighteval/tasks/lighteval_task.py#L166

I think the version number is also displayed in the results table. See:

|Task|Version|Metric|Value| |Stderr|
|----|------:|------|----:|-|-----:|
|all| |acc|0.8737|±|0.0094|
| | |acc_norm|0.8737|±|0.0094|
|community:german_rag_eval:_average:0| |acc|0.8737|±|0.0094|
| | |acc_norm|0.8737|±|0.0094|
|community:german_rag_eval:choose_context_by_question:0|0|acc|0.7290|±|0.0141|
| | |acc_norm|0.7290|±|0.0141|
|community:german_rag_eval:choose_question_by_context:0|0|acc|0.8490|±|0.0113|
| | |acc_norm|0.8490|±|0.0113|
|community:german_rag_eval:context_question_match:0|0|acc|0.9770|±|0.0047|
| | |acc_norm|0.9770|±|0.0047|
|community:german_rag_eval:question_answer_match:0|0|acc|0.9400|±|0.0075|
| | |acc_norm|0.9400|±|0.0075|
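As an illustration of how a configurable per-task version would end up in the `Version` column above, here is a hedged sketch of a markdown table renderer. `ResultRow` and `render_results` are hypothetical names, not lighteval's own output code (which may format the table differently); the sample values are taken from the table above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResultRow:
    # Hypothetical row type; not lighteval's internal result structure.
    task: str
    version: Optional[int]  # None for aggregate rows such as "all" or "_average"
    metric: str
    value: float
    stderr: float


def render_results(rows: List[ResultRow]) -> str:
    """Render rows as a markdown table, printing task and version once per task."""
    out = [
        "|Task|Version|Metric|Value| |Stderr|",
        "|----|------:|------|----:|-|-----:|",
    ]
    previous_task = None
    for row in rows:
        first = row.task != previous_task  # first metric row for this task
        task = row.task if first else ""
        version = str(row.version) if first and row.version is not None else ""
        out.append(f"|{task}|{version}|{row.metric}|{row.value:.4f}|±|{row.stderr:.4f}|")
        previous_task = row.task
    return "\n".join(out)


rows = [
    ResultRow("community:german_rag_eval:choose_context_by_question:0", 0, "acc", 0.7290, 0.0141),
    ResultRow("community:german_rag_eval:choose_context_by_question:0", 0, "acc_norm", 0.7290, 0.0141),
]
print(render_results(rows))
```

If the version were read from each task's config instead of a hard-coded constant, bumping it to 1 would change only the value passed as `version` here.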

If I slightly change a task or its prompt (e.g. #171), it would be nice to be able to increase the version number. Should this be changed/added?

jphme commented 2 months ago

Would also appreciate this feature (and obviously someone thought of this already, given the existence of VERSION). 👍

clefourrier commented 2 months ago

Hi! Yep, we'll need to add the mechanism back. We had removed it to simplify the code, and because we did not want to freeze any task version while still in alpha.

PhilipMay commented 2 months ago

@clefourrier do you want me to suggest a PR or do you want to implement it?

clefourrier commented 2 months ago

If you have the time to open a PR, I'd be grateful! You'll need to add a `version` argument with a default value to the `TaskConfig`, edit `task_table.jsonl` to set `"version"` to 0 for all tasks there, and set it to 1 in your own task.
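A minimal sketch of the mechanism described above, assuming a dataclass-based config. `TaskConfig` here is a hypothetical, trimmed-down stand-in (only `name`, `prompt_function`, and `version`) rather than lighteval's real class, `load_task_table` is an illustrative helper rather than lighteval's actual loader, and the task names are placeholders.

```python
import json
from dataclasses import dataclass, fields
from typing import Iterable, List


@dataclass
class TaskConfig:
    # Hypothetical, trimmed-down stand-in for lighteval's TaskConfig;
    # the real class has many more fields.
    name: str
    prompt_function: str
    version: int = 0  # default: tasks that don't specify a version stay at 0


def load_task_table(lines: Iterable[str]) -> List[TaskConfig]:
    """Parse JSON Lines rows (one task per line) into TaskConfig objects."""
    known = {f.name for f in fields(TaskConfig)}
    configs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        # Keep only the keys this sketch models; a missing "version" falls back to 0.
        configs.append(TaskConfig(**{k: v for k, v in row.items() if k in known}))
    return configs


# Built-in tasks: "version" set explicitly to 0 in the table (or omitted).
builtin = load_task_table([
    '{"name": "task_a", "prompt_function": "prompt_a", "version": 0}',
    '{"name": "task_b", "prompt_function": "prompt_b"}',
])

# A community task whose prompt was changed is bumped to version 1.
custom = TaskConfig(name="my_task", prompt_function="my_prompt", version=1)

print([cfg.version for cfg in builtin], custom.version)  # [0, 0] 1
```

Giving `version` a default of 0 means a row that omits the key still loads; writing it explicitly in the table, as suggested, simply makes each task's version visible at a glance.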

PhilipMay commented 2 months ago

> If you have the time to open a PR I'd be grateful! You'll need to add a default version argument to the TaskConfig, and edit the task_table.jsonl to set "version" to 0 there for all tasks, and edit your own to pass it to 1.

I started here: #181

PhilipMay commented 1 month ago

This was done in PR #181. Closing...