langtech-bsc / mlops-lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

[Feature] Migrate tasks from La Leaderboard to this Harness #16

Open juliafalcao opened 1 week ago

juliafalcao commented 1 week ago

We are currently running the La Leaderboard evaluation in a clone of SomosNLP's fork of the Harness, because that fork already had some of the tasks we needed. The clone lives in GPFS at /gpfs/projects/bsc88/evaluation/leaderboard_eval/lm-evaluation-harness. We need to implement these tasks in this version of the Harness so that we can retire that clone and use only this one.
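
As a starting point for scoping the migration, a small script could list which task directories exist in the old clone but not in this repository. This is only a rough sketch: it assumes both checkouts keep their tasks under `lm_eval/tasks` (the upstream layout, which may differ in either fork), and the function name `missing_task_dirs` is made up for illustration.

```python
from pathlib import Path


def missing_task_dirs(old_root, new_root, tasks_subdir="lm_eval/tasks"):
    """Return sorted names of task directories found under old_root but not new_root.

    Assumption: each task lives in its own directory under `tasks_subdir`.
    Directories starting with "_" or "." (e.g. __pycache__) are ignored.
    """

    def task_names(root):
        tasks_path = Path(root) / tasks_subdir
        if not tasks_path.is_dir():
            # Missing tasks directory: treat as "no tasks" instead of failing.
            return set()
        return {
            entry.name
            for entry in tasks_path.iterdir()
            if entry.is_dir() and not entry.name.startswith(("_", "."))
        }

    return sorted(task_names(old_root) - task_names(new_root))
```

Calling `missing_task_dirs("/gpfs/projects/bsc88/evaluation/leaderboard_eval/lm-evaluation-harness", new_checkout)`, where `new_checkout` is the path to a local checkout of this repository, would give a first list of candidates. It compares directory names only, so tasks that were renamed or that live in a single shared directory still need a manual check.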

jab13x commented 3 days ago

I am waiting for Maria from SomosNLP to add the few-shot numbers we agreed on. I will change the priority once that happens.