Open lewtun opened 4 months ago
At the moment, this isn't possible. However, if you run a task with many subsets (using a config file), the score table should display the average at the task level.
If you want to get results comparable to the Open LLM Leaderboard, you'll need to use lighteval (you can take a look at the differences between the 3 versions here).
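To make the "average at the task level" concrete, here is a minimal Python sketch of the idea. It is not lighteval's code: the function name `task_level_average`, the `mmlu:` subset names, and the scores are all hypothetical, and it assumes a plain unweighted mean over subsets, which may differ from what the score table actually computes.

```python
from statistics import mean


def task_level_average(subset_scores: dict[str, float], task_prefix: str) -> float:
    """Unweighted mean over every subset whose name starts with `task_prefix`.

    Assumption: each subset counts equally, regardless of how many
    examples it contains.
    """
    selected = [
        score for name, score in subset_scores.items()
        if name.startswith(task_prefix)
    ]
    if not selected:
        raise ValueError(f"no subsets found for prefix {task_prefix!r}")
    return mean(selected)


# Hypothetical per-subset accuracies, for illustration only.
scores = {
    "mmlu:abstract_algebra": 0.25,
    "mmlu:anatomy": 0.50,
    "mmlu:astronomy": 0.75,
}

mmlu_average = task_level_average(scores, "mmlu:")
print(f"mmlu (average over {len(scores)} subsets): {mmlu_average:.2f}")
```

If subsets differ a lot in size, a mean weighted by example count would give a different number, so it is worth checking which one you are comparing against.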
Currently it seems that to run MMLU with the `lighteval` suite, one needs to specify all the subsets individually, as is done for the leaderboard task set here. Is it possible to group these together so that one can just run something like this:
Or do you recommend using one of the other suites like `helm` or `original` for this task?