huggingface / lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
MIT License

Add maj@k metric #158

Closed clefourrier closed 5 months ago

clefourrier commented 5 months ago

Still missing:

clefourrier commented 5 months ago

I'll check it - however, please don't merge yet, I need to propagate the changes to the other launchers + simplify greedy :)

NathanHB commented 5 months ago

Oh mb, I thought it was ready for review!

clefourrier commented 5 months ago

I hope I'll finish it today :) It'll be a bit bigger because I'm simplifying part of the current system

clefourrier commented 5 months ago

Tests are failing because of the Hub problems ^^"""

NathanHB commented 5 months ago

Is this good to be merged? I saw on Slack that you did not get the same results for Mistral models (do we even know how they ran their tests?).

clefourrier commented 5 months ago

I'll do more tests today - we could merge it now and adjust later if needed though.

NathanHB commented 5 months ago

Alright, I'm merging it so that we can merge the other PRs.