huggingface / lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
MIT License
845 stars 100 forks

Support for multilingual generative metrics #293

Closed hynky1999 closed 2 months ago

hynky1999 commented 2 months ago

What does this implement/fix? Explain your changes.

  1. This PR adds two "new" metrics for generative evaluation

  2. It adds sentence/word tokenizers to the library. I copied them directly from datatrove, where we added support recently. I don't want to depend on datatrove, so I decided to copy them instead. It's possible that the definitions will diverge, but that's OK and we can sync them later.

  3. As with the tokenizers, I also took the language definitions from datatrove

  4. Minor typing upgrades
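To illustrate why a generative metric needs its own word tokenizer, here is a minimal sketch of a token-overlap score. It is not the code from this PR: the PR description does not name the two metrics, so token-level F1 is used purely as an example, and `simple_word_tokenize` and `token_f1` are hypothetical names. The regex tokenizer is a rough stand-in for a language-aware one and will not segment languages written without spaces.

```python
import re
from collections import Counter
from typing import Callable, List


def simple_word_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in for a language-aware word tokenizer.

    In Python 3, \\w matches Unicode letters and digits, so this handles
    many space-delimited languages, but not e.g. Chinese or Thai.
    """
    return re.findall(r"\w+", text.lower())


def token_f1(
    prediction: str,
    reference: str,
    tokenize: Callable[[str], List[str]] = simple_word_tokenize,
) -> float:
    """Token-level F1 between a prediction and a reference string."""
    pred_tokens = tokenize(prediction)
    ref_tokens = tokenize(reference)

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Passing the tokenizer in as a parameter means a per-language tokenizer can be swapped in without touching the metric itself.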

hynky1999 commented 2 months ago

> Great! Do you have a task example where this is used?

Yes, we are using this metric for multilingual generative tasks. I plan to add them in bulk once this PR is merged.

hynky1999 commented 2 months ago

> Don't we expect the model tokenizer and evaluation tokenizer to behave similarly though?

No, because, as I said, the tokenizers serve different purposes.