huggingface / lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
MIT License

[FT] Add Gemba MQM Translation Metric #397

Open JoelNiklaus opened 3 days ago

JoelNiklaus commented 3 days ago

Issue encountered

Lighteval currently includes only rather outdated translation metrics.

Solution/Feature

Gemba MQM seems to be a current metric. Adding it would make translation evaluation better.

NathanHB commented 3 days ago

Looks nice! This uses external APIs and does not seem to have a PyPI package, so we would need to implement it in Lighteval. This is not high priority, but if you need it, feel free to open a PR and we can help you set it up :)

JoelNiklaus commented 14 hours ago

Great, thanks! Yes, I see two avenues:

  1. Fork their repo, publish it as a pip package, and integrate it as a dependency.
  2. Just copy their prompts and post-processing functions.

IMO option 1 is cleaner and also allows other people to use the metric more easily.
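
To give a rough sense of how small the post-processing in option 2 could be, here is a minimal Python sketch. Everything in it is an assumption for illustration rather than the Gemba code: the response layout (severity headings followed by one error per line, with `no-error` for empty classes), the severity weights, the penalty cap, and the hypothetical helper names `parse_mqm_response` and `mqm_score`.

```python
import re

# Assumed MQM-style severity weights and penalty cap; not taken from this thread.
SEVERITY_WEIGHTS = {"critical": 25, "major": 5, "minor": 1}
MAX_PENALTY = 25

# Matches a severity heading such as "Major:" with optional text after the colon.
HEADING = re.compile(r"(critical|major|minor)\s*:\s*(.*)", flags=re.IGNORECASE)


def parse_mqm_response(response: str) -> dict[str, list[str]]:
    """Group the error lines of a judge response by severity heading."""
    errors: dict[str, list[str]] = {severity: [] for severity in SEVERITY_WEIGHTS}
    current = None
    for raw_line in response.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        heading = HEADING.fullmatch(line)
        if heading:
            current = heading.group(1).lower()
            line = heading.group(2).strip()
            if not line:
                continue
        # Skip text before the first heading and explicit "no error" markers.
        if current is None or line.lower() in ("no-error", "no error"):
            continue
        errors[current].append(line)
    return errors


def mqm_score(errors: dict[str, list[str]]) -> int:
    """Return a non-positive score: the weighted error count, capped and negated."""
    penalty = sum(SEVERITY_WEIGHTS[sev] * len(items) for sev, items in errors.items())
    return -min(penalty, MAX_PENALTY)


if __name__ == "__main__":
    # Hypothetical judge output, written by hand for this example.
    example = """Critical:
no-error
Major:
accuracy/mistranslation - "bank"
Minor:
fluency/grammar - "a apple"
style/awkward - "very much good"
"""
    print(mqm_score(parse_mqm_response(example)))
```

The part that calls the external API and builds the prompt is left out on purpose, since it depends on which judge model and client Lighteval would use.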

@chuandudx Would you be interested in taking this?