paperswithcode / sotabench-eval

Easily evaluate machine learning models on public benchmarks
Apache License 2.0

Code for a new machine translation benchmark, Tatoeba #15

Open Traubert opened 3 years ago

Traubert commented 3 years ago

Hi, I'm proposing to integrate the Tatoeba machine translation dataset into sotabench-eval. I have included code for running the tests, modeled after WMT, and for downloading and configuring the data. I'm not yet 100% sure how the caching is supposed to work, so I'll come back to that.

Currently you can:

import sotabencheval
from sotabencheval.machine_translation import TatoebaEvaluator, TatoebaDataset

# Download and unpack the test data under the "tatoeba" directory.
# This only needs to be done if the data isn't already present.
sotabencheval.machine_translation.tatoeba.fetch_and_configure_data("tatoeba")
evaluator = TatoebaEvaluator(
    dataset=TatoebaDataset.v1,
    source_lang="eng",
    target_lang="deu",
    local_root="tatoeba",
    model_name="Some model",
    paper_arxiv_id="Some id",
)

evaluator.add({1: "Tom mag die italienische Küche.", 2: "Hier wirst du viel lernen."})
print(evaluator.get_results(ignore_missing=True))
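
Since the download only needs to happen when the data isn't already present, the call could be guarded with something like the sketch below. This is not part of the proposed code: `ensure_data` is a hypothetical helper, and it assumes that a non-empty directory means the data is complete, so it would not detect a partial or interrupted download.

```python
import os
from typing import Callable


def ensure_data(local_root: str, fetch: Callable[[str], None]) -> bool:
    """Call fetch(local_root) only if local_root is missing or empty.

    Hypothetical helper, not part of sotabencheval.
    Returns True if fetch was called, False if existing data was reused.
    """
    # Assumption: any non-empty directory counts as "data already present".
    if os.path.isdir(local_root) and os.listdir(local_root):
        return False
    fetch(local_root)
    return True
```

With the code above, this would presumably be called as `ensure_data("tatoeba", sotabencheval.machine_translation.tatoeba.fetch_and_configure_data)` before constructing the evaluator.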

You should be able to merge this without breaking anything, but please point me towards what else needs to be done...