huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

[GEM] add WikiLingua cross-lingual abstractive summarization dataset #834

Closed yjernite closed 3 years ago

yjernite commented 3 years ago

Adding a Dataset

Instructions to add a new dataset can be found here.

KMFODA commented 3 years ago

Hey @yjernite. This is a very interesting dataset. I would love to work on adding it, but I see that the link to the data points to a Google Drive folder. Can I just confirm whether the download manager can handle Google Drive URLs, or would this have to be a manual download?
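For context on the Google Drive question, here is a minimal sketch of one common workaround for individual files (not folders): rewriting a share link into the widely used `uc?export=download&id=...` form before handing it to a downloader. The helper names `drive_file_id` and `drive_direct_url` are hypothetical and not part of `datasets`. Whether the resulting URL then downloads cleanly is an assumption, since it depends on the file being publicly shared and on Google not showing a confirmation page first.

```python
import re
from typing import Optional

# Two common shapes of a Google Drive *file* link (folder links match neither).
_FILE_ID_PATTERNS = [
    re.compile(r"/file/d/([A-Za-z0-9_-]+)"),  # .../file/d/<id>/view
    re.compile(r"[?&]id=([A-Za-z0-9_-]+)"),   # ...?id=<id> or ...&id=<id>
]


def drive_file_id(url: str) -> Optional[str]:
    """Return the file ID embedded in a Drive file link, or None if absent."""
    for pattern in _FILE_ID_PATTERNS:
        match = pattern.search(url)
        if match:
            return match.group(1)
    return None


def drive_direct_url(url: str) -> str:
    """Rewrite a Drive file share link into the direct-download form.

    Hypothetical helper: raises ValueError for links without a file ID,
    such as links to a whole folder.
    """
    file_id = drive_file_id(url)
    if file_id is None:
        raise ValueError(f"No Google Drive file ID found in: {url}")
    return f"https://drive.google.com/uc?export=download&id={file_id}"
```

A folder link such as the one mentioned above carries no file ID in either form, so this sketch rejects it; each file in the folder would need its own link, or the data would have to be downloaded manually.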

yjernite commented 3 years ago

Hi @KMFODA! A version of WikiLingua is actually already accessible in the GEM dataset.

You can use it, for example, to load the French to English data with:

from datasets import load_dataset
wikilingua = load_dataset("gem", "wiki_lingua_french_fr")

Closed by https://github.com/huggingface/datasets/pull/1807