huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0
19.24k stars 2.69k forks

OpenSLR 25: ASR data for Amharic, Swahili and Wolof #2980

Open cdleong opened 3 years ago

cdleong commented 3 years ago

Adding a Dataset

Since https://github.com/huggingface/datasets/blob/master/datasets/openslr/openslr.py has already been created for various other OpenSLR subsets, this should be relatively straightforward to do.

cdleong commented 3 years ago

Whoever handles this just needs to:

cdleong commented 3 years ago

... also the example in "use in datasets library" should be updated. It currently says what is shown in the screenshot (image), but you actually have to specify a subset, e.g.

dataset = load_dataset("openslr", "SLR32")
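
To make the required subset explicit, here is a minimal sketch of a wrapper around the call above. The helper names `openslr_config_name` and `load_openslr_subset` are hypothetical and not part of the datasets library, and the assumption that every subset name has the form "SLR" plus a number is inferred from the "SLR32" example only. Which subsets actually load depends on what the openslr loader script supports.

```python
def openslr_config_name(subset):
    """Normalize a subset identifier (e.g. 32 or "slr32") to a name like "SLR32".

    Hypothetical helper: assumes subset names are "SLR" followed by digits.
    """
    name = str(subset).strip().upper()
    if name.isdigit():
        name = "SLR" + name
    if not (name.startswith("SLR") and name[3:].isdigit()):
        raise ValueError(
            f"Expected an OpenSLR subset such as 'SLR32', got {subset!r}"
        )
    return name


def load_openslr_subset(subset):
    """Load one OpenSLR subset, refusing to run without a valid subset name."""
    config = openslr_config_name(subset)
    # Imported lazily so the name check works without the library installed.
    from datasets import load_dataset

    # Same call as in the comment above, with the subset passed explicitly.
    return load_dataset("openslr", config)
```

Calling `load_openslr_subset("SLR32")` would then be equivalent to the one-liner above, while an empty or malformed subset fails early with a readable error.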

cdleong commented 3 years ago

image