huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

Adding ANERcorp-CAMeLLab dataset #3242

Open vitalyshalumov opened 2 years ago

vitalyshalumov commented 2 years ago

Adding ANERcorp dataset

Adding a Dataset

In 2020, a group of researchers from CAMeL Lab (Habash, Alhafni and Oudah) and Mind Lab (Antoun and Baly) met with the creator of the corpus, Yassine Benajiba, to consult with him and collectively agree on an exact split, and accepted minor corrections from the original dataset. Bashar Alhafni from CAMeL Lab, working with Nizar Habash, implemented the decisions provided in this release.

(b) Ossama Obeid, Nasser Zalmout, Salam Khalifa, Dima Taji, Mai Oudah, Bashar Alhafni, Go Inoue, Fadhl Eryani, Alexander Erdmann, and Nizar Habash. "CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language Processing." In Proceedings of the Conference on Language Resources and Evaluation (LREC 2020), Marseille, 2020.

Instructions to add a new dataset can be found here.