Integrate AraNLP.
---
https://sites.google.com/site/mahajalthobaiti/resources
The AraNLP library is a Java-based toolkit for processing Arabic text. It
supports the most important preprocessing steps, such as diacritic and
punctuation removal, tokenization, sentence segmentation, part-of-speech
tagging, root stemming, light stemming, and word segmentation. These steps are
usually required to prepare text for more advanced NLP tasks.
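The following is a minimal sketch of what the first two steps, diacritic and punctuation removal, do to the text. It uses only plain `java.util.regex`. It is not AraNLP's API: the class `ArabicDiacritics` and its methods are hypothetical. It also assumes that "diacritics" means the Arabic harakat U+064B through U+0652 plus superscript alef U+0670. AraNLP's own character sets may differ.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper, NOT part of AraNLP: strips Arabic diacritics (tashkeel)
 * and a small set of punctuation marks using plain java.util.regex.
 */
final class ArabicDiacritics {
    // Assumed diacritic set: fathatan (U+064B) through sukun (U+0652),
    // plus superscript alef (U+0670).
    private static final Pattern DIACRITICS =
            Pattern.compile("[\\u064B-\\u0652\\u0670]");

    // ASCII punctuation (\p{Punct}) plus Arabic comma (U+060C),
    // semicolon (U+061B), question mark (U+061F) and full stop (U+06D4).
    private static final Pattern PUNCTUATION =
            Pattern.compile("[\\p{Punct}\\u060C\\u061B\\u061F\\u06D4]");

    private ArabicDiacritics() {
        // Utility class; no instances.
    }

    /** Returns the text with all characters in the assumed diacritic set removed. */
    static String removeDiacritics(String text) {
        return DIACRITICS.matcher(text).replaceAll("");
    }

    /** Returns the text with ASCII and common Arabic punctuation removed. */
    static String removePunctuation(String text) {
        return PUNCTUATION.matcher(text).replaceAll("");
    }
}
```

For example, `ArabicDiacritics.removeDiacritics("كَتَبَ")` returns `"كتب"`, because the three fathas are stripped and the base letters are kept.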
The goal of AraNLP is to gather most of the vital Arabic text preprocessing
tools into one easily accessible library. To that end, we implemented tools
that were missing and incorporated existing algorithmic resources.
AraNLP has already been used to preprocess Arabic text in many experiments,
and it successfully preprocessed the corpora involved.
Please cite our paper in any published work using this resource:
@inproceedings{Althobaiti14AraNLP,
  title={{AraNLP: a Java-Based Library for the Processing of Arabic Text}},
  author={M. Althobaiti and U. Kruschwitz and M. Poesio},
  booktitle={Proceedings of the 9th Language Resources and Evaluation Conference (LREC)},
  year={2014},
  address={Reykjavik}
}
Original issue reported on code.google.com by richard.eckart on 17 Dec 2014 at 9:56