Arabic Stop words


Developer: Taha Zerrouki (http://tahadz.com, taha dot zerrouki at gmail dot com)

| Features | value |
|---|---|
| Authors | Authors.md |
| Release | 0.9 |
| License | GPL |
| Tracker | linuxscout/arabicstopwords/Issues |
| Source | GitHub |
| Website | ArabicStopwords on SourceForge |
| Doc | package Documentation |
| Download | Python Library |
| Download | Data set CSV/SQL/Python |
| Feedback | Comments |
| Accounts | @Twitter |
| Citation | T. Zerrouki, Arabic Stop Words |

Description

Determining stop words is not easy, and the appropriate set of stop words differs from one use case to another. For this purpose, we propose a classified list that developers can parameterize.

The word list contains only words in their common forms; all other forms are generated by a script.

It can also be used as a library (see the section "Arabic Stop words Library").
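To make the form generation mentioned above more concrete, here is a minimal sketch of how affixed forms could be derived from one common form. It is not the project's actual script: the function name `generate_forms` and the two affix lists are hypothetical, and real generation also has to handle affix compatibility and spelling changes, which this sketch ignores.

```python
from itertools import product


def generate_forms(stem, procletics, encletics):
    """Return every procletic + stem + encletic combination, without duplicates."""
    forms = []
    for pro, enc in product(procletics, encletics):
        form = pro + stem + enc
        if form not in forms:
            forms.append(form)
    return forms


# Hypothetical affix lists, for illustration only.
# The empty string stands for "no affix".
PROCLETICS = ["", "ب"]
ENCLETICS = ["", "ك", "كما"]

forms = generate_forms("أن", PROCLETICS, ENCLETICS)
print(len(forms))
for form in forms:
    print(form)
```

Plain concatenation is enough for this stem, but many stop words change spelling when a suffix is attached, so a real generator would need per-word rules.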

Files

Data

This project contains two parts:

Data Structure

Two formats of data are given:

Stopwords Example

Minimal classified data .ODS/CSV file

Affixation information in other fields:

All forms data CSV file

| word | vocalized | type | category | original | procletic | stem | encletic | tags |
|---|---|---|---|---|---|---|---|---|
| بأنك | بِأَنّكَ | حرف | إن و أخواتها | أن | ب- | | -ك | جر:مضاف |
| بأنكما | بِأَنّكُمَا | حرف | إن و أخواتها | أن | ب- | | -كما | جر:مضاف |
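As an illustration of how the all-forms data could be consumed, the sketch below parses rows in this layout with Python's standard `csv` module and groups the generated forms by their `original` word. The tab delimiter, the split of each sample row into columns (including the empty `stem` field), and the helper name `group_forms_by_original` are assumptions; the sample is embedded as a string so the example does not depend on any particular file name.

```python
import csv
import io

# Two sample rows in the all-forms layout.
# Assumption: fields are tab-separated and the stem field is empty in these rows.
SAMPLE = (
    "word\tvocalized\ttype\tcategory\toriginal\tprocletic\tstem\tencletic\ttags\n"
    "بأنك\tبِأَنّكَ\tحرف\tإن و أخواتها\tأن\tب-\t\t-ك\tجر:مضاف\n"
    "بأنكما\tبِأَنّكُمَا\tحرف\tإن و أخواتها\tأن\tب-\t\t-كما\tجر:مضاف\n"
)


def group_forms_by_original(handle):
    """Map each original (unaffixed) word to the list of its generated forms."""
    reader = csv.DictReader(handle, delimiter="\t")
    groups = {}
    for row in reader:
        groups.setdefault(row["original"], []).append(row["word"])
    return groups


groups = group_forms_by_original(io.StringIO(SAMPLE))
for original, words in groups.items():
    print(original, len(words))
```

To read a real export, the same function could be given an open file handle instead of the `io.StringIO` object, with the delimiter adjusted to match the file.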

How to customize stop word list

How to update data

Arabic Stop words Library

Install

pip install arabicstopwords

Usage

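* list all stop words
```python
# Import path assumed; `stp` is not defined in this excerpt.
import arabicstopwords.arabicstopwords as stp

stp.stopwords_list()
# ......
len(stp.stopwords_list())
# 13629
len(stp.classed_stopwords_list())
# 507
```

* give all forms of a stopword
```python
stp.stopword_forms(u"على")
# ....
len(stp.stopword_forms(u"على"))
# 144
```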
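A common use of such a list is removing stop words from tokenized text. The sketch below shows one way to do that with a plain Python set. The helper `remove_stopwords` and the three-word sample list are hypothetical stand-ins so that the example runs without the package installed; with the library available, the result of `stp.stopwords_list()` could presumably be passed instead, assuming it returns an iterable of strings.

```python
def remove_stopwords(tokens, stopwords):
    """Keep only the tokens that are not in the stop-word collection."""
    stopword_set = set(stopwords)  # set membership tests are fast on average
    return [tok for tok in tokens if tok not in stopword_set]


# Tiny stand-in list for illustration; not the project's data.
SAMPLE_STOPWORDS = ["على", "في", "من"]

# Whitespace tokenization is a simplification; real text needs a proper tokenizer.
tokens = "الكتاب على الطاولة في الغرفة".split()
filtered = remove_stopwords(tokens, SAMPLE_STOPWORDS)
print(filtered)
```

Converting the list to a set once, before filtering, matters here because the full list of forms has thousands of entries.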

Citation

If you cite this project in academic work, you can use this citation:

T. Zerrouki, Arabic Stop Words, https://github.com/linuxscout/arabicstopwords/, 2010

Another Citation:

Zerrouki, Taha. "Towards An Open Platform For Arabic Language Processing." (2020).

Or in BibTeX format:

@misc{zerrouki2010arabicstopwords,
  title={Arabic Stop Words},
  author={Zerrouki, Taha},
  url={https://github.com/linuxscout/arabicstopwords},
  year={2010}
}
@thesis{zerrouki2020towards,
  title={Towards An Open Platform For Arabic Language Processing},
  author={Zerrouki, Taha},
  year={2020}
}