Closed sndsabin closed 6 years ago
@sndsabin Thank you for suggesting the list of Nepali stopwords. Do you have a reference for the stopwords so that the source of the stopword list can be properly documented?
Also, to add to the nltk_data repo, you would need to regenerate the index.xml so that the hash for the new zipball is recorded appropriately.
E.g.:
# Create a new directory to avoid clashes with the actual nltk_data directory
# that the nltk code uses, and with any old version of the nltk_data repo.
mkdir git-repos && cd git-repos
# Re-clone the GitHub repo; this might take some time.
git clone https://github.com/sndsabin/nltk_data.git
# Enter the clone and check out the gh-pages branch.
cd nltk_data
git checkout gh-pages
# Move to the corpora subdirectory.
cd packages/corpora/
# Replace the stopwords.zip with the new zipball.
rm stopwords.zip
cp /path/to/new/with/nepali/stopwords.zip .
# Go back to the repo root and recreate the index.xml.
cd ../..
make
# Git add, commit, push.
git add packages/corpora/stopwords.zip
git add index.xml
git commit -m 'Added nepali stopwords'
git push
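Before pushing, it may be worth confirming that the regenerated index.xml really records the hash of the new zipball. The following is only a sketch: `check_index_checksum` is a hypothetical helper, and it assumes that the index stores an MD5 hex digest in a `checksum="..."` attribute, that each `<package .../>` element sits on a single line, and that `md5sum` is available (on macOS the equivalent tool is `md5`).

```shell
# Hypothetical helper (not part of nltk_data): compare the MD5 of a zip file
# with the checksum attribute recorded for a package id in index.xml.
# Assumption: every <package .../> element is on one line and carries both
# id="..." and checksum="..." attributes.
check_index_checksum() {
    zip_path="$1"
    index_path="$2"
    pkg_id="$3"

    # MD5 hex digest of the zip file as it exists on disk.
    actual=$(md5sum "$zip_path" | cut -d' ' -f1)

    # Checksum recorded in the index for the given package id.
    recorded=$(grep "id=\"$pkg_id\"" "$index_path" \
        | sed -n 's/.*checksum="\([^"]*\)".*/\1/p' \
        | head -n 1)

    if [ "$actual" = "$recorded" ]; then
        echo "match"
    else
        echo "mismatch"
        return 1
    fi
}
```

From the repo root, a call such as `check_index_checksum packages/corpora/stopwords.zip index.xml stopwords` would print `match` if the two values agree under these assumptions.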
The 'index.xml' file was regenerated.
The stop words for the Nepali language were compiled from various sources, and some were added manually.
@sndsabin You'll have to commit and push the regenerated index.xml too =)
It would be helpful if you could list the various sources, so that the people who created them can be credited.
Note that committing a modified zipfile like this will clobber another recent addition to the stopwords corpus, for Arabic.
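One way to catch this kind of clobbering is to compare the file lists of the old and new archives before committing. The sketch below assumes both zipballs have already been extracted into separate directories (for example with `unzip -d`), and that the corpus holds one file per language; `missing_from_new` is a hypothetical helper name.

```shell
# Hypothetical helper: print the files that exist in the old extracted
# stopwords directory but are missing from the new one.
# Any output means the new zipball would drop an existing word list.
missing_from_new() {
    old_dir="$1"
    new_dir="$2"
    old_list=$(mktemp)
    new_list=$(mktemp)

    # comm needs both inputs sorted in the same collation order.
    ls "$old_dir" | LC_ALL=C sort > "$old_list"
    ls "$new_dir" | LC_ALL=C sort > "$new_list"

    # Column 1 of comm: lines found only in the first (old) list.
    LC_ALL=C comm -23 "$old_list" "$new_list"

    rm -f "$old_list" "$new_list"
}
```

An empty result suggests that the new archive is a superset of the old one by file name; it does not check whether the contents of shared files have changed.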
Is there a more authoritative source for these stopwords?
@stevenbird Yes, Madan Puraskar Pustakalaya is one.
@sndsabin - is there a URL for the stopwords data?
@stevenbird Unfortunately, no. The stopwords list was compiled from various sources, research project documentation being one.
Thanks @sndsabin
welcome @stevenbird :)
The stop words for the Nepali language were added.