We have all-corpora and all, but it would be nice if we could add several new categories, such as:
popular:
- punkt
- stopwords
- wordnet
- averaged_perceptron_tagger
- brown
- movie_reviews
- words

tokenizers:
- punkt
- snowball
- perluniprops
- nonbreaking_prefixes
That way, it would be easier to advise users to install nltk with:
pip install -U nltk
python -m nltk.downloader popular
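Until such collections exist, roughly the same effect can be had from Python by downloading each id one at a time with `nltk.download`. This is a minimal sketch, assuming the ids listed above match nltk_data package ids. They are copied verbatim from this proposal, so an id like `snowball` may need adjusting to whatever the index actually uses. `PROPOSED_COLLECTIONS` and `download_collection` are hypothetical names, not part of nltk:

```python
# Proposed collections, copied from the lists in this issue.
# Assumption: each id matches an nltk_data package id; adjust if the index differs.
PROPOSED_COLLECTIONS = {
    "popular": [
        "punkt", "stopwords", "wordnet", "averaged_perceptron_tagger",
        "brown", "movie_reviews", "words",
    ],
    "tokenizers": ["punkt", "snowball", "perluniprops", "nonbreaking_prefixes"],
}


def download_collection(name, download=None):
    """Download every package in a proposed collection, one id at a time.

    `download` defaults to nltk.download; it can be swapped out so the loop
    can be exercised without network access. Returns the ids that failed.
    """
    if download is None:
        import nltk  # imported lazily so the helper loads without nltk installed
        download = nltk.download
    failed = []
    for pkg_id in PROPOSED_COLLECTIONS[name]:
        # nltk.download reports failure by returning False rather than raising
        if not download(pkg_id):
            failed.append(pkg_id)
    return failed
```

Calling `download_collection("popular")` returns the list of ids that could not be fetched, which is also a quick way to spot ids that are missing from the index.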
More importantly, I think we should add all-no-third-party and all-third-party categories, so that we can separate out the issues that arise when third-party datasets/models don't update their checksums in nltk after they refresh their data/models.
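To illustrate why the split helps, here is a minimal sketch that partitions packages by origin and groups checksum failures per collection, so a stale third-party checksum shows up only under all-third-party. It assumes each package carries a third-party flag and an MD5 checksum recorded in the index. The flag, the function names, and the example data are hypothetical and not part of nltk:

```python
import hashlib


def split_by_origin(packages):
    """Partition {package_id: is_third_party} into the two proposed collections.

    The is_third_party flag is a hypothetical piece of metadata; the index
    would need to record it for this split to be automatic.
    """
    no_third_party = sorted(p for p, third in packages.items() if not third)
    third_party = sorted(p for p, third in packages.items() if third)
    return {"all-no-third-party": no_third_party, "all-third-party": third_party}


def checksum_ok(data, expected_md5):
    """True if the downloaded bytes match the MD5 checksum recorded in the index."""
    return hashlib.md5(data).hexdigest() == expected_md5


def mismatches_by_collection(collections, blobs, checksums):
    """Group checksum failures by collection so each failure is reported separately."""
    return {
        name: [p for p in ids if not checksum_ok(blobs[p], checksums[p])]
        for name, ids in collections.items()
    }
```

With this layout, a third-party package whose upstream data changed without a checksum update fails only in the all-third-party report, while all-no-third-party stays clean.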
@stevenbird Are the suggestions okay? How should we go about adding these categories?