openedx-unsupported / ease

EASE (Enhanced AI Scoring Engine) is a library for machine-learning-based classification of textual content. This is useful for tasks such as scoring student essays.
GNU Affero General Public License v3.0

Remove nltk_data #49

Closed: singingwolfboy closed this issue 10 years ago

singingwolfboy commented 10 years ago

This repository has the NLTK data set checked in, which is a bad idea: it makes the repository much larger than necessary (which slows down cloning), and it means that ease will never benefit from updates to the NLTK data set. This data should be removed from this repo, and we should add documentation for how to install it separately.
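
For illustration, a separate install step might look roughly like the sketch below. It is not taken from ease itself: the `REQUIRED_PACKAGES` mapping, the `punkt` entry, and both helper names are assumptions, since this thread does not say which NLTK packages ease needs. The only NLTK calls used are `nltk.data.find`, `nltk.data.path`, and `nltk.download`.

```python
# Assumption: illustrative package list. The packages ease actually needs
# are not stated in this thread, so replace this mapping as appropriate.
REQUIRED_PACKAGES = {"punkt": "tokenizers/punkt"}


def missing_packages(required, finder):
    """Return names of packages whose resource path `finder` cannot locate.

    `finder` is any callable that raises LookupError for a missing
    resource, which is how nltk.data.find behaves.
    """
    missing = []
    for name, resource in required.items():
        try:
            finder(resource)
        except LookupError:
            missing.append(name)
    return missing


def ensure_nltk_data(download_dir):
    """Download any missing packages into download_dir (hypothetical helper)."""
    import nltk  # imported lazily so the check above works without NLTK

    # Make NLTK search the chosen directory before its default locations.
    if download_dir not in nltk.data.path:
        nltk.data.path.insert(0, download_dir)
    for name in missing_packages(REQUIRED_PACKAGES, nltk.data.find):
        nltk.download(name, download_dir=download_dir)
```

Calling `ensure_nltk_data("/path/to/nltk_data")` once at setup time would then fetch only what is absent, so the data no longer has to live in the repository.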

pmitros commented 10 years ago

Removing it is a good idea, but do be careful about how versions are handled. Unless carefully managed, dataset changes could lead to strange differences in performance between versions and installs.
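
One possible way to manage this, sketched here as an assumption rather than anything ease does, is to record a fingerprint of the installed data directory and refuse to run when it differs from the recorded value. The names `dataset_fingerprint` and `check_dataset` are hypothetical, and only the Python standard library is used.

```python
import hashlib
import os


def dataset_fingerprint(root):
    """Return a SHA-256 hex digest over every file path and its bytes under root."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fix traversal order so the digest is reproducible
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            relative = os.path.relpath(path, root).replace(os.sep, "/")
            digest.update(relative.encode("utf-8"))
            digest.update(b"\0")
            with open(path, "rb") as handle:
                # Read in chunks so large corpus files are not loaded whole.
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            digest.update(b"\0")
    return digest.hexdigest()


def check_dataset(root, expected):
    """Raise RuntimeError if the data under root does not match expected."""
    actual = dataset_fingerprint(root)
    if actual != expected:
        raise RuntimeError(
            "dataset fingerprint mismatch: expected %s, got %s" % (expected, actual)
        )
    return actual
```

The expected digest would be stored alongside a trained model, so a model is only ever scored against the same data it was built with.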