NLTK is slow and requires a lot of data to be downloaded in order to work even fairly well. This PR replaces NLTK with spaCy for word tokenization and sentence segmentation, updates the documentation, and the dependencies.
This is the first step in making the machine learning component faster.
NLTK is slow and requires a lot of data to be downloaded in order to work even fairly well. This PR replaces NLTK with spaCy for word tokenization and sentence segmentation, updates the documentation, and the dependencies.
This is the first step in making the machine learning component faster.