Replace NLTK with spaCy tokenization

mitre-attack / tram

Threat Report ATT&CK™ Mapping (TRAM) is a tool to aid analysts in mapping finished reports to ATT&CK.

Apache License 2.0


Replace NLTK with spaCy tokenization #27

Closed: erip closed this pull request 4 years ago

erip commented 4 years ago

NLTK is slow and requires a large data download before it works even reasonably well. This PR replaces NLTK with spaCy for word tokenization and sentence segmentation, and updates the documentation and dependencies accordingly.

This is the first step in making the machine learning component faster.
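
For readers unfamiliar with spaCy, here is a minimal sketch of word tokenization and sentence segmentation of the kind described above. It is not the code from this PR: the helper names `split_sentences` and `tokenize_words` are hypothetical, and it assumes spaCy v3 with a blank English pipeline plus the rule-based `sentencizer`, which needs no model download. The PR itself may well have used a trained pipeline instead.

```python
import spacy

# Blank English pipeline: rule-based tokenizer only, no trained model required.
# (Assumption: spaCy v3, where add_pipe accepts the component's string name.)
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # rule-based sentence boundary detection


def split_sentences(text):
    """Return the sentences of `text` as stripped strings (hypothetical helper)."""
    doc = nlp(text)
    return [sent.text.strip() for sent in doc.sents]


def tokenize_words(text):
    """Return the non-whitespace tokens of `text` as strings (hypothetical helper)."""
    return [token.text for token in nlp(text) if not token.is_space]
```

Building the pipeline once at import time and reusing it for every call avoids repeated setup cost, which matters when segmenting many reports.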