koaning / tokenwiser

Bag of, not words, but tricks!
https://koaning.github.io/tokenwiser/
Apache License 2.0

SpaCy/Thinc Shim models #23

Closed koaning closed 3 years ago

koaning commented 3 years ago

I would like to add spaCy models. Should be a great way to learn the internals.

I need to think about how I want to handle featurization for all of these models.

koaning commented 3 years ago

For scikit-learn, we can use the hashing trick to generate sparse features. These can then be used by Naive Bayes, the Passive Aggressive classifier, or the SGD classifier. These all accept sparse features and have partial_fit implemented.
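A minimal sketch of what that could look like, assuming scikit-learn's `HashingVectorizer` as the featurizer. The toy texts, labels, batch size and `n_features` value are made up for illustration and are not taken from tokenwiser. `alternate_sign=False` is set because `MultinomialNB` expects non-negative features.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import MultinomialNB

# Toy data, purely illustrative.
texts = [
    "i loved this movie",
    "what a great film",
    "this was terrible",
    "i hated every minute",
]
labels = ["pos", "pos", "neg", "neg"]
classes = ["neg", "pos"]  # partial_fit needs the full label set up front

# The hashing trick is stateless, so the vectorizer never needs to be fitted.
# alternate_sign=False keeps all values non-negative for MultinomialNB.
vectorizer = HashingVectorizer(n_features=2**16, alternate_sign=False)

models = {
    "sgd": SGDClassifier(random_state=0),
    "nb": MultinomialNB(),
}

# Stream the data in small batches and update every model incrementally.
batch_size = 2
for start in range(0, len(texts), batch_size):
    X_batch = vectorizer.transform(texts[start:start + batch_size])
    y_batch = labels[start:start + batch_size]
    for model in models.values():
        model.partial_fit(X_batch, y_batch, classes=classes)

# New text is featurized the same way, without any stored vocabulary.
X_new = vectorizer.transform(["a great movie"])
preds = {name: model.predict(X_new)[0] for name, model in models.items()}
```

Because the vectorizer holds no vocabulary, the same object can featurize batches for any of these models, which may make it a reasonable shared featurization step.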