narayanacharya6 closed this issue 3 years ago
I think the fastest way to achieve what I want would be to create a subclass of `SklearnCat` and change the `update` method to call `partial_fit` with the features from my previous pipeline component instead of the texts. If there is a better way to do this, please do let me know :)
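The override described above can be sketched without any dependencies. This is a minimal, illustrative stand-in, not tokenwiser's actual `SklearnCat` API: a base component trains on raw text, and a subclass redirects `update` so that `partial_fit` receives features attached by an earlier pipeline step. All class and attribute names here are hypothetical.

```python
class TextCat:
    """Stand-in for a text-categorizer component like tokenwiser's SklearnCat.

    `classifier` is any object exposing sklearn's partial_fit signature.
    """

    def __init__(self, classifier):
        self.classifier = classifier

    def update(self, docs, labels):
        # Default behaviour: incrementally fit on the raw text of each doc.
        texts = [doc["text"] for doc in docs]
        self.classifier.partial_fit(texts, labels)


class FeatureCat(TextCat):
    """Subclass that trains on features set by a previous pipeline component."""

    def update(self, docs, labels):
        # Read precomputed features instead of raw text (the docs here are
        # plain dicts standing in for spaCy Docs with extension attributes).
        feats = [doc["features"] for doc in docs]
        self.classifier.partial_fit(feats, labels)
```

A tiny recording classifier is enough to confirm which inputs reach `partial_fit`:

```python
class Recorder:
    def __init__(self):
        self.seen = []

    def partial_fit(self, X, y):
        self.seen.append((X, y))

rec = Recorder()
cat = FeatureCat(rec)
cat.update([{"text": "hi there", "features": [1.0, 2.0]}], ["pos"])
# rec.seen now holds the feature vectors, not the raw text
```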
Just to confirm, you've seen these featurizers? Also, what kind of features from the Doc would you want to use that might contribute to a better classification?
I have some very trivial features at the document level which have worked well for the task at hand. If I understand correctly, the featurizers you mention incorporate token-level information.
I actually did end up subclassing `SklearnCat` and overriding the methods responsible for `partial_fit`, `predict`, etc. I ran into some other issues, but I guess those are more about spaCy training behaviour than tokenwiser :)
Closing the issue.
Hey, the library looks promising and I completely agree with the motivation behind it!
I have a question based on your blog post about using custom sklearn models as part of a spaCy pipeline here. The example in the blog suggests using the `HashingVectorizer` from sklearn directly. I wanted to swap that out and instead use custom features that I extract from the `Doc` in a previous pipeline component as the input to my `PartialPipeline`, which hosts only the classifier. So the entire pipeline would look something like this:

`tokenizer >> custom_featurizer (sets some extension on the doc indicating features) >> partial_pipeline (has only the classifier that uses features from the previous component)`
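To make the proposed pipeline shape concrete, here is a dependency-free sketch. Each "component" is a plain function over a dict standing in for a spaCy `Doc`; the featurizer sets a `features` entry (standing in for a `Doc` extension attribute) that the classifier component then reads. All names, the feature choices, and the dict-based doc are illustrative assumptions, not spaCy or tokenwiser API.

```python
def tokenizer(text):
    """Turn raw text into a doc-like dict (stand-in for spaCy's tokenizer)."""
    return {"text": text, "tokens": text.split(), "features": None}


def custom_featurizer(doc):
    """Set trivial document-level features: token count and mean token length."""
    n = len(doc["tokens"])
    avg_len = sum(len(t) for t in doc["tokens"]) / n if n else 0.0
    doc["features"] = [n, avg_len]
    return doc


def make_classifier_component(classifier):
    """Wrap a fitted classifier so it predicts from the doc's features."""
    def component(doc):
        doc["prediction"] = classifier.predict([doc["features"]])[0]
        return doc
    return component
```

Running a doc through the chain mirrors `tokenizer >> custom_featurizer >> partial_pipeline`:

```python
class AlwaysPos:
    """Toy classifier so the sketch runs without sklearn."""
    def predict(self, X):
        return ["pos" for _ in X]

doc = tokenizer("hello there world")
for component in (custom_featurizer, make_classifier_component(AlwaysPos())):
    doc = component(doc)
# doc["features"] -> [3, 5.0]; doc["prediction"] -> "pos"
```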
The question above may not be well worded, so I'd be happy to add more color if it doesn't make sense.