onnx / sklearn-onnx

Convert scikit-learn models and pipelines to ONNX
Apache License 2.0

Using vocabulary in CountVectorizer #644

Open sihongliu66 opened 3 years ago

sihongliu66 commented 3 years ago

I received the error message "'CountVectorizer' object has no attribute 'stop_words_'" when converting a CountVectorizer that was created with a vocabulary. According to the sklearn documentation, the stop_words_ attribute is not available when a vocabulary is defined. Is this something that could be supported?
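For reference, the missing attribute can be reproduced on the scikit-learn side alone, without involving the converter. This is a minimal sketch, assuming a toy corpus and vocabulary made up for illustration; it only shows that a vectorizer built with a fixed vocabulary does not carry `stop_words_`, and that reading it defensively with `getattr` avoids the AttributeError. It does not show that the conversion to ONNX would then succeed.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus and vocabulary, invented for this illustration.
corpus = ["the cat sat", "the dog barked"]
vec = CountVectorizer(vocabulary=["cat", "dog", "sat"])

# With a fixed vocabulary, fitting does not learn (or prune) any terms.
X = vec.fit_transform(corpus)
counts = X.toarray().tolist()

# The fitted attribute stop_words_ is not set in this case, so any code
# that reads vec.stop_words_ directly raises AttributeError.
has_stop_words = hasattr(vec, "stop_words_")

# Defensive read: fall back to an empty set when the attribute is absent.
stop_words = getattr(vec, "stop_words_", set())

print(has_stop_words, stop_words, counts)
```

A converter that needs this attribute could fall back to an empty set in the same way, although whether that is the right behavior for sklearn-onnx is an assumption here, not something confirmed in this thread.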

xadupre commented 3 years ago

onnxruntime does not currently support stop_words. One option would be to implement a custom operator in https://github.com/microsoft/onnxruntime-extensions and then modify the converter to use it.

WilliamTambellini commented 2 years ago

cf https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html