undertheseanlp / underthesea

Underthesea - Vietnamese NLP Toolkit
http://undertheseanlp.com
GNU General Public License v3.0

Move scikit-learn to extras_require #586

Closed BLKSerene closed 1 year ago

BLKSerene commented 1 year ago

Similar to #505, I only use sent_tokenize, word_tokenize, pos_tag, and sentiment for Vietnamese NLP tasks in my project. I have tested that these functions do not depend on scikit-learn, which is a large dependency.

So I'm wondering whether it would be possible to move scikit-learn to extras_require as well, as was done in #506, so that light users of underthesea do not have to install such a large third-party library.
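For illustration, here is a rough sketch of what an optional dependency group can look like in a setup.py. It is not underthesea's actual configuration: the package name, the empty core list, and the extra name "sklearn" are all placeholders I made up for the example.

```python
# Sketch of a setup.py with an optional dependency group.
# Every name here is a placeholder, not underthesea's real configuration.

SETUP_KWARGS = {
    "name": "example-nlp-toolkit",  # placeholder package name
    "version": "0.0.0",
    # Core dependencies that every user needs (real list not shown in this thread).
    "install_requires": [],
    # Optional groups, installed only when explicitly requested.
    "extras_require": {
        "sklearn": ["scikit-learn"],  # hypothetical extra name
    },
}


def main():
    # Imported here so the data above can be inspected without setuptools.
    from setuptools import setup

    setup(**SETUP_KWARGS)
```

With a layout like this, calling main() at the bottom of setup.py would let a plain pip install skip scikit-learn, while pip install "example-nlp-toolkit[sklearn]" would pull it in.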

rain1024 commented 1 year ago

Actually, sentiment requires scikit-learn as a dependency.

https://github.com/undertheseanlp/underthesea/blob/c92bca3c00e85797bf09a6d5aeaab418927a7aea/underthesea/models/text_classifier.py#L61

joblib.load loads the model, which is a sklearn.svm.LinearSVC object, so scikit-learn has to be importable when the model is unpickled.
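If scikit-learn ever did become optional, one common pattern is to import it lazily and fail with an installation hint. The sketch below only illustrates that pattern: require_optional and load_classifier are hypothetical helpers, and the extra name "sklearn" is an assumption, not something underthesea defines.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency, or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is needed for this feature. "
            f"Try: pip install underthesea[{extra}]  (hypothetical extra name)"
        ) from exc


def load_classifier(model_path):
    """Load a pickled classifier only if its optional dependencies are present."""
    # Unpickling a LinearSVC needs scikit-learn, so check for it up front.
    require_optional("sklearn", extra="sklearn")
    joblib = require_optional("joblib", extra="sklearn")
    return joblib.load(model_path)
```

Functions that never call load_classifier would then work without scikit-learn installed, and only sentiment would raise the ImportError.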

I love the idea of underthesea having as few dependencies as possible. One way to get there would be to implement SVM ourselves, but that raises a bigger issue: we would be reinventing the wheel.