quanteda / quanteda.textmodels

Text scaling and classification models for quanteda

Adapt the LIBLINEAR code inside the package #49

Open kbenoit opened 3 years ago

kbenoit commented 3 years ago

textmodel_svmlin() is based on https://vikas.sindhwani.org/svmlin.html, which is very fast but relies on somewhat inflexibly structured code that is 15 years old. It offers a lot of possibilities, though, including semi-supervised classification, so I've kept it for tests. (This is the C++ code you adapted from the RSSL package.)

textmodel_svm() is based on https://www.csie.ntu.edu.tw/~cjlin/liblinear/, which was most recently updated last month. It is very flexible: it handles problems with k > 2 classes and offers an easy way to output probabilities in prediction (using a method similar to the one used for multinomial logistic regression). The code is currently taken from the LiblineaR package, although that package's copy of the LIBLINEAR C++ code tends to lag behind the current upstream version.
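To make the probability output concrete, here is a minimal C++ sketch of how per-class decision values from a linear model can be turned into probability estimates. It reflects my understanding of what LIBLINEAR does for its logistic-regression solvers (a logistic transform of each class score, then normalisation), so treat that as an assumption and check it against the upstream source. The function names `decision_to_probability` and `binary_probability` are hypothetical helpers for illustration, not part of LIBLINEAR, LiblineaR, or quanteda.textmodels.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, for illustration only.
// Input: one decision value (w_k . x) per class, for k > 2 classes.
// Output: probability estimates that sum to one.
// Assumption: each score is passed through a logistic function and the
// results are then rescaled by their sum.
std::vector<double> decision_to_probability(const std::vector<double>& decision_values) {
    assert(!decision_values.empty());
    std::vector<double> prob(decision_values.size());
    double sum = 0.0;
    for (std::size_t k = 0; k < decision_values.size(); ++k) {
        prob[k] = 1.0 / (1.0 + std::exp(-decision_values[k]));  // logistic transform
        sum += prob[k];
    }
    for (double& p : prob) {
        p /= sum;  // normalise so the estimates sum to one
    }
    return prob;
}

// Hypothetical helper for the two-class case, where a single decision
// value is enough: the second probability is the complement of the first.
std::vector<double> binary_probability(double decision_value) {
    const double p = 1.0 / (1.0 + std::exp(-decision_value));
    return {p, 1.0 - p};
}
```

The ranking of classes is the same as the ranking of their decision values, so the predicted label does not change; only the scale of the output does.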

It would be nice to consider adapting the LIBLINEAR code directly into our package (or into a new, independent wrapper package) to do the following: