Support for out-of-distribution detection with one-class SVMs

One use case for vetiver would be monitoring whether new data that the model encounters in the real world comes from the same distribution as the training data, or whether there is covariate shift over time. I wonder if this would be a useful addition to vetiver.

We could potentially use one-class support vector machines (SVMs) for novelty detection, as proposed by Schölkopf et al. The idea is to learn a frontier around the training points in the p-dimensional feature space and check whether new observations fall inside or outside it. Schölkopf et al. suggest a one-class SVM to find the tightest hypersphere around the data. Probabilistic outputs from a one-class SVM can then quantify the probability that newly observed data belongs to the training distribution.

In terms of implementation, LIBSVM provides the go-to implementation of one-class SVMs, and since version 3.31 (Feb. 2023) supports probabilistic outputs for one-class SVMs.

For vetiver-python, we could potentially use libsvm-official, which supports one-class probabilistic outputs. There is also sklearn.svm.OneClassSVM, which is built on a copy of LIBSVM bundled with scikit-learn; it exposes decision_function and score_samples but, as far as I can tell, has no predict_proba, so it does not give probabilistic outputs out of the box.
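To make the idea concrete, here is a minimal sketch of the frontier check using sklearn.svm.OneClassSVM. The synthetic data, the `nu` and `gamma` settings, and the helper name `outlier_fraction` are all assumptions made for illustration; this is not an existing or proposed vetiver API, and it reports the share of points flagged as outliers rather than a probability.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic stand-ins for real data (assumption: 2 numeric features).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))    # data seen at training time
X_same = rng.normal(loc=0.0, scale=1.0, size=(200, 2))     # new data, same distribution
X_shifted = rng.normal(loc=4.0, scale=1.0, size=(200, 2))  # new data, shifted covariates

# Fit the frontier on training features only; nu is an upper bound on the
# fraction of training points allowed to fall outside it.
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)


def outlier_fraction(model, X):
    """Hypothetical helper: share of rows predicted outside the frontier (-1)."""
    return float(np.mean(model.predict(X) == -1))


frac_same = outlier_fraction(detector, X_same)
frac_shifted = outlier_fraction(detector, X_shifted)
print(f"flagged, same distribution: {frac_same:.2f}")
print(f"flagged, shifted data:      {frac_shifted:.2f}")
```

In a monitoring setting, the flagged fraction (or the raw `decision_function` scores) could be tracked over time alongside the other model metrics; a probabilistic version would need LIBSVM 3.31 or later via libsvm-official.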

For vetiver-r, the e1071 package provides an interface to LIBSVM but currently supports LIBSVM 3.23, which does not include probabilistic outputs for one-class SVMs. Another possibility is to use kernlab with type="one-svc", but to the best of my knowledge, kernlab does not produce probabilistic outputs, and predictions from a one-class SVM fitted with kernlab through tidymodels are currently buggy: https://github.com/tidymodels/parsnip/issues/974

What do you think @juliasilge?
