Closed UserAB1236872 closed 8 years ago
I must admit I have never heard about conformal prediction before. Could you sketch out how an implementation for logistic regression would work, and why it works?
In principle there is no reason why this couldn't be implemented as an additional trait, but I want to understand how this works first (and why it is useful).
I recommend skimming the paper I linked, but in short: Conformal Prediction is a region predictor. You can implement a conformal predictor over pretty much any classifier. Instead of returning a single label, conformal predictors are given a threshold and return a set of labels. The threshold is the p-value a label has to meet to be included in the set.
For instance, a 95% (.05 threshold) conformal predictor will return a set such that there is a 95% chance that the true label is in the returned set. Obviously a "safe" conformal predictor can always trivially meet its criteria by returning every label. Generally speaking, the better the underlying classifier and model, the fewer "just to be safe" labels the returned set includes. This is useful for safer, more conservative prediction than traditional classifiers offer, at the cost that you don't know the true label for sure, only that it's one of the labels in the set.
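To make the set-valued output concrete, here is a minimal sketch; the `prediction_set` helper and the p-values fed to it are invented for illustration, not part of rustlearn:

```rust
/// Illustrative helper: given (label, p-value) pairs, keep every
/// label whose p-value meets the significance level `epsilon`.
fn prediction_set(p_values: &[(usize, f64)], epsilon: f64) -> Vec<usize> {
    p_values
        .iter()
        .filter(|&&(_, p)| p >= epsilon)
        .map(|&(label, _)| label)
        .collect()
}

fn main() {
    // An uncertain predictor keeps both labels: the "safe" set {0, 1}.
    let ambiguous = prediction_set(&[(0, 0.40), (1, 0.20)], 0.05);
    assert_eq!(ambiguous, vec![0, 1]);

    // With a confident underlying model, only one label clears the
    // threshold and the set is a singleton.
    let confident = prediction_set(&[(0, 0.70), (1, 0.01)], 0.05);
    assert_eq!(confident, vec![0]);
}
```

Note that an empty set is also possible when no label clears the threshold, which is part of how conformal predictors signal "none of the known labels fit well".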
I'd likely implement a Cross-Conformal predictor because it has a good accuracy/speed tradeoff, but I'll describe a traditional one for example just because it's easier:
Given some training set of points [(data1,label1),...]
and some new point data_new
, you can calculate its conformal set with logistic regression like this (pardon the half Rust half pseudocode):
let mut labels = Vec::new();
for label in [0, 1] { // logistic regression restricts this CP sketch to binary labels
    add (data_new, label) to training_data as its last element
    let mut alphas = Vec::new();
    for (data, data_label) in training_data {
        withhold (data, data_label) and train a logistic regression classifier on the rest
        let exponent = classifier.weights().dot(data);
        let alpha = 1.0 + if data_label == 1 { exp(-exponent) } else { exp(exponent) };
        alphas.push(alpha);
    }
    // p-value of (data_new, label): the fraction of alphas at least as large as its own
    let alpha_n = alphas[alphas.len() - 1];
    let n_at_least = alphas.iter().filter(|&&alpha| alpha >= alpha_n).count();
    let p_val = n_at_least as f64 / alphas.len() as f64;
    if p_val >= threshold { labels.push(label); }
    remove (data_new, label) from training_data
}
labels
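The scoring and p-value steps of that sketch can be written as runnable Rust. The weight vector below is a made-up stand-in for a trained classifier, and `alpha_score`/`p_value` are illustrative names, not rustlearn API:

```rust
/// Nonconformity score for logistic regression, as in the sketch:
/// alpha = 1 + exp(-w.x) for label 1, and 1 + exp(w.x) for label 0.
/// `weights` stands in for a trained classifier's weight vector.
fn alpha_score(weights: &[f64], data: &[f64], label: u8) -> f64 {
    let exponent: f64 = weights.iter().zip(data).map(|(w, x)| w * x).sum();
    1.0 + if label == 1 { (-exponent).exp() } else { exponent.exp() }
}

/// Conformal p-value of the last alpha in `alphas` (the candidate
/// example's score): the fraction of scores at least as large.
fn p_value(alphas: &[f64]) -> f64 {
    let alpha_n = alphas[alphas.len() - 1];
    let count = alphas.iter().filter(|&&a| a >= alpha_n).count();
    count as f64 / alphas.len() as f64
}

fn main() {
    // Toy weights such as a trained logistic regression might produce.
    let weights = [1.5, -0.5];
    // A point the model strongly assigns to label 1 gets a low
    // nonconformity score for label 1 and a high one for label 0.
    let a1 = alpha_score(&weights, &[2.0, 0.0], 1);
    let a0 = alpha_score(&weights, &[2.0, 0.0], 0);
    assert!(a1 < a0);

    // If the candidate's alpha is the strict maximum of n scores,
    // its p-value is 1/n.
    let p = p_value(&[1.1, 1.3, 1.2, 5.0]);
    assert!((p - 0.25).abs() < 1e-12);
}
```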
So in the example above, alpha_n denotes the alpha of the newly added example? And you maintain a distribution of the alpha scores for each class, and then if alpha_n is an unlikely draw from that distribution you add it to a prediction set?
You seem to train the classifier in the inner loop of the algorithm above, is that necessary?
You actually have to retrain the classifier every time because the new example is part of the training set for the classifier. For every data point in your "real training set", you withhold it, train the classifier, and calculate its alpha score.
Yes, this is extremely slow. There are other methods called Inductive Conformal Prediction (fast, but less accurate) and Cross-Conformal Prediction (better precision than ICP, faster than CP, overall the best) that are better. Vanilla CP was just the easiest to describe. I'd probably implement Cross-Conformal Prediction.
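The speedup in inductive (split) conformal prediction comes from computing the calibration alphas once, so predicting a new point needs no retraining. A hypothetical sketch of just the p-value step, assuming the calibration scores came from a model trained on a separate proper training set:

```rust
/// Inductive CP p-value: compare the candidate's nonconformity
/// score against a fixed calibration set of scores. The candidate
/// itself is counted in both numerator and denominator.
fn icp_p_value(calibration_alphas: &[f64], candidate_alpha: f64) -> f64 {
    let n = calibration_alphas.len() + 1;
    let count = calibration_alphas
        .iter()
        .filter(|&&a| a >= candidate_alpha)
        .count()
        + 1; // the candidate always counts itself
    count as f64 / n as f64
}

fn main() {
    // Toy calibration scores; in practice these would come from a
    // held-out calibration split scored by the trained model.
    let calibration = [0.2, 0.5, 0.9, 1.4];
    // A very nonconforming candidate gets the smallest possible
    // p-value, 1 / (n_calibration + 1).
    assert!((icp_p_value(&calibration, 10.0) - 0.2).abs() < 1e-12);
    // A maximally conforming candidate gets p-value 1.
    assert!((icp_p_value(&calibration, 0.0) - 1.0).abs() < 1e-12);
}
```

The trade-off the comment above describes falls out directly: ICP trains one model but calibrates on fewer points, while transductive CP retrains per withheld point.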
OK, so this is my current understanding.

For training: split the data into folds and, for each fold, compute the nonconformity scores (alphas) of the held-out examples using a model trained on the remaining folds.

For prediction: compute the nonconformity score (alpha) of the new example (from a model trained on all the data?) with respect to a given label y; if it falls within the epsilon quantile of the calibrated error distribution, return class y.

Does that sound like a reasonable understanding? If yes, this sounds good to me! We could even make the number of folds used for calibration a hyperparameter, so that the model would reduce to inductive conformal prediction for k_folds = 1.
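The k-fold calibration step can be sketched as follows. To keep the example self-contained, "training" is stood in for by taking the mean of the remaining folds and nonconformity is distance from that mean; everything here is illustrative, not the cross-conformal scheme's real model:

```rust
/// Cross-conformal calibration sketch: partition the data into
/// `k_folds` folds (round-robin) and score each fold's members
/// under a "model" fit on the remaining folds. Requires
/// k_folds >= 2 in this toy version, since the stand-in model
/// needs at least one training fold.
fn cross_conformal_alphas(data: &[f64], k_folds: usize) -> Vec<f64> {
    let mut alphas = vec![0.0; data.len()];
    for fold in 0..k_folds {
        // "Train" on everything outside the held-out fold: here,
        // fitting just means taking the mean.
        let rest: Vec<f64> = data
            .iter()
            .enumerate()
            .filter(|(i, _)| i % k_folds != fold)
            .map(|(_, &x)| x)
            .collect();
        let mean = rest.iter().sum::<f64>() / rest.len() as f64;
        // Nonconformity of each held-out point: distance from the fit.
        for i in (0..data.len()).filter(|i| i % k_folds == fold) {
            alphas[i] = (data[i] - mean).abs();
        }
    }
    alphas
}

fn main() {
    // Fold 0 holds indices {0, 2}, fold 1 holds {1, 3}; each is
    // scored against the mean of the opposite fold.
    let alphas = cross_conformal_alphas(&[1.0, 2.0, 3.0, 4.0], 2);
    assert_eq!(alphas, vec![2.0, 0.0, 0.0, 2.0]);
}
```

Pooling these per-fold alphas gives the calibration distribution that the prediction step's quantile comparison runs against.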
Do you think this could be extended to multiclass classifiers using the one-vs-rest scheme?
(The above is based mostly on http://arxiv.org/pdf/1208.0806.pdf)
Are you still keen to add this?
Yes, sorry. I've been busy, but I'll get to work soon.
Great, looking forward to it!
Is there a reason why outputs/labels generally use Array? I got started working, and it may make more sense to just constrain outputs to be Copy+PartialEq, which would allow integral, bool etc labels as well.
I think the long-term plan is to have Array<T>, which would at the very least allow integral outputs. I'm hoping that will be accomplished by integrating https://github.com/bluss/rust-ndarray.
My grad work deals a lot with conformal prediction, and there are few, if any, open source libraries that deal with it. I've been doing some work in it on Rust and, if it's okay, I'd like to add code to get some conformal prediction implementations out in the wild.
I realize there are more pressing things to consider, and much more widely used things that you probably want to support first, but I'd like to put it out there.
Some of the library needs to be structured to deal with it, unfortunately, but it's not too intrusive. For a classifier to work with CP, it needs to be able to provide a "conformal score", which could be part of a trait. While theoretically any classifier can be used with CP, the scores are classifier-specific. Logistic Regression is fine, and just requires the weights.
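A trait along those lines might look like the sketch below; the names are hypothetical, not rustlearn's actual API, and the logistic-regression score reuses the alpha formula from the earlier pseudocode:

```rust
/// Hypothetical trait: any classifier that can expose a
/// per-example nonconformity score could opt in to conformal
/// prediction through it.
trait ConformalScore {
    /// Nonconformity of `data` under the hypothesis that its
    /// label is `label` (higher means stranger).
    fn conformal_score(&self, data: &[f64], label: u8) -> f64;
}

/// A logistic-regression-like model only needs its weight vector.
struct LogisticModel {
    weights: Vec<f64>,
}

impl ConformalScore for LogisticModel {
    fn conformal_score(&self, data: &[f64], label: u8) -> f64 {
        let exponent: f64 = self.weights.iter().zip(data).map(|(w, x)| w * x).sum();
        // alpha = 1 + exp(-w.x) for label 1, 1 + exp(w.x) for label 0.
        1.0 + if label == 1 { (-exponent).exp() } else { exponent.exp() }
    }
}

fn main() {
    let model = LogisticModel { weights: vec![1.0, -1.0] };
    // The "wrong" label hypothesis yields a larger nonconformity score.
    let s1 = model.conformal_score(&[3.0, 0.0], 1);
    let s0 = model.conformal_score(&[3.0, 0.0], 0);
    assert!(s1 < s0);
}
```

Classifiers that can't cheaply expose such a score (like the LibSVM case below) would simply not implement the trait.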
LibSVM isn't sufficient to implement CP over an SVM since you need to gather the support vector coefficients from a solved QP problem, but it's not a big deal if not all classifiers we have support it.