Implementation of Conformalised Quantile Regression

matteo-fontana commented 3 years ago

It would be interesting to implement Conformalised Quantile Regression (Romano et al. 2019) (https://arxiv.org/abs/1905.03222). This could be done by implementing additional non-conformity measures.

paolo-vergo commented 2 years ago

@ryantibs

To implement conformalised quantile regression we added a new function called conformal.quant.forest.

We chose Random Forest as the quantile regression method since, in our tests, it proved more reliable than classical quantile regression methods in the conformal prediction context. We referred to "A comparison of some conformal quantile regression methods" by Sesia and Candès (2020) (https://arxiv.org/abs/1909.05433) for the regression algorithm. The function performs a two-step procedure: first it computes the gamma and 1-gamma quantiles with Random Forest quantile regression, and then, according to the chosen method, it outputs a prediction interval for each test point in x0.
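For readers unfamiliar with the second step, here is a minimal sketch of the unscaled conformalisation from Romano et al. (2019), written in plain Python rather than the package's R. It is not the code of conformal.quant.forest: the function name `cqr_intervals` and its arguments are hypothetical, the quantile predictions are assumed to come from any already fitted quantile regressor, and the "median" and "scaled" variants are not shown.

```python
import math

def cqr_intervals(q_lo_cal, q_hi_cal, y_cal, q_lo_test, q_hi_test, alpha):
    """Conformalise quantile predictions using a held-out calibration set.

    Hypothetical helper for illustration; inputs are plain lists of floats.
    """
    n = len(y_cal)
    # Conformity score: how far each calibration response falls outside its
    # predicted quantile band (negative when it falls inside the band).
    scores = sorted(max(lo - y, y - hi)
                    for lo, hi, y in zip(q_lo_cal, q_hi_cal, y_cal))
    # Rank of the empirical quantile with the finite-sample (n + 1) correction.
    k = math.ceil((1 - alpha) * (n + 1))
    # If k exceeds n the calibration set is too small and the interval is infinite.
    q = scores[k - 1] if k <= n else math.inf
    # Widen (or shrink, if q < 0) every test band by the same amount q.
    return [(lo - q, hi + q) for lo, hi in zip(q_lo_test, q_hi_test)]

# Toy calibration data: the band [0, 1] is predicted at every calibration point.
q_lo_cal = [0.0] * 9
q_hi_cal = [1.0] * 9
y_cal = [0.5, 0.25, 1.5, -0.25, 0.75, 1.25, 0.0, 1.0, 2.0]
intervals = cqr_intervals(q_lo_cal, q_hi_cal, y_cal, [2.0], [3.0], alpha=0.25)
print(intervals)  # [(1.5, 3.5)]
```

With nine calibration points and alpha = 0.25, the correction is the 8th smallest score (0.5), so the test band [2, 3] is widened to [1.5, 3.5].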

The function takes as input:
- x: Matrix of features, of dimension (say) n x p.
- y: Vector of responses, of length (say) n.
- x0: Matrix of features, each row being a point at which we want to form a prediction interval, of dimension (say) n0 x p.
- method: Method used to compute the prediction intervals. The options are "classic", without local scaling of the intervals; "median", using the median instead of the pointwise evaluations (y); or "scaled", with local scaling. Default is "scaled".
- nthreads: Number of threads to use (for parallel computation) in the function quantregForest. Default is 8.
- gamma: Hyperparameter which defines the quantile levels used in the method. Default is alpha/2.
- alpha: Miscoverage level for the prediction intervals.
- split: Indices that define the data-split to be used.
- seed: Integer to be passed to set.seed before defining the random data-split to be used.
- verbose: Should intermediate progress be printed out? Default is FALSE.

The function returns a list with the following components: lo, up, split. The first two are vectors of length n0 containing the bounds, while split contains the training indices.

We also built an example script, called ex.conformal.quant.forest, to test the code, similar to the existing ex.conformal.split. Moreover, we would like to point out that our function includes Roxygen headers, which allow for faster rebuilding of the documentation and keep it consistent with the existing package.

ryantibs commented 2 years ago

@paolo-vergo Sorry for the long delay. I'm generally happy/excited to add new conformity measures, especially the quantile-based ones since they seem to work so well in practice. But I'm unclear about what the right way to structure it in the package is.

Currently, the functions conformal.pred(), conformal.pred.split(), conformal.pred.jack(), etc. all allow the training/prediction algorithm to be arbitrary, but restrict the conformity measure to be absolute residuals.

What you're proposing is to provide a function that fixes the conformity measure to be the quantile-based one, and also fixes the training/prediction algorithm to be a random forest. That seems to break with the functional style of the rest of the package, so I'd at least like to pursue some more generality. Two options are as follows.

1. Construct a family of functions (say) conformal.quant(), conformal.quant.split(), etc. that all use the quantile-based conformity measure, and allow the training/prediction function to be arbitrary. This just mimics what we already have.
2. Generalize conformal.pred() so that the conformity measure itself can be specified as an input, presenting the greatest degree of generality.
   - We'll have to think about what the right way to do this is. But one way would be to allow it to be a custom function that acts on the output of pred.fun. (This would allow the user to both specify the predictions they want to make, and how they score them; though it is technically superfluous to allow them to specify the prediction function, it still might be helpful conceptually.)
   - We'll also have to think about what to do with the split and jackknife versions ...
   - The default in all of this for the custom conformity score should be absolute residuals, so as to make this backwards-compatible; i.e., any existing code that calls the conformal.pred() family of functions would continue to do the same thing even after this change to the package.

   The advantage is that, under this scheme, conformal.quant(), conformal.quant.split() and so on are simply convenience functions that call the more general function under the hood and instantiate the quantile-based conformity score.
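To make the second option concrete, here is a rough sketch of a split-conformal routine with a pluggable conformity score, in plain Python rather than R. Everything here is hypothetical and not the package's API: the names `conformal_split`, `score_fun`, and `interval_fun` are invented for illustration. The `interval_fun` argument is an extra assumption beyond what is described above: once the score is arbitrary, something has to say how a score threshold is turned back into an interval. The defaults are absolute residuals, so the default behaviour stays the same.

```python
import math

def abs_residual_score(pred, y):
    """Default score: absolute residual of a point prediction."""
    return abs(y - pred)

def abs_residual_interval(pred, q):
    """Invert the absolute-residual score into a symmetric interval."""
    return (pred - q, pred + q)

def quantile_score(pred, y):
    """Quantile-based score; pred is a (lower, upper) pair of quantile predictions."""
    lo, hi = pred
    return max(lo - y, y - hi)

def quantile_interval(pred, q):
    """Invert the quantile-based score: move both band edges outward by q."""
    lo, hi = pred
    return (lo - q, hi + q)

def conformal_split(train_fun, predict_fun, x, y, x0, alpha, split,
                    score_fun=abs_residual_score,
                    interval_fun=abs_residual_interval):
    """Split conformal prediction with a user-supplied conformity score.

    split holds the training indices; all other rows are used for calibration.
    """
    train = set(split)
    calib = [i for i in range(len(y)) if i not in train]
    model = train_fun([x[i] for i in split], [y[i] for i in split])
    scores = sorted(score_fun(predict_fun(model, x[i]), y[i]) for i in calib)
    # Empirical quantile of the scores with the finite-sample correction.
    k = math.ceil((1 - alpha) * (len(calib) + 1))
    q = scores[k - 1] if k <= len(calib) else math.inf
    return [interval_fun(predict_fun(model, xi), q) for xi in x0]

def conformal_quant_split(train_fun, predict_fun, x, y, x0, alpha, split):
    """Convenience wrapper that only fixes the quantile-based score."""
    return conformal_split(train_fun, predict_fun, x, y, x0, alpha, split,
                           score_fun=quantile_score,
                           interval_fun=quantile_interval)

# Toy learners: a mean predictor, and a (min, max) band as a stand-in
# for a real quantile regressor.
x = list(range(8))
y = [1.0, 3.0, 2.0, 2.5, 1.5, 2.0, 4.0, 0.0]
abs_out = conformal_split(lambda xs, ys: sum(ys) / len(ys),
                          lambda model, xi: model,
                          x, y, x0=[10], alpha=0.5, split=[0, 1])
quant_out = conformal_quant_split(lambda xs, ys: (min(ys), max(ys)),
                                  lambda model, xi: model,
                                  x, y, x0=[10], alpha=0.2, split=[0, 1])
print(abs_out)    # [(1.5, 2.5)]
print(quant_out)  # [(0.0, 4.0)]
```

Existing callers that never pass `score_fun` get absolute residuals as before, and the quantile variant is a few-line wrapper. The jackknife and full conformal versions are left open here, as they are in the list above.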

Thoughts? We could also see what people are doing in any relevant Python implementations, to gain some perspectives from that.

paolo-vergo commented 2 years ago

Hi @ryantibs! Since I am about to graduate and do not have much time left, we would go for the first option. In particular, we have already implemented conformal.quant() and conformal.quant.split().

ryantibs commented 2 years ago

Sounds good @paolo-vergo. Why don't you submit a PR, but just be warned that it may end up sitting on a branch for a while.

I'm just trying to be realistic, because as you can see, I've been extremely slow in being able to find time to merge your existing PRs, which again I'm quite sorry for, and I've not been able to find anybody to help (I was seeing if I could find interested students or community members to help manage the package and tend to issues and pull requests, but haven't been successful yet).

So I don't think realistically I'll be able to get to this new PR any time soon, before you graduate. And since the more general solution that I outlined in bullet point 2 in my last message seems like the more desirable general solution, I may try to find some time to refactor the package and eventually accomplish this more general structure.

paolo-vergo commented 2 years ago

Ok @ryantibs. I'll proceed with a PR!

ryantibs / conformal

Implementation of Conformalised Quantile Regression #12