ryantibs / conformal

Tools for conformal inference in regression
GNU General Public License v2.0

Multisplit Conformal Implementation #11

Open matteo-fontana opened 2 years ago

matteo-fontana commented 2 years ago

It would be useful to implement the Multisplit Conformal method by Solari & Djordjilović (https://arxiv.org/pdf/2103.00627.pdf) as a prediction method.

paolo-vergo commented 2 years ago

@ryantibs

To implement multisplit conformal prediction we added a new function called conformal.pred.msplit. This function is based on conformal.pred.split and check.args. The algorithm is a two-step procedure: first it computes B prediction intervals for each test point in parallel (via the future_sapply function from the future.apply package), and then, through a helper function called interval.build, it joins the B intervals into a single prediction interval for each test point in x0.
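The joining step can be illustrated with a small, language-neutral sketch, written here in Python rather than R. The function `join_intervals` below is hypothetical and is not the `interval.build` helper from the pull request. It assumes the rule described in the Solari & Djordjilović paper, as I understand it: keep the points covered by strictly more than `tau * B` of the B intervals, and report the smallest single interval containing them.

```python
def join_intervals(lo, up, tau):
    """Join B closed intervals [lo[b], up[b]] into one interval.

    Hypothetical sketch: returns the hull of all points covered by
    strictly more than tau * B intervals, or (None, None) if no
    point reaches that many votes.
    """
    B = len(lo)
    threshold = tau * B
    # Opening endpoints (kind 0) sort before closing endpoints (kind 1)
    # at ties, so intervals that only touch still overlap at that point.
    events = sorted([(l, 0) for l in lo] + [(u, 1) for u in up])
    count = 0
    left = right = None
    for point, kind in events:
        if kind == 0:
            count += 1
            if count > threshold and left is None:
                left = point  # first point with enough votes
        else:
            if count > threshold:
                right = point  # last point seen so far with enough votes
            count -= 1
    return left, right


lo = [0.0, 1.0, 2.0, 3.0]
up = [5.0, 6.0, 7.0, 8.0]
print(join_intervals(lo, up, tau=0.5))   # (2.0, 6.0)
print(join_intervals(lo, up, tau=0.75))  # (3.0, 5.0)
```

Because only the hull is returned, the result can be wider than the voted set when that set is not connected. With `tau * B` computed in floating point, values such as `1 - 1/B` may need a small tolerance in practice.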

To be more precise, the function takes as input:

- x: Matrix of features, of dimension (say) n x p.
- y: Vector of responses, of length (say) n.
- x0: Matrix of features, each row being a point at which we want to form a prediction interval, of dimension (say) n0 x p.
- train.fun: A function to perform model training, as in the split conformal function.
- predict.fun: A function to perform prediction for the (mean of the) responses at new feature values, as in the split conformal function.
- alpha: Miscoverage level for the prediction intervals.
- rho: Vector of split proportions, of length B. Default is NULL.
- w: Weights, in the case of covariate shift.
- mad.train.fun: A function to perform training on the absolute residuals, as in the split conformal function.
- mad.predict.fun: A function to perform prediction for the (mean of the) absolute residuals at new feature values.
- split: Indices that define the data-split to be used.
- seed: Integer to be passed to set.seed before defining the random data-split to be used.
- verbose: Should intermediate progress be printed out? Default is FALSE.
- B: Number of replications. Default is 50.
- lambda: Smoothing parameter. Default is 0.
- tau: Smoothing parameter whose value affects the behaviour of the function joining the B intervals: tau=1-1/B gives the Bonferroni intersection method, tau=0 gives the unadjusted intersection. Default is 1-(B+1)/(2*B).
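To make the tau settings concrete, here is a small arithmetic sketch in Python (not part of the R package). The helper names `default_tau` and `min_votes` are hypothetical. The vote count assumes that a point is kept when strictly more than tau * B of the B intervals contain it, which is my reading of the paper and not something confirmed by the code in the pull request. Exact fractions are used to avoid floating-point rounding at the boundary.

```python
from fractions import Fraction
from math import floor


def default_tau(B):
    """Default quoted above: tau = 1 - (B + 1) / (2 * B)."""
    return 1 - Fraction(B + 1, 2 * B)


def min_votes(tau, B):
    """Smallest number of intervals that is strictly more than tau * B."""
    return floor(tau * B) + 1


B = 50
settings = [
    ("default", default_tau(B)),
    ("Bonferroni", 1 - Fraction(1, B)),
    ("tau = 0", Fraction(0)),
]
for label, tau in settings:
    print(label, float(tau), min_votes(tau, B))
# default 0.49 25
# Bonferroni 0.98 50
# tau = 0 0.0 1
```

Under that assumption, the default with B = 50 amounts to a majority vote (at least 25 of 50 intervals), while tau = 1-1/B requires all 50 intervals to contain the point.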

The function returns a list with two components, lo and up. These are matrices of dimension n0 x m (with m as in conformal.pred.split) that hold the lower and upper bounds of the prediction intervals.

We also built an example script, called ex.conformal.pred.multisplit, to test the code, similar to the existing ex.conformal.split. Moreover, our function includes Roxygen headers, which allow the documentation to be rebuilt quickly and keep it consistent with the existing package.

ryantibs commented 2 years ago

@paolo-vergo Sorry for the long delay. This sounds great!

Please go ahead and implement it on a branch and submit a PR.

paolo-vergo commented 2 years ago

@ryantibs

I have sent you a pull request. Have a nice day!