A priori constraining number of prediction rules

The amount of the variable you get depends on the data set's size and its distribution among other things. See the original article by Friedman and Popescu (2008) and especially the remarks in section 5.

You can always use another value of the L1 penalty after the fact. E.g. something like

# fit model
library("pre")
airq <- airquality[complete.cases(airquality), ]
set.seed(42)
airq.ens <- pre(Ozone ~ ., data = airq)

# show log(lambda) (the log of the L1 penalty parameter)
plot(airq.ens$glmnet.fit)

# print model using different penalties
# ~ roughly the same; see the plot
print(airq.ens, penalty.par.val = exp(1.25))
#R> Final ensemble with lambda =  3.543968
#R>   number of terms = 12
#R>   mean cv error (se) = 352.3834 (99.13981)
#R> 
#R>   cv error type : Mean-Squared Error
#R> 
#R>          rule   coefficient                          description
#R>   (Intercept)   68.66955298                                    1
#R>       rule191  -11.17681998              Wind > 5.7 & Temp <= 87
#R>       rule173  -10.82028772              Wind > 5.7 & Temp <= 82
#R>        rule42   -8.75966239              Wind > 6.3 & Temp <= 84
#R>       rule204    7.21505549         Wind <= 10.3 & Solar.R > 148
#R>        rule10   -4.78094871              Temp <= 84 & Temp <= 77
#R>       rule192   -3.40365399  Wind > 5.7 & Temp <= 87 & Day <= 23
#R>        rule51   -2.28696480              Wind > 5.7 & Temp <= 84
#R>        rule93    2.22201352              Temp > 77 & Wind <= 8.6
#R>        rule74   -1.34529707              Wind > 6.9 & Temp <= 84
#R>        rule28   -1.19094491              Temp <= 84 & Wind > 7.4
#R>        rule25   -0.77452728              Wind > 6.3 & Temp <= 82
#R>       rule166   -0.04928187              Wind > 6.9 & Temp <= 82
print(airq.ens, penalty.par.val = "lambda.1se")
#R> Final ensemble with cv error within 1se of minimum: 
#R>   lambda =  3.543968
#R>   number of terms = 12
#R>   mean cv error (se) = 352.3834 (99.13981)
#R> 
#R>   cv error type : Mean-Squared Error
#R> 
#R>          rule   coefficient                          description
#R>   (Intercept)   68.48270406                                    1
#R>       rule191  -10.97368179              Wind > 5.7 & Temp <= 87
#R>       rule173  -10.90385520              Wind > 5.7 & Temp <= 82
#R>        rule42   -8.79715538              Wind > 6.3 & Temp <= 84
#R>       rule204    7.16114780         Wind <= 10.3 & Solar.R > 148
#R>        rule10   -4.68646144              Temp <= 84 & Temp <= 77
#R>       rule192   -3.34460037  Wind > 5.7 & Temp <= 87 & Day <= 23
#R>        rule51   -2.27864287              Wind > 5.7 & Temp <= 84
#R>        rule93    2.18465676              Temp > 77 & Wind <= 8.6
#R>        rule74   -1.36479546              Wind > 6.9 & Temp <= 84
#R>        rule28   -1.15326093              Temp <= 84 & Wind > 7.4
#R>        rule25   -0.70818399              Wind > 6.3 & Temp <= 82
#R>       rule166   -0.04751152              Wind > 6.9 & Temp <= 82

# larger penalty: fewer coef
print(airq.ens, penalty.par.val = exp(2))
#R> Final ensemble with lambda =  7.459713
#R>   number of terms = 5
#R>   mean cv error (se) = 652.2982 (143.0936)
#R> 
#R>   cv error type : Mean-Squared Error
#R> 
#R>          rule  coefficient                   description
#R>   (Intercept)   57.0512331                             1
#R>       rule173  -11.8608800       Wind > 5.7 & Temp <= 82
#R>        rule42   -9.5918711       Wind > 6.3 & Temp <= 84
#R>       rule204    1.9628007  Wind <= 10.3 & Solar.R > 148
#R>        rule51   -0.9954876       Wind > 5.7 & Temp <= 84
#R>        rule74   -0.7710814       Wind > 6.9 & Temp <= 84

# predict using different penalty values
xdat <- airq[1L, ]
predict(airq.ens, newdata = xdat, penalty.par.val = exp(1.25))
#R> [1] 32.48716
predict(airq.ens, newdata = xdat, penalty.par.val = "lambda.1se")
#R> [1] 32.53896
predict(airq.ens, newdata = xdat, penalty.par.val = exp(2))
#R> [1] 35.79471

Do notice that

the plot overestimates the performance. A double bootstrap/cross validation should likely be used to get valid estimates but this is computationally more expensive.
using a way larger penalty may give a worse accuracy.

Do you find it legitimate a priori do define how many rules one wants to return?

Depends on the data set you are using and your application. You are likely not using this method because you want to get the highest possible accuracy so "sacrificing" a bit more for interpretability may be ok.

If so, is there a way to program this into the pre function?

See the example above.

marjoleinF / pre

A priori constraining number of prediction rules #23