Open swihart opened 6 years ago
To do list:
Example
Find dataset and example for ridge regression (and go ahead and do lasso too). Try this, this, and this.
Reproduce the ridge regression from 1. in nlme::lme() using Rupperts code (appendix) or Jarek's (which is ruppert's) here Also see "/Volumes/swihartbj$/projects/_PostDoc/_00_varianceLRT/straightlinesSimulation_v5.R".
Reproduce ridge regression with TMB (make repo like cauchy, bridge)...etc.
Hammer out some theory, something like this and this. Remember REML (section 2.4.2) linked from here. Explicit likelihood and REML explanation from Matlab. Relevant links about laplace/normal priors in Bayesian framework: here, here, here, here. Here
Reproduce 2. with lme4::lmer because this
Fix jarek's p > n problem in 2. by cloning lme4 on github and allowing it.
Reproduce 3. using custom code and the normalp package (JSS and rdoc), setting q=2. Set q=1 to reproduce lasso from 1.
Given that 4. reproduced both lasso (q=1) and ridge (q=2) examples in 1., allow 0 < q <=2 to be estimated. Compare this to results of fixed q=1 and fixed q=2.
Profit.
Note on bullet 8 above: dnormp in r doesn't allow for q < 1.
But can we hack it...by commenting out that stop message like so -- ?
dnormp_p_lt_1 <-
function (x, mu = 0, sigmap = 1, p = 2, log = FALSE)
{
if (!is.numeric(x) || !is.numeric(mu) || !is.numeric(sigmap) ||
!is.numeric(p))
stop(" Non-numeric argument to mathematical function")
## if (p < 1)
## stop("p must be at least equal to one")
if (sigmap <= 0)
stop("sigmap must be positive")
cost <- 2 * p^(1/p) * gamma(1 + 1/p) * sigmap
expon1 <- (abs(x - mu))^p
expon2 <- p * sigmap^p
dsty <- (1/cost) * exp(-expon1/expon2)
if (log == TRUE)
dsty <- log(dsty)
dsty
}
Quick answer to the above: looks like yes, we can. Pretty cool plots. Need to make gganimate like I did for logPS. I just source the below code and then arrow through the plots in RStudio plot-tab pane.
library(normalp)
x <- c(seq(-5,-1,0.1), seq(-1,1,0.05), seq(1,5,0.1))
udens <- dunif(x, -1,1)
ndens <- dnorm(x)
ldens <- rmutil::dlaplace(x)
dnormp_p_lt_1 <- function (x, mu = 0, sigmap = 1, p = 2, log = FALSE)
{
if (!is.numeric(x) || !is.numeric(mu) || !is.numeric(sigmap) ||
!is.numeric(p))
stop(" Non-numeric argument to mathematical function")
## if (p < 1)
## stop("p must be at least equal to one")
if (sigmap <= 0)
stop("sigmap must be positive")
cost <- 2 * p^(1/p) * gamma(1 + 1/p) * sigmap
expon1 <- (abs(x - mu))^p
expon2 <- p * sigmap^p
dsty <- (1/cost) * exp(-expon1/expon2)
if (log == TRUE)
dsty <- log(dsty)
dsty
}
ylimits = c(0,1)
p.parm <- c(1e8, 20, 10, 5, 4, 3, 2, 1.75, 1.5, 1.25, 1, 0.75, 0.50, 0.25, 0.1)#seq(3,0.01,-0.01))
for(i in p.parm){
main_label <- ifelse(i > 1e6, "p=infinity is uniform", paste0("p = ", round(i,2)))
print(i)
if(i == 2) main_label <- "p=2 is normal"
if(i == 1) main_label <- "p=1 is laplace"
normp_dens <- dnormp_p_lt_1(x, p=i)
plot(x, normp_dens, main=main_label, ylim=ylimits)
lines(x, udens, col="red")
lines(x, ndens, col="blue")
lines(x, ldens, col="grey")
}
Model covariates were selected from the above candidate variables using elastic net regression applying both L1 and L2 penalties. This method provides better handling of multicollinearity than the Lasso and fewer redundancies than the Ridge methods, respectively (30).
I saved some of these to my ~/papers folder:
https://arxiv.org/pdf/1509.09169.pdf
https://www.tjmahr.com/random-effects-penalized-splines-same-thing/
https://www.biostat.umn.edu/~hodges/PubH8492/Hodges-ClaytonREONsubToStatSci.pdf
https://stats.stackexchange.com/questions/455151/mixed-model-ridge-regression-and-data-augmentation
https://www.science.smith.edu/~jcrouser/SDS293/labs/lab10-r.html
https://glmnet.stanford.edu/articles/glmnet.html#linear-regression-family-gaussian-default
https://stackoverflow.com/questions/66980922/what-is-the-recommend-function-for-ridge-regression
https://stackoverflow.com/questions/30043949/raise-to-power-in-r (C style pow in R)
https://stats.stackexchange.com/a/430198/35034 which references:
https://katbailey.github.io/post/from-both-sides-now-the-math-of-linear-regression/
https://bjlkeng.io/posts/probabilistic-interpretation-of-regularization/ (follow this to show shrinkage parameter for the power distribution)
Bayesian inference under a Laplace prior is quite challenging. Unfortunately, our best friend the Laplace approximation is intractable, since the prior is non-differentiable at zero. source: https://pillowlab.wordpress.com/2013/04/17/bayesian-inference-for-poisson-glm-with-laplace-prior/
Pretty sure |x-mu|^p is differentiable for p > 1. Which means that the quote above explains why I can't fit LASSO well with TMB, but alos means that trying to use it for(lasso) 1<p<=2 (ridge) might be possible and worth while (?) https://www.toppr.com/ask/question/show-that-the-function-fleftxrightleftxright-is-not-differentiable-at-x0/ https://www.physicsforums.com/threads/whats-the-derivative-of-the-absolute-value-to-the-power-of-p.470443/
Got into rstan rabbit hole bc they have laplace and lasso priors:
This resonated with me: https://statmodeling.stat.columbia.edu/2017/11/02/king-must-die/
https://mc-stan.org/rstanarm/reference/priors.html https://mc-stan.org/rstanarm/reference/priors.html#examples https://mc-stan.org/rstanarm/articles/lm.html
https://statmodeling.stat.columbia.edu/2017/02/14/lasso-regression-etc-stan/
actual code examples https://ccgilroy.com/csss564-labs-2019/09-review-mixtures/09-review-mixtures.html#shrinkage
##Shrinkage
##The global scales for shrinkage below come approximately from the estimates in Lab 8.
##The lasso prior isn’t quite the same as the laplace prior—it actually puts a prior on the global scale.
Prostate <- read_csv("data/Prostate.csv")
f <- lpsa ~ lcavol + lweight + age + lbph + svi + lcp + gleason + pgg45
fit_ridge <- stan_glm(f, data = Prostate,
family = gaussian(),
prior = normal(0, 0.33),
prior_intercept = normal(0, 10),
prior_aux = exponential(1))
fit_lasso <- stan_glm(f, data = Prostate,
family = gaussian(),
prior = laplace(0, 0.26),
prior_intercept = normal(0, 10),
prior_aux = exponential(1))
fit_lasso_v2 <- stan_glm(f, data = Prostate,
family = gaussian(),
prior = lasso(df = 1, 0, 2.5),
prior_intercept = normal(0, 10),
prior_aux = exponential(1))
##You can compare models of the same outcome with loo:
compare_models(loo(fit_ridge),
loo(fit_lasso))
Um ... I got into a groove and the coding just flowed. I think we did it! Estimated p=1.84. Was able to set p=1.05 hardcode to see results there. Even got rstan versions running.
Do examples from these two pages:
Ames data: https://bradleyboehmke.github.io/HOML/regularized-regression.html
Boston suburb data: http://www.sthda.com/english/articles/37-model-selection-essentials-in-r/153-penalized-regression-essentials-ridge-lasso-elastic-net
Nice explanation; graphics:
http://datacamp.com/tutorial/tutorial-ridge-lasso-elastic-net
Check pink notebook -- while $\gamma=2$ does imply $N(0, \frac{1}{\lambda})$, the $\gamma=1$ doesn't imply laplace on the Fu 1998 on the left hand side.
Park and Casella 2008: http://www.tandfonline.com/doi/abs/10.1198/016214508000000337
Chris Hans 2009: https://doi.org/10.1093/biomet/asp047
blasso
in CARET/monomvn: https://cran.r-project.org/web/packages/monomvn/monomvn.pdf https://topepo.github.io/caret/train-models-by-tag.htmlFu 1998: http://gautampendse.com/software/lasso/webpage/Fu1998.pdf
Lee 2009: https://statistics.stanford.edu/sites/default/files/2009-04.pdf
Lee 2016: http://www.sciencedirect.com/science/article/pii/S0167947316301281