swihart / gnlrim

Generalized Nonlinear Random Intercept Models in R aka a freestanding `repeated::gnlmix()`

Bayesian Lasso #13

Open swihart opened 6 years ago

swihart commented 6 years ago

Park and Casella 2008: http://www.tandfonline.com/doi/abs/10.1198/016214508000000337

Chris Hans 2009: https://doi.org/10.1093/biomet/asp047

blasso in caret/monomvn: https://cran.r-project.org/web/packages/monomvn/monomvn.pdf https://topepo.github.io/caret/train-models-by-tag.html

Fu 1998: http://gautampendse.com/software/lasso/webpage/Fu1998.pdf

Lee 2009: https://statistics.stanford.edu/sites/default/files/2009-04.pdf

Lee 2016: http://www.sciencedirect.com/science/article/pii/S0167947316301281

swihart commented 6 years ago

To do list:

Example

  1. Find dataset and example for ridge regression (and go ahead and do lasso too). Try this, this, and this.

  2. Reproduce the ridge regression from 1. in nlme::lme() using Ruppert's code (appendix) or Jarek's (which is Ruppert's) here. Also see "/Volumes/swihartbj$/projects/_PostDoc/_00_varianceLRT/straightlinesSimulation_v5.R". A rough sketch of this step is included after this list.

  3. Reproduce ridge regression with TMB (make repo like cauchy, bridge)...etc.

  4. Hammer out some theory, something like this and this. Remember REML (section 2.4.2) linked from here. Explicit likelihood and REML explanation from Matlab. Relevant links about Laplace/normal priors in a Bayesian framework: here, here, here, here, here.

  5. Reproduce 2. with lme4::lmer because this

  6. Fix Jarek's p > n problem in 2. by cloning lme4 on GitHub and allowing it.

  7. Reproduce 3. using custom code and the normalp package (JSS and rdoc), setting q=2. Set q=1 to reproduce lasso from 1.

  8. Given that 7. reproduced both the lasso (q=1) and ridge (q=2) examples from 1., allow 0 < q <= 2 to be estimated. Compare this to the results with fixed q=1 and fixed q=2.

  9. Profit.
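
A minimal sketch of step 2 under assumed simulated data (this is not the repo's or Ruppert's actual code): ridge regression as a one-group mixed model, where pdIdent() gives every coefficient the same variance and the implied penalty is lambda = sigma_eps^2 / sigma_b^2.

library(nlme)

set.seed(1)
n <- 100; p <- 5
X <- scale(matrix(rnorm(n * p), n, p))            # standardized predictors (toy data)
y <- drop(X %*% c(2, 0, 1, 0, -1)) + rnorm(n)

dat <- data.frame(y = y, g = factor(rep(1, n)))   # single dummy grouping factor
dat$X <- X                                        # matrix column

## all coefficients share one variance component via pdIdent()
fit <- lme(y ~ 1, data = dat, random = list(g = pdIdent(~ X - 1)))

VarCorr(fit)   # sigma_b (coefficients) and sigma_eps (residual)
ranef(fit)     # shrunken coefficients, i.e. the ridge solution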

swihart commented 5 years ago

Note on bullet 8 above: dnormp in R (the normalp package) doesn't allow for q < 1 (its p argument).

But can we hack it by commenting out that stop message, like so?

dnormp_p_lt_1 <- 
function (x, mu = 0, sigmap = 1, p = 2, log = FALSE) 
{
    if (!is.numeric(x) || !is.numeric(mu) || !is.numeric(sigmap) || 
        !is.numeric(p)) 
        stop(" Non-numeric argument to mathematical function")
 ##   if (p < 1) 
 ##   stop("p must be at least equal to one")
    if (sigmap <= 0) 
        stop("sigmap must be positive")
    cost <- 2 * p^(1/p) * gamma(1 + 1/p) * sigmap
    expon1 <- (abs(x - mu))^p
    expon2 <- p * sigmap^p
    dsty <- (1/cost) * exp(-expon1/expon2)
    if (log == TRUE) 
        dsty <- log(dsty)
    dsty
}
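
A quick sanity check (my addition): the normalizing constant 2 * p^(1/p) * gamma(1 + 1/p) * sigmap is algebraically valid for any p > 0, so the hacked density should still integrate to 1 when p < 1:

## density is symmetric about mu, so integrate one side and double it
2 * integrate(dnormp_p_lt_1, lower = 0, upper = Inf, p = 0.50)$value  # ~ 1
2 * integrate(dnormp_p_lt_1, lower = 0, upper = Inf, p = 0.25)$value  # ~ 1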
swihart commented 5 years ago

Quick answer to the above: looks like yes, we can. Pretty cool plots. Need to make a gganimate version like I did for logPS (a rough sketch is included after the code below). I just source the code below and then arrow through the plots in the RStudio Plots pane.

library(normalp)
x <- c(seq(-5,-1,0.1), seq(-1,1,0.05), seq(1,5,0.1))

udens <- dunif(x, -1,1)
ndens <- dnorm(x)
ldens <- rmutil::dlaplace(x)

dnormp_p_lt_1 <- function (x, mu = 0, sigmap = 1, p = 2, log = FALSE) 
{
  if (!is.numeric(x) || !is.numeric(mu) || !is.numeric(sigmap) || 
      !is.numeric(p)) 
    stop(" Non-numeric argument to mathematical function")
  ##   if (p < 1) 
  ##   stop("p must be at least equal to one")
  if (sigmap <= 0) 
    stop("sigmap must be positive")
  cost <- 2 * p^(1/p) * gamma(1 + 1/p) * sigmap
  expon1 <- (abs(x - mu))^p
  expon2 <- p * sigmap^p
  dsty <- (1/cost) * exp(-expon1/expon2)
  if (log == TRUE) 
    dsty <- log(dsty)
  dsty
}

ylimits = c(0,1)

p.parm <- c(1e8, 20, 10, 5, 4, 3, 2, 1.75, 1.5, 1.25, 1, 0.75, 0.50, 0.25, 0.1)#seq(3,0.01,-0.01))

for(i in p.parm){

  main_label <- ifelse(i > 1e6, "p=infinity is uniform", paste0("p = ", round(i,2)))
  print(i)
  if(i == 2) main_label <- "p=2 is normal"
  if(i == 1) main_label <- "p=1 is laplace"

  normp_dens <- dnormp_p_lt_1(x, p=i)
  plot(x, normp_dens, main=main_label, ylim=ylimits)
  lines(x, udens, col="red")
  lines(x, ndens, col="blue")
  lines(x, ldens, col="grey")

}
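
This isn't the logPS code -- just a rough sketch of what the gganimate version might look like, reusing x, p.parm, ylimits, and dnormp_p_lt_1 from above (the names frames and anim are made up):

library(ggplot2)
library(gganimate)

## one row per (x, p) pair, one animation frame per value of p
frames <- do.call(rbind, lapply(p.parm, function(pp)
  data.frame(x = x, dens = dnormp_p_lt_1(x, p = pp), p = pp)))
frames$p <- factor(frames$p, levels = p.parm)   # keep the p.parm ordering

anim <- ggplot(frames, aes(x, dens)) +
  geom_line() +
  coord_cartesian(ylim = ylimits) +
  transition_states(p) +
  ggtitle("p = {closest_state}")

animate(anim)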
swihart commented 5 years ago
swihart commented 5 years ago
swihart commented 8 months ago

Model covariates were selected from the candidate variables above using elastic net regression, which applies both L1 and L2 penalties. This method handles multicollinearity better than the lasso and leaves fewer redundancies than ridge, respectively [1].

  1. Zou H, Hastie T. Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society Series B: Statistical Methodology. 2005;67(2):301-320.
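
Not from the manuscript -- a minimal glmnet sketch of that selection step, with simulated x and y standing in for the candidate variables (alpha between 0 and 1 mixes the L1 and L2 penalties):

library(glmnet)

set.seed(1)
n <- 200; p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:3] %*% c(3, -2, 1)) + rnorm(n)

## alpha = 0.5 is an even L1/L2 mix; alpha = 1 is the lasso, alpha = 0 is ridge
cvfit <- cv.glmnet(x, y, alpha = 0.5)
coef(cvfit, s = "lambda.min")   # nonzero rows = selected covariates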
swihart commented 7 months ago

I saved some of these to my ~/papers folder:

https://arxiv.org/pdf/1509.09169.pdf

https://www.tjmahr.com/random-effects-penalized-splines-same-thing/

https://www.biostat.umn.edu/~hodges/PubH8492/Hodges-ClaytonREONsubToStatSci.pdf

https://stackoverflow.com/questions/50046342/specified-covariance-matrix-in-lmer-in-r-for-penalized-splines

https://stats.stackexchange.com/questions/455151/mixed-model-ridge-regression-and-data-augmentation

https://stats.stackexchange.com/questions/15991/how-to-apply-a-soft-coefficient-constraint-to-an-ols-regression

https://www.science.smith.edu/~jcrouser/SDS293/labs/lab10-r.html

https://glmnet.stanford.edu/articles/glmnet.html#linear-regression-family-gaussian-default

https://stackoverflow.com/questions/66980922/what-is-the-recommend-function-for-ridge-regression

https://stackoverflow.com/questions/30043949/raise-to-power-in-r (C style pow in R)

https://stats.stackexchange.com/a/430198/35034 which references:

Bayesian inference under a Laplace prior is quite challenging. Unfortunately, our best friend the Laplace approximation is intractable, since the prior is non-differentiable at zero. source: https://pillowlab.wordpress.com/2013/04/17/bayesian-inference-for-poisson-glm-with-laplace-prior/

Pretty sure |x-mu|^p is differentiable for p > 1. That means the quote above explains why I can't fit LASSO well with TMB, but it also means that trying it for (lasso) 1 < p <= 2 (ridge) might be possible and worthwhile (?) https://www.toppr.com/ask/question/show-that-the-function-fleftxrightleftxright-is-not-differentiable-at-x0/ https://www.physicsforums.com/threads/whats-the-derivative-of-the-absolute-value-to-the-power-of-p.470443/
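
For the record: $\frac{d}{dx}\,|x-\mu|^p = p\,|x-\mu|^{p-1}\,\mathrm{sgn}(x-\mu)$, which is continuous (and equal to 0 at $x=\mu$) for $p>1$, but jumps from $-1$ to $+1$ at $x=\mu$ when $p=1$ -- so the non-differentiability only bites at the lasso end.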

Got into an rstan rabbit hole because they have laplace and lasso priors:

This resonated with me: https://statmodeling.stat.columbia.edu/2017/11/02/king-must-die/

https://mc-stan.org/rstanarm/reference/priors.html https://mc-stan.org/rstanarm/reference/priors.html#examples https://mc-stan.org/rstanarm/articles/lm.html

https://statmodeling.stat.columbia.edu/2017/02/14/lasso-regression-etc-stan/

actual code examples https://ccgilroy.com/csss564-labs-2019/09-review-mixtures/09-review-mixtures.html#shrinkage

## Shrinkage

## The global scales for shrinkage below come approximately from the estimates in Lab 8.

## The lasso prior isn't quite the same as the laplace prior -- it actually puts a prior on the global scale.

library(readr)     # read_csv()
library(rstanarm)  # stan_glm(), the prior constructors, and loo()

Prostate <- read_csv("data/Prostate.csv")

f <- lpsa ~ lcavol + lweight + age + lbph + svi + lcp + gleason + pgg45 

fit_ridge <- stan_glm(f, data = Prostate, 
                      family = gaussian(), 
                      prior = normal(0, 0.33), 
                      prior_intercept = normal(0, 10), 
                      prior_aux = exponential(1))

fit_lasso <- stan_glm(f, data = Prostate, 
                      family = gaussian(), 
                      prior = laplace(0, 0.26), 
                      prior_intercept = normal(0, 10), 
                      prior_aux = exponential(1))

fit_lasso_v2 <- stan_glm(f, data = Prostate, 
                         family = gaussian(), 
                         prior = lasso(df = 1, 0, 2.5), 
                         prior_intercept = normal(0, 10), 
                         prior_aux = exponential(1))

## You can compare models of the same outcome with loo:

compare_models(loo(fit_ridge), 
               loo(fit_lasso))
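
One extra line (mine, not the lab's) to eyeball the difference in shrinkage -- coef() on these fits returns the posterior medians:

round(cbind(ridge = coef(fit_ridge), lasso = coef(fit_lasso)), 2)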
swihart commented 7 months ago

Um ... I got into a groove and the coding just flowed. I think we did it! Estimated p=1.84. Was also able to hard-code p=1.05 to see the results there. Even got rstan versions running.

swihart commented 7 months ago

Do examples from these two pages:

Ames data: https://bradleyboehmke.github.io/HOML/regularized-regression.html

Boston suburb data: http://www.sthda.com/english/articles/37-model-selection-essentials-in-r/153-penalized-regression-essentials-ridge-lasso-elastic-net

swihart commented 7 months ago

Nice explanation; graphics:

http://datacamp.com/tutorial/tutorial-ridge-lasso-elastic-net

swihart commented 7 months ago
[Screenshot: 2023-12-04 at 11:01:28 AM]

Check the pink notebook -- while $\gamma=2$ does imply $N(0, \frac{1}{\lambda})$, $\gamma=1$ doesn't imply a Laplace prior in the Fu 1998 formulation on the left-hand side.