stan-dev / projpred

Projection predictive variable selection
https://mc-stan.org/projpred/
Other
109 stars 25 forks source link

Poisson example in GLMM vignette #137

Closed fweber144 closed 2 years ago

fweber144 commented 3 years ago

Even after #133, the Poisson example in the GLMM vignette stays strange: For the hierarchical SD of (1 | obs), the projection gives exactly 1 across all projected draws. When looking closer at the submodel from the first projected draw, I got this output:

Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: poisson  ( log )
Formula: .phen_pois.1 ~ cofactor + (1 | obs)
   Data: data
Weights: weights
     AIC      BIC   logLik deviance df.resid 
     Inf      Inf     -Inf      Inf      197 
Random effects:
 Groups Name        Std.Dev.
 obs    (Intercept) 1       
Number of obs: 200, groups:  obs, 200
Fixed Effects:
(Intercept)     cofactor  
    -1.9833       0.2467  
optimizer (Nelder_Mead) convergence code: 0 (OK) ; 14000 optimizer warnings; 1 lme4 warnings

The optimizer and lme4 warnings there at the end might already indicate an issue. Perhaps it's the try to model overdispersion by including (1 | obs) (where obs has a one level per observation) which causes problems here. (Note that in the currently published vignette, it seems like the hierarchical SD is not exactly 1, but that's only due to the former use of the empirical SD of the random effects instead of their hierarchical SD, see #74.)

Anyway, since the vignette already takes very long to run through, perhaps simply omit this Poisson example? Or completely re-organize the whole GLMM vignette? In my opinion, a single example would be enough. And since the Bernoulli family is probably the most common one for generalized linear (mixed) models, I would choose a Bernoulli example. The VerbAgg example from the current vignette is nice, but it has a lot of observations (even when restricting it to 50 individuals) and the varsel (as well as the CV varsel) plot looks strange. Perhaps consider the MASS::bacteria dataset instead, e.g. with the formula y ~ week * trt + (1 | ID)?

AlejandroCatalina commented 3 years ago

That’s reasonable. We just wanted to include enough variation but I agree one example would be enough. I don’t have time right now to remake the vignette so it’ll have to stay for a while at least. If you want to give it a shot I can help with anything.

Best, Alejandro On 25. May 2021, 11.18 +0300, Frank Weber @.***>, wrote:

Even after #133, the Poisson example in the GLMM vignette stays strange: For the hierarchical SD of (1 | obs), the projection gives exactly 1 across all projected draws. When looking closer at the submodel from the first projected draw, I got this output: Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod'] Family: poisson ( log ) Formula: .phen_pois.1 ~ cofactor + (1 | obs) Data: data Weights: weights AIC BIC logLik deviance df.resid Inf Inf -Inf Inf 197 Random effects: Groups Name Std.Dev. obs (Intercept) 1 Number of obs: 200, groups: obs, 200 Fixed Effects: (Intercept) cofactor -1.9833 0.2467 optimizer (Nelder_Mead) convergence code: 0 (OK) ; 14000 optimizer warnings; 1 lme4 warnings The optimizer and lme4 warnings there at the end might already indicate an issue. Perhaps it's the try to model overdispersion by including (1 | obs) (where obs has a one level per observation) which causes problems here. (Note that in the currently published vignette, it seems like the hierarchical SD is not exactly 1, but that's only due to the former use of the empirical SD of the random effects instead of their hierarchical SD, see #74.) Anyway, since the vignette already takes very long to run through, perhaps simply omit this Poisson example? Or completely re-organize the whole GLMM vignette? In my opinion, a single example would be enough. And since the Bernoulli family is probably the most common one for generalized linear (mixed) models, I would choose a Bernoulli example. The VerbAgg example from the current vignette is nice, but it has a lot of observations (even when restricting it to 50 individuals) and the varsel (as well as the CV varsel) plot looks strange. Perhaps consider the MASS::bacteria dataset instead, e.g. with the formula y ~ week * trt + (1 | ID)? — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

fweber144 commented 3 years ago

Of course, I understand that. If I get the time, I can create or at least draft a PR for this. But yeah, it's not too important.

fweber144 commented 2 years ago

Fixed in develop, so a corresponding label may be added here.