mfasiolo / mgcViz

An R package for interactive visualization of GAM models
https://mfasiolo.github.io/mgcViz/

Handling of weights in simulate -> warning? #74

Open florianhartig opened 3 years ago

florianhartig commented 3 years ago

Hi Matteo,

I am considering switching to mgcViz:::simulate for simulating from gam objects in DHARMa, see https://github.com/florianhartig/DHARMa/issues/309.

One suggestion: when models are fitted with weights using families other than binomial and Gaussian, I assume that the weights are simply applied to the likelihood during fitting but ignored in the simulations. If so, I think it would be better to throw a warning (currently, no warning is issued).

Cheers, F
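For comparison, base R's stats::simulate() already takes this approach for glm fits: in the R versions I am familiar with, the poisson() family's simulation function warns "ignoring prior weights" when any prior weight differs from 1. A minimal sketch on made-up data (the data and variable names are illustrative only):

```r
# Made-up data for illustration only
set.seed(1)
x <- seq(0, 1, length.out = 50)
y <- rpois(50, lambda = exp(1 + x))
w <- rep(c(1, 2), each = 25)  # non-unit prior weights

fit <- glm(y ~ x, family = poisson, weights = w)

# Capture the warning (if any) raised when simulating from the fit
msg <- tryCatch({
  simulate(fit, nsim = 1)
  "no warning"
}, warning = function(cond) conditionMessage(cond))
print(msg)
```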

mfasiolo commented 3 years ago

Hi Florian,

I looked into this, and the weights are not ignored: they are passed to family$rd() or family$qf(). For instance:

```r
> gaulss()$rd
function (mu, wt, scale) 
{
    return(rnorm(nrow(mu), mu[, 1], sqrt(scale/wt)/mu[, 2]))
}
```

uses the weights.
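To see what the weight does in that rd() function, here is a base R sketch that copies its body into a stand-alone function and compares draws with wt = 1 and wt = 4. The name rd_gaulss_like is hypothetical (not part of mgcv or mgcViz), and the sketch assumes, as in the printed code, that column 1 of mu is the mean and column 2 the inverse standard deviation. The weight acts as a precision, so the variance becomes scale/wt (times the squared standard deviation).

```r
# Hypothetical stand-in mirroring the body of gaulss()$rd printed above:
# column 1 of mu is the mean, column 2 the inverse standard deviation.
rd_gaulss_like <- function(mu, wt, scale) {
  rnorm(nrow(mu), mu[, 1], sqrt(scale / wt) / mu[, 2])
}

set.seed(42)
n <- 1e5
mu <- cbind(rep(0, n), rep(1, n))  # mean 0, inverse sd 1

y1 <- rd_gaulss_like(mu, wt = 1, scale = 1)
y4 <- rd_gaulss_like(mu, wt = 4, scale = 1)

# A weight of 4 shrinks the standard deviation by a factor of sqrt(4) = 2
sd_ratio <- sd(y1) / sd(y4)
print(sd_ratio)
```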

Matteo

florianhartig commented 3 years ago

Yes, for the Gaussian and binomial families, weights have a specific meaning in the likelihood and hence in the data-generating model. For the Poisson, however, the weights merely scale each observation's contribution to the likelihood and correspond to no data-generating model (effectively, the fit maximizes a pseudo-likelihood). In that case, simulated data will not necessarily resemble the observed data, because the weights make the fit partly disregard particular data points.
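A small base R sketch of the Poisson case, on made-up data: with weights, glm() maximizes the weighted log-likelihood sum(w * log p(y | mu)), which for an intercept-only model gives the weighted mean of y. Simulating from that fit with rpois() then produces data that look nothing like the observed responses.

```r
# Made-up data: 50 zeros with weight 1 and 50 tens with weight 0.1
y <- c(rep(0, 50), rep(10, 50))
w <- c(rep(1, 50), rep(0.1, 50))

# Intercept-only weighted Poisson GLM: maximizes sum(w * log p(y | mu))
fit <- glm(y ~ 1, family = poisson, weights = w)
mu_hat <- exp(unname(coef(fit)))

# Closed form for this model: the weighted mean of y (here 50 / 55)
mu_weighted <- sum(w * y) / sum(w)
print(c(glm = mu_hat, closed_form = mu_weighted, unweighted_mean = mean(y)))

# Data simulated from the fit cluster near mu_hat (about 0.9),
# unlike the observed data, half of which equal 10
set.seed(3)
y_sim <- rpois(length(y), mu_hat)
print(table(y_sim))
```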

So, effectively, weights in R regression packages are used in three different ways:

  1. to control the expected dispersion in the likelihood (as in the Gaussian) -> can be simulated from, no problem
  2. as weights on the likelihood (e.g. Poisson) -> can't be simulated from; simulations won't match the data -> simulate() should throw a warning
  3. as the binomial n (number of trials) -> no problem
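Meaning 3 can be checked directly in base R: a binomial glm on proportions with weights equal to the number of trials is the same model as one with a two-column (successes, failures) response. The data below are made up for illustration.

```r
# Made-up binomial data: s successes out of n trials at covariate x
x <- c(0, 1, 2, 3, 4)
n <- c(10, 12, 8, 15, 20)
s <- c(1, 3, 4, 10, 17)

# Proportions as response, weights = number of trials ...
fit_prop <- glm(s / n ~ x, family = binomial, weights = n)
# ... fit the same model as the (successes, failures) response
fit_cbind <- glm(cbind(s, n - s) ~ x, family = binomial)

print(all.equal(coef(fit_prop), coef(fit_cbind)))
```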

In retrospect, I think it was a mistake by the R developers to overload the weights argument of glm with these different meanings; it would have been much better to have separate arguments for the three cases.

Anyway, I would suggest throwing a warning for all families that use weights only on the likelihood, without a corresponding data-generating model. This is certainly the case for the Poisson; I am not sure about all the other extended families.
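A minimal sketch of such a check, assuming it is keyed on the family name. Both helper names are hypothetical (not part of mgcViz or mgcv), and the list of families whose weights have a generative meaning is an assumption based on this thread (Gaussian, binomial, and gaulss as shown above); every other family is treated as likelihood-weighted.

```r
# Hypothetical helpers, not part of mgcViz or mgcv.
# Assumption: only these families give weights a generative meaning;
# every other family is treated conservatively.
weights_have_generative_meaning <- function(family_name) {
  family_name %in% c("gaussian", "binomial", "gaulss")
}

check_simulation_weights <- function(family_name, weights) {
  if (!is.null(weights) && any(weights != 1) &&
      !weights_have_generative_meaning(family_name)) {
    warning(sprintf(paste0(
      "prior weights of family '%s' only weight the likelihood; ",
      "simulations ignore them and may not resemble the observed data"),
      family_name), call. = FALSE)
  }
  invisible(NULL)
}

# Usage
check_simulation_weights("gaussian", c(1, 2, 3))  # silent
check_simulation_weights("poisson", rep(1, 3))    # silent: unit weights
check_simulation_weights("poisson", c(1, 2, 3))   # warns
```

Treating unlisted families as likelihood-weighted is a deliberately conservative choice in this sketch; a real implementation would need a per-family decision, which is exactly the open question for the extended families.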