mfasiolo / qgam

Additive quantile regression R package
http://mfasiolo.github.io/qgam/

number of items to replace is not a multiple of replacement length -> mObj issue? #40

ghost closed this issue 3 years ago

ghost commented 3 years ago

Hi,

When I try to fit the model below, I get the following error message:

library(qgam)
data <- readRDS("data.rds")
qgam(f1 ~ region + s(time, k = 5, by = region) + s(speaker, bs = "re") + s(time, speaker, bs = "fs", k = 5, m = 1), data = data, qu = 0.5)
Error in X[, object$smooth[[k]]$first.para:object$smooth[[k]]$last.para] <- Xfrag : 
  number of items to replace is not a multiple of replacement length
In addition: Warning messages:
1: In gam.side(sm, X, tol = .Machine$double.eps^0.5) :
  model has repeated 1-d smooths of same variable.
2: In gam.side(sm, X, tol = .Machine$double.eps^0.5) :
  model has repeated 1-d smooths of same variable.
3: In gam.side(sm, X, tol = .Machine$double.eps^0.5) :
  model has repeated 1-d smooths of same variable.
4: contrasts dropped from factor region 

The first three warnings are expected and harmless. The fourth is suspicious, because the Gaussian fit has no missing coefficients for region, as far as I can see. The actual error is raised in predict.gam, which qgam calls from here: https://github.com/mfasiolo/qgam/blob/7f5fc9249be7ef71ed6412766b4e2cc17464c37f/R/tuneLearnFast.R#L251 It occurs while predict.gam is processing the factor smooth, at the point where it does this:

Xfrag <- PredictMat(object$smooth[[k]], data)
X[, object$smooth[[k]]$first.para:object$smooth[[k]]$last.para] <- Xfrag

where, for some reason, Xfrag has the expected 160 columns (there are 160 speakers in my dataset), but object$smooth[[k]]$first.para:object$smooth[[k]]$last.para has only 159 elements. Running predict.gam manually on the gausFit object works without error, so this does not seem to be a bug in mgcv itself, but rather an interaction with the fake model structure in the mObj object that qgam sets up.
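The column mismatch described above can be reproduced in base R without mgcv or qgam. This is only a minimal sketch: the matrix sizes mirror the 160 versus 159 figures from the report, while the row count, the total width of X, and the names first_para and last_para are made up for illustration.

```r
# Toy stand-ins for the objects inside predict.gam. Only the 160 vs. 159
# mismatch comes from the report; every other size here is an assumption.
n_rows <- 4
X <- matrix(0, nrow = n_rows, ncol = 200)      # full model matrix (hypothetical width)
Xfrag <- matrix(1, nrow = n_rows, ncol = 160)  # block with one column too many

first_para <- 10                   # hypothetical start of the smooth's columns
last_para <- first_para + 159 - 1  # only 159 columns reserved for it

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    X[, first_para:last_para] <- Xfrag
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

With these sizes the assignment fails, because 4 x 159 slots cannot be filled from 4 x 160 values. That is the same kind of failure as in the traceback above.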

Data can be downloaded here: https://surfdrive.surf.nl/files/index.php/s/KXMxHGeSlNmXD3a

ghost commented 3 years ago

Mystery solved: one speaker turns out to have only NA values for the dependent variable. .cleanData should have handled this, and I think this line is the reason it did not: https://github.com/mfasiolo/qgam/blob/7f5fc9249be7ef71ed6412766b4e2cc17464c37f/R/I_cleanData.R#L53 I believe that .dat should be .datO there. Making that change through fixInNamespace fixes the problem I'm seeing.
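For anyone hitting the same error on a qgam version without the fix, a check along these lines may help locate the offending group before fitting. It is a sketch on made-up toy data, not the dataset from this issue; the column names f1 and speaker are borrowed from the model formula above, and dropping the rows up front is a possible workaround, not the package's own fix.

```r
# Toy data (an assumption, not the real dataset): speaker "s3" has no
# observed response at all.
data <- data.frame(
  speaker = factor(rep(c("s1", "s2", "s3"), each = 3)),
  time = rep(1:3, times = 3),
  f1 = c(500, 510, NA, 480, 470, 465, NA, NA, NA)
)

# TRUE for every speaker whose response values are all missing.
all_na <- tapply(is.na(data$f1), data$speaker, all)
empty_speakers <- names(all_na)[all_na]
print(empty_speakers)

# Possible workaround: drop rows with a missing response, then drop the
# factor levels that are left without any data.
clean <- droplevels(data[!is.na(data$f1), ])
print(nlevels(data$speaker))
print(nlevels(clean$speaker))
```

Passing the cleaned data frame to qgam keeps the factor levels seen during fitting and during prediction consistent, which is what the mismatch above appears to hinge on.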

mfasiolo commented 3 years ago

Hi,

Sorry for the late reply, and thanks for finding the bug! It should be fixed by the latest commit, c1638f43233acab127fd908bcdbaeb0a197b30dd