ggpredict shows inaccurate results for models based on lme

ndsubison2178 commented 1 year ago

There is a flaw in this ggeffects library.

1) ggeffect shows erroneous and incorrect results if you try to predict values using estimates from lme model from nlme package. However if you use lme4 instead of lme, everything works fine.

2) The x-axis labels are not accurate. the x-axis lables are marked 1,2,3....n observation number instead of actual x values.

strengejacke commented 1 year ago

Do you have a reproducible example? This one, e.g. works:

library(ggeffects)
library(nlme)

m <- lme(
  distance ~ age, data = Orthodont,
 random = ~ 1 | Sex)
ggpredict(m, "age")
#> # Predicted values of distance
#> 
#> age | Predicted |         95% CI
#> --------------------------------
#>   8 |     21.84 | [19.46, 24.21]
#>  10 |     23.16 | [20.85, 25.47]
#>  12 |     24.48 | [22.17, 26.79]
#>  14 |     25.80 | [23.42, 28.17]
#> 
#> Adjusted for:
#> * Sex = 0 (population-level)
plot(ggpredict(m, "age"))

^{Created on 2023-03-21 with reprex v2.0.2}

ndsubison2178 commented 1 year ago

@strengejacke

Dear S, I am using a dataset in a controlled environment which cannot be exported. However this dataset is very close to the issue I experienced.

This is my example,

library(lme4) data(Soybean) library(plyr) library(nlme) library(ggplot2) library(ggeffects) library(lemon)

LME_Model <- dlply(Soybean, "Year", function(x) lme(fixed = weight ~ Time, random = ~ 1 |Variety , data=x, na.action=na.exclude))

preds <- purrr::map(LME_Model , function(x) ggpredict(x, c("Time[all]", "Variety "), type="re") %>% as.data.frame()) preds <- bind_rows(preds, .id="Year")

ggplot(preds, aes(x=x, y=predicted, color=group, fill=group)) + geom_line() + facet_grid(Year~ .)

strengejacke commented 1 year ago

From your example, I can't see any error with lme. Can you elaborate a bit more on which error you experience? Furthermore, I can't see any issues with the x-labels. Please provide a reproducible example, so I can try to find any errors.

DHLocke commented 1 year ago

Sorry if this is the wrong issue to add this reprex... ggpredict does not provide confidence intervals for gamlss model (which is calling nlme::lme) but sjPlot:: does provide confidence intervals.

library(tidyverse)
library(palmerpenguins)
library(gamlss)
library(broom)
library(sjPlot)
library(ggeffects)
library(broom.mixed)

# add proportion outcome, drop NAs
penguins <- 
  penguins |> 
  mutate(prop = rBE(nrow(penguins), mu = 0.5, sigma = 0.2)) |> 
  drop_na() |> 
  droplevels()

# fit model
mod <- 
  gamlss::gamlss(prop ~ sex*body_mass_g + year + re(random = list(~1 | species, ~1 | island))
                 , family = BE()
                 , data = penguins)

# examine outputs
# broom
mod |> broom.mixed::tidy(conf.int = TRUE) # nice confidence intervals (CIs), some warnings

# sjPlot
mod |> sjPlot::plot_model() # CIs present, no warnings
mod |> sjPlot::tab_model() # CIs present, no warnings

# ggeffects
mod |> ggeffects::ggpredict() # warnings and no CIs
mod |> ggeffects::ggpredict(terms = c('sex', 'body_mass_g')) # warnings and no CIs
mod |> ggeffects::ggeffect(terms = c('sex', 'body_mass_g')) # warnings and no output
mod |> ggeffects::ggemmeans(terms = c('sex', 'body_mass_g')) # warnings and no output

# end.

strengejacke commented 1 year ago

I think the issue is that standard errors for coefficients are returned, but not for predictions. That's why plot_model() or parameters::model_parameters() include SE and CI (like summary()), but predictions probably don't. I'll look into this to confirm. You may try ggemmeans(), which relies on emmean(), not predict().

DHLocke commented 1 year ago

Thank you. The last line is calling ggpmeans.

An additional issue I have noticed is that the tab_model() seems to be using exp() on the estimates, but the link function is logit (not log). I therefore think the appropriate transformation is the inverse logit like

inv_logit <- function(p) exp(p) / (1 + exp(p))

summary(mod) #shows the mu intercept is 9.080e+00

mod |> sjPlot::tab_model() # produces 8774.88 exp(9.080e+00) # 8774.88 non-sensical given the outcome is a proportion with mu = 0.5, sigma = 0.2

but inv_logit(9.080e+00) # 0.9998861

same for the other intercept (sigma) summary(mod) #shows the sigma intercept is -1.39712 with a logit link exp(-1.39712) # 0.2473082 matching tab_model's .25 inv_logit(-1.39712) # 0.1982735, this is the original sigma given in the made up example. The model recovers the parameter correctly.

Maybe this is an error? Wrong transformation? My broader (personal) issue is trying to get anything interpretable out of a gamlss model with random effects with family = BEINF(). I love the sjPlot / ggpredict infrastructure and workflow so much. But I am struggling to extract interpretable (coefficients, means, predictions) for this complex model. Thanks for such great software, I use it all of the time.

strengejacke commented 1 year ago

I'm not sure about the plogis() transformation for the coefficient table. For ggpredict() (i.e. predictions), the link-inverse is used since we predict the response and want the predictions to be on the response-scale (probabilities).

But for coefficients, like in logistic regression, where we also have a logit-link, the coefficients are also reported as exp(coefficient), which is an odds ratio. I have never seen that coefficients (not: predictions!) are back-converted using plogis() in such cases - but just because I haven't seen it doesn't mean that it's not common practice for GAMs.

I also tried packages emmeans and marginaleffects, but no success. Pinging @vincentarelbundock for marginaleffects.

vincentarelbundock commented 1 year ago

Personally, I would just use marginaleffects::avg_comparisons(model) to get interpretable estimates of the effect size on the response scale, but YMMV.

DHLocke commented 1 year ago

Thanks. marginaleffects looks fantastic!

marginaleffects::avg_comparisons(mod, what = 'mu') Error in get_coef.gamlss(model) : Argument what indicating the parameter of interest is missing.

Given the repex, that Error is unexpected and I'm not sure how best to resolve it. What about a what = 'all' option to combine mu, sigma, nu.. what ever the case might be?

vincentarelbundock commented 1 year ago

Thanks for testing it @DHLocke . You found a bug and I just fixed it in the development version of the package.

You’ll find instructions for installation and an example at this separate issue, which I opened to discuss marginaleffects-specific issues. (We don’t want to clutter Daniel’s issue tracker.)

https://github.com/vincentarelbundock/marginaleffects/issues/933

DHLocke commented 1 year ago

Thanks so much and sorry for submitting the error to the wrong place. What about a what = 'all' argument that combines the mu and sigma?

vincentarelbundock commented 1 year ago

see the other thread for a response

strengejacke / ggeffects

ggpredict shows inaccurate results for models based on lme #297