vincentarelbundock / marginaleffects

R package to compute and plot predictions, slopes, marginal means, and comparisons (contrasts, risk ratios, odds, etc.) for over 100 classes of statistical and ML models. Conduct linear and non-linear hypothesis tests, or equivalence tests. Calculate uncertainty estimates using the delta method, bootstrapping, or simulation-based inference
https://marginaleffects.com
Other
462 stars 48 forks source link

Gls(), gls() and rcs() support #952

Closed lang-benjamin closed 12 months ago

lang-benjamin commented 12 months ago

Consider the following longitudinal data set from here. I wanted to reproduce this analysis and do some calculations with the marginaleffects package.

  1. I realized that there is no support for rms::Gls(). It would be nice to have this support because it would complement the already supported functions ols(), lrm(), orm(). (btw, cph()would be nice too :)).
  2. I tried using nlme::gls() instead and use marginaleffects::predictions() on it:
    
    library(Hmisc)
    #> 
    #> Attache Paket: 'Hmisc'
    #> Die folgenden Objekte sind maskiert von 'package:base':
    #> 
    #>     format.pval, units
    library(data.table)
    library(rms)
    library(nlme)
    library(marginaleffects)

getHdata(cdystonia) setDT(cdystonia) # convert to data.table cdystonia[, uid := paste(site, id)] # unique subject ID

baseline <- cdystonia[week == 0] baseline[, week := NULL] setnames(baseline, 'twstrs', 'twstrs0') followup <- cdystonia[week > 0, .(uid, week, twstrs)] setkey(baseline, uid) setkey(followup, uid, week) both <- Merge(baseline, followup, id = ~ uid)

> Vars Obs Unique IDs IDs in #1 IDs not in #1

> baseline 7 109 109 NA NA

> followup 3 522 108 108 0

> Merged 9 523 109 109 0

>

> Number of unique IDs in any data frame : 109

> Number of unique IDs in all data frames: 108

Remove person with no follow-up record

both <- both[! is.na(week)] dd <- datadist(both) options(datadist='dd')

a <- gls(twstrs ~ treat rcs(week, 3) + rcs(twstrs0, 3) + rcs(age, 4) sex, data = both, correlation = corCAR1(form=~week | uid))

> Warning in rcspline.eval(x, nk = nknots, inclx = TRUE, pc = pc, fractied =

> fractied): 3 knots requested with 5 unique values of x. knots set to 3

> interior values.

predictions(a) # works

>

> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %

> 24.2 2.25 10.7 <0.001 87.0 19.8 28.6

> 24.9 2.13 11.7 <0.001 103.0 20.8 29.1

> 26.7 2.22 12.0 <0.001 108.5 22.3 31.0

> 29.6 2.10 14.1 <0.001 146.9 25.5 33.7

> 32.8 2.26 14.5 <0.001 156.3 28.4 37.2

> --- 512 rows omitted. See ?avg_predictions and ?print.marginaleffects ---

> 61.6 2.77 22.2 <0.001 360.9 56.2 67.0

> 64.8 2.89 22.4 <0.001 366.8 59.1 70.5

> 43.4 2.36 18.4 <0.001 249.9 38.8 48.1

> 45.3 2.43 18.6 <0.001 255.1 40.6 50.1

> 55.7 2.45 22.7 <0.001 377.2 50.9 60.5

> Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, twstrs, treat, week, twstrs0, age, sex, uid

> Type: response

predictions(a, newdata = datagrid(age = 46, treat = "10000U", week = 8, twstrs0 = 46, sex = "F")) # does not work

> Error: Unable to compute predicted values with this model. You can try to

> supply a different dataset to the newdata argument. This error was

> also raised:

>

> knots not specified, and < 6 non-missing observations

>

> Bug Tracker:

> https://github.com/vincentarelbundock/marginaleffects/issues

predictions(a, newdata = "median") # does not work

> Error: Unable to compute predicted values with this model. You can try to

> supply a different dataset to the newdata argument. This error was

> also raised:

>

> knots not specified, and < 6 non-missing observations

>

> Bug Tracker:

> https://github.com/vincentarelbundock/marginaleffects/issues


<sup>Created on 2023-11-03 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>

It seems that the issue is caused by using `rcs()`. I tried without restricted cubic splines and then it worked. Is it possible to add support for `rcs()`?
3. By doing so, I also realized something else: it seems that the median is used if nothing is stated for some numeric variable. For example:
``` r
# Remove restriced cubic splines and model linearly
b <- gls(twstrs ~ treat * week + twstrs0 + age * sex, 
         data = both, 
         correlation = corCAR1(form=~week | uid))

predictions(b, newdata = datagrid(age = 46))
#> 
#>  age Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 %  treat week twstrs0
#>   46     39.2       1.22 32.1   <0.001 750.1  36.8   41.6 10000U    8      46
#>  sex uid
#>    F 1 1
#> 
#> Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, twstrs, treat, week, twstrs0, sex, uid, age 
#> Type:  response
predictions(b, newdata = "mean")
#> 
#>  Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 %  treat week twstrs0 age
#>      39.3       1.08 36.4   <0.001 963.8  37.1   41.4 10000U    8      46  56
#>  sex uid
#>    F 1 1
#> 
#> Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, twstrs, treat, week, twstrs0, age, sex, uid 
#> Type:  response

For twstrs0 for example the mean is 45.65, but 46 is shown here in the output. Am I missing something here?

vincentarelbundock commented 12 months ago

Can you try with the latest dev versions of marginaleffects and insight?

remotes::install_github("vincentarelbundock/marginaleffects")
remotes::install_github("easystats/insight")

Restart R completely to get this:

packageVersion("marginaleffects")
# [1] '0.16.0.9005'
packageVersion("insight")
# [1] '0.19.6.6'
library(Hmisc)
library(data.table)
library(rms)
library(nlme)
library(marginaleffects)

getHdata(cdystonia)
setDT(cdystonia)          # convert to data.table
cdystonia[, uid := paste(site, id)]   # unique subject ID

baseline <- cdystonia[week == 0]
baseline[, week := NULL]
setnames(baseline, "twstrs", "twstrs0")
followup <- cdystonia[week > 0, .(uid, week, twstrs)]
setkey(baseline, uid)
setkey(followup, uid, week)
both <- Merge(baseline, followup, id = ~uid)
both <- both[!is.na(week)]
dd <- datadist(both)
options(datadist = "dd")

a <- gls(twstrs ~ treat * rcs(week, 3) + rcs(twstrs0, 3) + rcs(age, 4) * sex, 
         data = both, 
         correlation = corCAR1(form=~week | uid))

predictions(a) # works
# 
#  Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 %
#      24.2       2.25 10.7   <0.001  87.0  19.8   28.6
#      24.9       2.13 11.7   <0.001 103.0  20.8   29.1
#      26.7       2.22 12.0   <0.001 108.5  22.3   31.0
#      29.6       2.10 14.1   <0.001 146.9  25.5   33.7
#      32.8       2.26 14.5   <0.001 156.3  28.4   37.2
# --- 512 rows omitted. See ?avg_predictions and ?print.marginaleffects --- 
#      61.6       2.77 22.2   <0.001 360.9  56.2   67.0
#      64.8       2.89 22.4   <0.001 366.8  59.1   70.5
#      43.4       2.36 18.4   <0.001 249.9  38.8   48.1
#      45.3       2.43 18.6   <0.001 255.1  40.6   50.1
#      55.7       2.45 22.7   <0.001 377.2  50.9   60.5
# Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, twstrs, treat, week, twstrs0, age, sex, uid 
# Type:  response

predictions(a, newdata = datagrid(age = 46, treat = "10000U", week = 8, twstrs0 = 46, sex = "F")) # does not work
# 
#  age  treat week twstrs0 sex Estimate Std. Error    z Pr(>|z|)     S 2.5 %
#   46 10000U    8      46   F     35.6       1.84 19.4   <0.001 274.9    32
#  97.5 % uid
#    39.2 1 1
# 
# Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, twstrs, uid, age, treat, week, twstrs0, sex 
# Type:  response

predictions(a, newdata = "median") # does not work
# 
#  Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 %  treat week twstrs0 age
#      37.3       1.71 21.8   <0.001 347.6    34   40.7 10000U    8      46  56
#  sex uid
#    F 1 1
# 
# Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, twstrs, treat, week, twstrs0, age, sex, uid 
# Type:  response
vincentarelbundock commented 12 months ago

I just added support for rms::Gls in the dev version on Github