harrelfe / Hmisc

Harrell Miscellaneous
Other
210 stars 81 forks source link

`soprobMarkovOrdm()` enforces specific label convention, is this on purpose? #186

Closed uriahf closed 2 months ago

uriahf commented 2 months ago

I'd like to use soprobMarkovOrdm() in order to produce state occupancy probabilities.

It seems that soprobMarkovOrdm() requires specific column name for previous state ('yprev') and I wonder why.

Other APIs do not require specific naming, as an example the predict() function do not have a similar requirement.

I've added some reproducible code:

library(VGAM)
#> Warning: package 'VGAM' was built under R version 4.3.3
#> Loading required package: stats4
#> Loading required package: splines
library(Hmisc)
#> Warning: package 'Hmisc' was built under R version 4.3.3
#> 
#> Attaching package: 'Hmisc'
#> The following objects are masked from 'package:base':
#> 
#>     format.pval, units
library(rms)
#> Warning: package 'rms' was built under R version 4.3.3
#> Warning in .recacheSubclasses(def@className, def, env): undefined subclass
#> "ndiMatrix" of class "replValueSp"; definition not updated
#> 
#> Attaching package: 'rms'
#> The following objects are masked from 'package:VGAM':
#> 
#>     calibrate, lrtest
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:Hmisc':
#> 
#>     src, summarize
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
require(data.table)
#> Loading required package: data.table
#> 
#> Attaching package: 'data.table'
#> The following objects are masked from 'package:dplyr':
#> 
#>     between, first, last

getHdata(simlongord500)
d <- simlongord500 |> 
  rename("previous_state" = "yprev")

setDT(d, key='id')
d[, y     := factor(y,     levels=rev(levels(y    )))]
d[, previous_state := factor(
  previous_state, 
  levels=rev(levels(previous_state)))]
setnames(d, 'time', 'day')

f <- vglm(
  ordered(y) ~ previous_state + lsp(day, 2) + age + sofa + tx,
          cumulative(reverse=TRUE, parallel=FALSE ~ lsp(day, 2)), 
  data=d)

# `predict()` Supports Different Naming

predict(
  f, as.data.frame(d) |> 
    filter(id == 1), 
  type = "response", 
  times = c(1:28))
#>            Home In Hospital/Facility    Vent/ARDS         Dead
#> 1  7.353436e-05          0.111114772 8.584429e-01 3.036884e-02
#> 2  3.076534e-04          0.202106017 7.628981e-01 3.468818e-02
#> 3  2.917858e-04          0.195956728 7.684639e-01 3.528755e-02
#> 4  2.767364e-04          0.189949614 7.738768e-01 3.589688e-02
#> 5  2.624630e-04          0.184084139 7.791371e-01 3.651634e-02
#> 6  2.489255e-04          0.178359580 7.842454e-01 3.714608e-02
#> 7  2.360862e-04          0.172775046 7.892026e-01 3.778625e-02
#> 8  2.239090e-04          0.167329482 7.940096e-01 3.843701e-02
#> 9  1.098967e-01          0.881297681 8.735616e-03 6.999566e-05
#> 10 1.048213e-01          0.886029422 9.078022e-03 7.124924e-05
#> 11 9.995398e-02          0.890539802 9.433688e-03 7.252527e-05
#> 12 9.528861e-02          0.894834442 9.803119e-03 7.382414e-05
#> 13 9.984523e-01          0.001546085 1.593801e-06 1.163653e-08
#> 14 9.983683e-01          0.001630075 1.656827e-06 1.184495e-08
#> 15 9.982796e-01          0.001718620 1.722340e-06 1.205710e-08
#> 16 9.981862e-01          0.001811965 1.790438e-06 1.227305e-08
#> 17 9.980878e-01          0.001910370 1.861222e-06 1.249286e-08
#> 18 9.979839e-01          0.002014108 1.934800e-06 1.271662e-08
#> 19 9.978745e-01          0.002123467 2.011280e-06 1.294438e-08
#> 20 9.977591e-01          0.002238750 2.090777e-06 1.317622e-08
#> 21 9.976375e-01          0.002360276 2.173411e-06 1.341222e-08
#> 22 5.839243e-02          0.927140522 1.437889e-02 8.816348e-05
#> 23 5.554723e-02          0.929424461 1.493857e-02 8.974240e-05
#> 24 5.283288e-02          0.931556116 1.551965e-02 9.134960e-05
#> 25 5.024412e-02          0.933539969 1.612292e-02 9.298558e-05
#> 26 4.777581e-02          0.935380334 1.674921e-02 9.465085e-05
#> 27 4.542295e-02          0.937081355 1.739934e-02 9.634595e-05
#> 28 4.318072e-02          0.938646999 1.807421e-02 9.807139e-05

# `soprobMarkovOrdm()` do not Support Different Naming

soprobMarkovOrdm(
  f, as.data.frame(d)[1,], 
  times=1:28, 
  ylevels=levels(d$y),
  absorb='Dead', 
  tvarname='day') 
#> Error in soprobMarkovOrdm(f, as.data.frame(d)[1, ], times = 1:28, ylevels = levels(d$y), : yprev is not in data
couthcommander commented 2 months ago

You can certainly set pvarname="previous_state" as an argument to soprobMarkovOrdm. predict on a "vglm" object ends up calling VGAM::predict.vlm. It wasn't entirely clear to me how their function infers the state variable. If you can suggest how this works, I could make a pull request for @harrelfe to review.

uriahf commented 2 months ago

That's exactly what I needed, thanks!