openpharma / brms.mmrm

R package to run Bayesian MMRMs using {brms}
https://openpharma.github.io/brms.mmrm/

Define a few example data sets #11

Closed kkmann closed 7 months ago

kkmann commented 1 year ago

One dataset enabling

data set containing studies with same/similar intervention and different ones/only control.

simulated + save to pkg.

wlandau commented 1 year ago

For simulated example datasets, I wonder if we could include the code as specialized functions rather than saving the datasets themselves in the package.
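A minimal sketch of what such a specialized function could look like, in base R only. The function name `simulate_example_data()`, its arguments, the AR(1)-type correlation, and the mean structure are all assumptions made for illustration; none of them come from brms.mmrm.

```r
# Hypothetical helper, not part of brms.mmrm: simulate a small two-arm
# longitudinal dataset with correlated residuals within each patient.
simulate_example_data <- function(n_patient = 50, n_visit = 4, rho = 0.5, seed = 1) {
  set.seed(seed) # note: this resets the global RNG state
  visits <- seq_len(n_visit)
  # Assumed AR(1)-type residual correlation between visits.
  corr <- rho^abs(outer(visits, visits, "-"))
  # Each row of `errors` holds the correlated residuals of one patient.
  errors <- matrix(rnorm(n_patient * n_visit), nrow = n_patient) %*% chol(corr)
  # `visit` varies fastest, so rows are grouped by patient.
  data <- expand.grid(visit = visits, patient = seq_len(n_patient))
  data$group <- ifelse(data$patient %% 2 == 0, "treatment", "control")
  # Assumed mean: linear trend over visits plus a treatment-by-visit effect.
  mu <- 0.5 * data$visit + 0.25 * data$visit * (data$group == "treatment")
  data$response <- mu + as.vector(t(errors))
  data[, c("patient", "group", "visit", "response")]
}

head(simulate_example_data())
```

Because the seed is an argument, examples and tests could regenerate the same data on demand instead of relying on a stored copy.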

kkmann commented 1 year ago

We can do both (as recommended by r-pkgs). It would be good to have a standard dataset for all examples etc. (historical data + new) and to include the script that generates it; see https://r-pkgs.org/data.html#sec-data-data-raw.
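For reference, the r-pkgs convention keeps the generating script under `data-raw/` and saves the result with `usethis::use_data()`. A rough sketch, where the file name, the object name `example_data`, and the placeholder simulation are all hypothetical:

```r
# data-raw/example_data.R (hypothetical file name)
set.seed(0)

# Placeholder simulation; the real script would hold the agreed-upon design.
example_data <- data.frame(
  patient = rep(seq_len(20), each = 4),
  visit = rep(seq_len(4), times = 20)
)
example_data$response <- rnorm(nrow(example_data), mean = example_data$visit)

# Writes data/example_data.rda. Guarded so the script only saves when it is
# run from a package root with {usethis} installed.
if (requireNamespace("usethis", quietly = TRUE) && file.exists("DESCRIPTION")) {
  usethis::use_data(example_data, overwrite = TRUE)
}
```

This way the stored dataset and the code that produced it stay together in the repository.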

We should link this with #10 to make sure it covers our scope.

chstock commented 1 year ago

In the following, I list a few datasets that are all available from other R packages (on CRAN) and that may serve as exemplary datasets to fit MMRMs. (An MMRM probably isn't the ideal model everywhere but would at least have some justification.)

As discussed this week, I will further investigate what it would possibly require to obtain data from TransCelerate (i.e. control group data) for our purposes.

chstock commented 1 year ago

Orthodont - a dataset that contains measurements of an orthodontic distance (mm) at four different ages (8, 10, 12, and 14 years) for 27 children.

> library(nlme)
> help(Orthodont)
> data(Orthodont)
> head(Orthodont)
Grouped Data: distance ~ age | Subject
  distance age Subject  Sex
1     26.0   8     M01 Male
2     25.0  10     M01 Male
3     29.0  12     M01 Male
4     31.0  14     M01 Male
5     21.5   8     M02 Male
6     22.5  10     M02 Male
> dim(Orthodont)
[1] 108   4
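
As a rough illustration of an MMRM-style fit on this dataset, here is a frequentist sketch with `nlme::gls()`: visit as a categorical time effect, an unstructured within-subject correlation, and a separate variance per visit. This is a stand-in, not the brms.mmrm API, and the model formula is an assumption chosen for illustration.

```r
library(nlme)

# Drop the groupedData class and treat age as a categorical visit.
dat <- as.data.frame(Orthodont)
dat$visit <- factor(dat$age)

fit <- gls(
  distance ~ Sex * visit,
  data = dat,
  # Unstructured correlation between the four visits within each subject.
  # Relies on rows being ordered by visit within Subject, as they are here.
  correlation = corSymm(form = ~ 1 | Subject),
  # Separate residual variance for each visit.
  weights = varIdent(form = ~ 1 | visit),
  method = "REML"
)
summary(fit)
```

The data are complete and balanced (27 children at four ages), which is why the simple `~ 1 | Subject` ordering is enough for `corSymm()` here.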

chstock commented 1 year ago

sleepstudy - a dataset that contains longitudinal data on the reaction times of subjects in a sleep deprivation study.

> library(lme4)
> help(sleepstudy)
> data(sleepstudy)
> head(sleepstudy)
  Reaction Days Subject
1 249.5600    0     308
2 258.7047    1     308
3 250.8006    2     308
4 321.4398    3     308
5 356.8519    4     308
6 414.6901    5     308
> dim(sleepstudy)
[1] 180   3

chstock commented 1 year ago

dietox - a dataset that contains measurements of the weight of pigs at different time points (weeks) for 72 pigs.

> library(geepack)
> help(dietox)
> data(dietox)
> head(dietox)
   Pig    Evit    Cu Litter Start   Weight      Feed Time
1 4601 Evit000 Cu000      1  26.5 26.50000        NA    1
2 4601 Evit000 Cu000      1  26.5 27.59999  5.200005    2
3 4601 Evit000 Cu000      1  26.5 36.50000 17.600000    3
4 4601 Evit000 Cu000      1  26.5 40.29999 28.500000    4
5 4601 Evit000 Cu000      1  26.5 49.09998 45.200001    5
6 4601 Evit000 Cu000      1  26.5 55.39999 56.900002    6
> dim(dietox)
[1] 861   8

chstock commented 1 year ago

polyps - a dataset of an RCT of Sulindac for Polyp Prevention in Familial Adenomatous Polyposis

> library(medicaldata)
> help(polyps)
> data(polyps) # not in long format
> head(polyps)
  participant_id    sex age baseline treatment number3m number12m
1            001 female  17        7  sulindac        6        NA
2            002 female  20       77   placebo       67        63
3            003   male  16        7  sulindac        4         2
4            004 female  18        5   placebo        5        28
5            005   male  22       23  sulindac       16        17
6            006 female  13       35   placebo       31        61
> dim(polyps)
[1] 22  7
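
Since the data are in wide format, a reshape to long format would be needed before fitting a repeated-measures model. A minimal sketch with base `reshape()`, using only the six rows printed above so that it runs without the medicaldata package; the new column names `month` and `number` are my own choice.

```r
# First six rows copied from the head(polyps) output above.
polyps_head <- data.frame(
  participant_id = c("001", "002", "003", "004", "005", "006"),
  sex = c("female", "female", "male", "female", "male", "female"),
  age = c(17, 20, 16, 18, 22, 13),
  baseline = c(7, 77, 7, 5, 23, 35),
  treatment = c("sulindac", "placebo", "sulindac", "placebo", "sulindac", "placebo"),
  number3m = c(6, 67, 4, 5, 16, 31),
  number12m = c(NA, 63, 2, 28, 17, 61)
)

# One row per participant and follow-up time (3 and 12 months).
polyps_long <- reshape(
  polyps_head,
  direction = "long",
  idvar = "participant_id",
  varying = c("number3m", "number12m"),
  v.names = "number",
  timevar = "month",
  times = c(3, 12)
)
polyps_long <- polyps_long[order(polyps_long$participant_id, polyps_long$month), ]
rownames(polyps_long) <- NULL
polyps_long
```

With the full dataset, the same call should apply after replacing `polyps_head` with `polyps`.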

chstock commented 1 year ago

cdystonia - a dataset from a multicenter RCT of botulinum toxin type B (BotB) in patients with cervical dystonia from nine U.S. sites

Description of dataset | The dataset was also used in this book: Statistical Methods for the Analysis of Repeated Measurements

> library(Hmisc)
> getHdata(cdystonia)
> head(cdystonia)
    week site id treat age sex twstrs
1 1    0    1  1 5000U  65   F     32
1 2    2    1  1 5000U  65   F     30
1 3    4    1  1 5000U  65   F     24
1 4    8    1  1 5000U  65   F     37
1 5   12    1  1 5000U  65   F     39
1 6   16    1  1 5000U  65   F     36
> dim(cdystonia)
[1] 631   7

chstock commented 1 year ago

BtheB - a dataset from a clinical trial of an interactive multimedia program called 'Beat the Blues'

See also this vignette | The dataset was also used in this book: Handbook of Statistical Analyses Using R (chapter: Analyzing Longitudinal Data I)

> library(HSAUR3)
> help(BtheB)
> data(BtheB) # in wide format
> BtheB$subject <- factor(rownames(BtheB))
> nobs <- nrow(BtheB)
> BtheB_long <- reshape(
+   BtheB, idvar = "subject",
+   varying = c("bdi.2m", "bdi.3m", "bdi.5m", "bdi.8m"),
+   direction = "long")
> BtheB_long$time <- rep(c(2, 3, 5, 8), rep(nobs, 4))
> subset(BtheB_long, subject %in% c("1", "2", "3"))
     drug length treatment bdi.pre subject time bdi
1.2m   No    >6m       TAU      29       1    2   2
2.2m  Yes    >6m     BtheB      32       2    2  16
3.2m  Yes    <6m       TAU      25       3    2  20
1.3m   No    >6m       TAU      29       1    3   2
2.3m  Yes    >6m     BtheB      32       2    3  24
3.3m  Yes    <6m       TAU      25       3    3  NA
1.5m   No    >6m       TAU      29       1    5  NA
2.5m  Yes    >6m     BtheB      32       2    5  17
3.5m  Yes    <6m       TAU      25       3    5  NA
1.8m   No    >6m       TAU      29       1    8  NA
2.8m  Yes    >6m     BtheB      32       2    8  20
3.8m  Yes    <6m       TAU      25       3    8  NA
> summary(BtheB_long)
  drug     length    treatment      bdi.pre         subject         time           bdi       
 No :224   <6m:196   TAU  :192   Min.   : 2.00   1      :  4   Min.   :2.00   Min.   : 0.00  
 Yes:176   >6m:204   BtheB:208   1st Qu.:15.00   10     :  4   1st Qu.:2.75   1st Qu.: 6.00  
                                 Median :22.00   100    :  4   Median :4.00   Median :12.50  
                                 Mean   :23.33   11     :  4   Mean   :4.50   Mean   :14.43  
                                 3rd Qu.:30.25   12     :  4   3rd Qu.:5.75   3rd Qu.:21.00  
                                 Max.   :49.00   13     :  4   Max.   :8.00   Max.   :53.00  
                                                 (Other):376                  NA's   :120    
> dim(BtheB_long)
[1] 400   7

The endpoint variable is bdi. A considerable fraction of it is missing: 120 of the 400 values (30%), per the summary above.

image

chstock commented 1 year ago

Simulation of spirometric measurements

I was looking for publicly available data from (pharmaceutical) clinical trials specifically in pulmonology, since repeated spirometric measurements (continuous variables) are often used as endpoints in this field and are typically analyzed using MMRMs. I found this paper by Santermans et al. (2019). They developed a simulation approach for the endpoint "forced vital capacity" (FVC) based on clinical trial summaries and home spirometry data, with R code given in a supplement. At first glance it looks promising as a flexible way to generate realistic data for testing purposes and example analyses, e.g. for vignettes. I'll try to get this running.

wlandau commented 1 year ago

What a treasure trove you found, @chstock! Thank you so much for finding these datasets! I think this is a promising start for a collection of analyses comparing brms.mmrm to mmrm in terms of fixed effect parameter estimates, covariance/correlation estimates, and associated uncertainties.

I think we may have talked about whether these analyses should be vignettes, unit tests, or part of an external test suite. I am not sure I recall what we decided.

wlandau commented 1 year ago

I think this part of the effort would be a good opportunity for someone on our team, either you or @kkmann, @yonicd, @andrew-bean, @danleibovitz, etc., to test drive the package and see what you think.

andrew-bean commented 1 year ago

These look great! One other possibility is the artificial FEV-1 dataset that is built into the mmrm package. https://openpharma.github.io/mmrm/latest-tag/reference/fev_data.html

chstock commented 1 year ago

Thanks, @wlandau! I agree, this should be something to start with. I am happy to work on this further, also on the brms.mmrm and mmrm comparison. If anyone else could start it very soon, please don't hesitate. I'd like to explore the spirometry data simulation first, because the data we may get there could be very close to phase II or III data in pharma (and there is even some connection to my work at BI).

chstock commented 1 year ago

I used the code from Santermans et al. (2019) pretty much without any modifications and obtained the desired dataset (code and output). Of note, I think the parameters are realistic and well justified in the paper. There is some interesting functionality w.r.t. missing data mechanisms. I still think it could be a nice basis for example data, or even for a simulation.

chstock commented 1 year ago

I received some initial feedback regarding TransCelerate/Vivli and will be able to report in our next meeting.

wlandau commented 1 year ago

Sounds great, thanks for investigating.

wlandau commented 7 months ago

Thanks everyone! It seems like we have plenty of datasets to choose from, and the clinidata. Happy to reopen if more datasets are needed for #12.