Better simulation function

nociale commented 2 years ago

It might be useful to have a better data simulation function. In particular the following enhancements need to be discussed:

One ICE (treatment discontinuation) plus a given probability of drop-out.
Simulation of the probability of the intercurrent event: it depends on the current outcome value according to a model. We need to think about which input arguments should control this.
Simulation of post-ICE data: MAR or reference-based assumptions are probably enough.

wolbersm commented 2 years ago

Yes, agreed @nociale. Two other arguments which set the proportion of missing data may also be helpful:

Probability that all data after the ICE are missing
Probability that any randomly picked observation is missing (creating primarily intermittent missing data)
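
A minimal R sketch of how these two probabilities could act on simulated data. The helper `apply_missingness()` and its argument names are hypothetical and not part of rbmi; it assumes a subjects-by-visits outcome matrix, an `ice_visit` vector giving the last visit unaffected by the ICE (`NA` if no ICE), and that all visits, including baseline, are equally eligible for intermittent missingness.

```r
# Hypothetical helper (not part of rbmi): applies two missingness mechanisms
# to a subjects-by-visits outcome matrix `y`.
apply_missingness <- function(y, ice_visit, prob_post_ice_miss, prob_miss) {
    n <- nrow(y)
    n_visits <- ncol(y)

    # 1) With probability `prob_post_ice_miss`, all data after the ICE are missing.
    #    ice_visit[i] is the last visit unaffected by the ICE (NA = no ICE).
    drop <- !is.na(ice_visit) & (runif(n) < prob_post_ice_miss)
    for (i in which(drop)) {
        if (ice_visit[i] < n_visits) {
            y[i, (ice_visit[i] + 1):n_visits] <- NA
        }
    }

    # 2) Every observation is additionally set to missing independently with
    #    probability `prob_miss` (intermittent, completely at random).
    intermittent <- matrix(runif(n * n_visits) < prob_miss, nrow = n)
    y[intermittent] <- NA
    y
}
```

For example, `apply_missingness(y, ice_visit, prob_post_ice_miss = 0.5, prob_miss = 0.05)` would remove all post-ICE data for roughly half of the subjects with an ICE and about 5% of all observations at random.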

nociale commented 2 years ago

@wolbersm @gowerc I have been thinking a bit about the implementation of the function to simulate data. I would like to agree with you on the general set-up/user interface before implementing it. Here is a first proposal, but it probably needs improvements.

#' @title Generate data
#' 
#' @description Generate data for a two-arm clinical trial with a longitudinal continuous outcome and one intercurrent event (ICE).
#'
#' @param mu_c Numeric vector indicating the mean outcome of the control arm assuming no ICE. Should include the outcome at baseline.
#' @param sigma_c Covariance matrix of the outcome from the control arm assuming no ICE.
#' @param mu_t Numeric vector indicating the mean outcome of the treatment arm assuming no ICE. Should include the outcome at baseline.
#' @param sigma_t Covariance matrix of the outcome from the treatment arm assuming no ICE.
#' @param n_c Number of subjects belonging to the control arm.
#' @param n_t Number of subjects belonging to the treatment arm.
#' @param prob_ice_c Numeric vector that specifies the probability of experiencing the ICE at each visit for a patient in the control arm with outcome equal to the control mean at baseline.
#' @param prob_ice_t Numeric vector that specifies the probability of experiencing the ICE at each visit for a patient in the treatment arm with outcome equal to the treatment mean at baseline.
#' @param or_outchg_c Numeric value that specifies the odds ratio corresponding to a worsening in the outcome in the control arm. See details.
#' @param or_outchg_t Numeric value that specifies the odds ratio corresponding to a worsening in the outcome in the treatment arm. See details.
#' @param model Optional. Right-hand side formula object that specifies the model for the probability of experiencing the ICE. See details.
#' @param model_coef_c Optional. Numeric vector that specifies the coefficients of the model for the probability of experiencing the ICE in the control arm. See details.
#' Must contain one coefficient for each variable included in the model. Needed only if `model` is specified.
#' @param model_coef_t Optional. Numeric vector that specifies the coefficients of the model for the probability of experiencing the ICE in the treatment arm. See details.
#' Must contain one coefficient for each variable included in the model. Needed only if `model` is specified.
#' @param drop_out Numeric value that specifies the drop-out rate following the ICE.
#' @param post_ice_traj Character vector that specifies the assumption about the post-ICE trajectory.
#' Possible choices are: Missing At Random `MAR`, Jump to Reference `JR`,
#' Copy Reference `CR`, Copy Increments in Reference `CIR`, Last Mean Carried Forward `LMCF`.
#' Multiple choices are allowed.
#' @param prob_miss Numeric value that specifies the probability for a given observation to be missing. Can be used to produce
#' "intermittent" missing values (which are missing completely at random).
#'
#' @details 
#' The data generation works as follows:
#' 
#' - Generate data from a multivariate normal distribution with parameters `mu_c` and `sigma_c`
#' for the control arm and parameters `mu_t` and `sigma_t` for the treatment arm.
#' - Simulate the ICE according to the given logistic model for the probability of experiencing the ICE.
#' - Simulate drop-out after the ICE. The drop-out is conditional on the ICE and is simulated completely at random.
#' - Adjust trajectory after the ICE according to the given assumption expressed with the `post_ice_traj` argument.
#' 
#' If `model` is **not** specified, the default model for the probability of experiencing the ICE is:
#' `~ 1 + I(visit == 1) + ... + I(visit == n_visits) + I((x-alpha)/beta)` where:
#' - `n_visits` is the number of visits.
#' - `alpha = mu_c[1]` or `alpha = mu_t[1]`: `alpha` is the baseline outcome mean in the control arm if the subject belongs to the control arm. Otherwise it is the baseline outcome mean in the treatment arm.
#' - `beta = mu_c[n_visits] - mu_c[1]` or `beta = mu_t[n_visits] - mu_t[1]`: `beta` is the difference between the mean outcome at the last visit and at baseline in the control arm if the subject belongs to the control arm.
#' Otherwise it is the difference between the mean outcome at the last visit and at baseline in the treatment arm.
#' The term `I((x-alpha)/beta)` specifies the dependency of the probability of the ICE on the current outcome value.
#' The corresponding coefficient is `log(or_outchg_c)` (or `log(or_outchg_t)`), which represents the increase in the log-odds of the ICE
#' due to a worsening in the outcome from baseline equal to `beta`. `or_outchg_c` is the odds ratio corresponding to such worsening in the outcome.
#' A larger value indicates a larger probability of experiencing the ICE due to a worsening in the outcome.
#' 
#' Alternatively the model for the probability of experiencing the ICE can be provided by the user specifying `model`, `model_coef_c` and `model_coef_t`.
#' 
#' @returns A `data.frame` containing the simulated data. If multiple assumptions about post-ICE data are provided
#' a separate column containing the outcome values for each assumption will be included in the output.
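
As an illustration of the first step in the details above (generating complete outcome data for both arms), here is a minimal base-R sketch. The helpers `sim_mvnorm()` and `sim_outcomes()`, the output column names, and the visit numbering (0 = baseline) are assumptions made for this example and not part of the proposal.

```r
# Hypothetical helper: draws n rows from N(mu, sigma) via a Cholesky factor,
# using base R only.
sim_mvnorm <- function(n, mu, sigma) {
    z <- matrix(rnorm(n * length(mu)), nrow = n)
    # chol() returns the upper-triangular factor R with t(R) %*% R == sigma,
    # so the rows of z %*% R have covariance sigma.
    sweep(z %*% chol(sigma), 2, mu, `+`)
}

# Complete outcome data for both arms, assuming no ICE.
# Assumes mu_c and mu_t have the same length (same visit schedule).
sim_outcomes <- function(n_c, n_t, mu_c, sigma_c, mu_t, sigma_t) {
    y <- rbind(sim_mvnorm(n_c, mu_c, sigma_c), sim_mvnorm(n_t, mu_t, sigma_t))
    n_visits <- length(mu_c)
    data.frame(
        id = rep(seq_len(n_c + n_t), each = n_visits),
        group = rep(c("control", "treatment"), times = c(n_c, n_t) * n_visits),
        visit = rep(seq_len(n_visits) - 1, times = n_c + n_t), # 0 = baseline (assumed)
        outcome = as.vector(t(y)) # one block of n_visits rows per subject
    )
}
```

The ICE, drop-out and post-ICE trajectory steps would then operate on this complete data set.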

General question:

Modelling the probability of the ICE: we have two possible implementations. (1) A fully user-specified model (using the model, model_coef_c, model_coef_t arguments). (2) A default model with user-specified probabilities/odds ratios (using the arguments prob_ice_c, prob_ice_t, or_outchg_c, or_outchg_t). I would like to know which you think would be better from a user perspective, and/or whether we should allow for both.

Thanks!

wolbersm commented 2 years ago

Hi @nociale

I like it!

Comments:

I prefer a default model with user-specified probabilities/ odds ratios. I think this is easier for the user. I don't think we need to implement both.
Can you be clearer what "the probability of experiencing the ICE at each visit" exactly means? Should this be: "The probability that an ICE occurs at or immediately after a visit. The ICE is assumed to affect only outcome values occurring after that visit."?
prob_ice_c: I think this can be a vector or a numeric which will be recycled (same prob at each visit). Also, what's the length of this? Same number as the number of visits including baseline or one less?
or_outchg_c: suggest to rename to or_outcome_ice or similar. Description: "Numeric number that specifies the odds ratio of an ICE corresponding to a +1 higher value of the outcome at the visit." In general, be careful about using the wording "worsening" as it depends on the type of outcome whether an increase is a "worsening" or an "improvement". Suggest to use the more neutral word "change" (or similar) instead.
I don't 100% understand the default model.
- Is (visit==1) baseline or the first follow-up (FUP) visit? I think it would be good to be explicit about how visits are numbered.
- I don't think the scaling implied by beta is a good idea. I would just say that the coefficient corresponds to a +1 increase in the observed value and leave it to the user to adjust the coef to this scale.
I have a couple more cosmetic comments but will include them when the actual function is ready for review.
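
A minimal R sketch of the simplified default model suggested in these comments: a per-visit ICE probability for a subject at the baseline mean, and an odds ratio per +1 change in the current outcome without the `beta` scaling. The helper `sim_ice_visit()` is hypothetical; it assumes that column 1 of the outcome matrix is baseline, that `prob_ice` has one entry per post-baseline visit (a single number is recycled), and that an ICE at a visit affects only later visits.

```r
# Hypothetical helper: returns, per subject, the column of `y` at which the ICE
# occurs (NA = no ICE). Only outcomes after that column are affected by the ICE.
sim_ice_visit <- function(y, mu, prob_ice, or_outcome_ice) {
    n_post <- ncol(y) - 1                  # column 1 is baseline (assumed)
    prob_ice <- rep_len(prob_ice, n_post)  # recycle a single probability
    ice_visit <- rep(NA_integer_, nrow(y))
    for (j in seq_len(n_post)) {
        at_risk <- is.na(ice_visit)        # at most one ICE per subject
        # logit P(ICE) = logit(prob_ice[j]) + log(OR) * (outcome - baseline mean),
        # so prob_ice[j] applies to a subject whose outcome equals mu[1].
        lp <- qlogis(prob_ice[j]) + log(or_outcome_ice) * (y[, j + 1] - mu[1])
        ice_now <- at_risk & (runif(nrow(y)) < plogis(lp))
        ice_visit[ice_now] <- j + 1L
    }
    ice_visit
}
```

With `or_outcome_ice = 1` the ICE is independent of the outcome; values above 1 make the ICE more likely for higher outcome values, whether or not "higher" means "worse" for the outcome at hand.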

wolbersm commented 2 years ago

Hi @nociale

Just one further thought: For the advanced vignette, it would be very nice to have a simulated dataset with two different types of ICEs.

I thought it should be relatively easy to enhance this simulation function as follows:

Rename arguments containing the text ice to ice1 [i.e. the current simulation function simulates ICE1].
Rename drop_out to prob_post_ice1_dropout: the probability of dropping out of the study after ICE1 is observed.
Add arguments prob_dropout_c and prob_dropout_t: the probability that a visit is affected by study drop-out. The simulated time of drop-out is the subject's first visit which is affected by drop-out, and data from this visit and all subsequent visits are consequently set to missing. In addition, if the subject is still on treatment at the (first) visit affected by drop-out, then drop-out also triggers discontinuation of the study drug and a corresponding ICE called "ICE2" is generated.

What do you think about this?

Best, Marcel

nociale commented 2 years ago

@wolbersm I like this idea! It would allow us to simulate trials with two ICEs without complicating the implementation. Just 3 confirmations: are the following true?

1. The drop-out for ICE2 is simulated independently of the outcome value.
2. Drop-out governed by prob_dropout_c and prob_dropout_t is simulated completely at random.
3. prob_dropout_c and prob_dropout_t affect only pre-ICE1 visits (since we have an ad-hoc parameter prob_post_ice1_dropout for drop-out following ICE1).

Best regards, Alessandro

wolbersm commented 2 years ago

@nociale Thanks!

1. Yes, independently. I would call this "uninformative drop-out" or similar because it only triggers an ICE corresponding to treatment discontinuation if it occurs while the subject is still on treatment (see also 3. below).
2. Yes, if you mean that they are simulated according to an independent binomial with the same p at each visit.
3. I think they could also affect post-ICE1 visits, e.g. if the subject had ICE1 but did not drop out directly after ICE1 (as per prob_post_ice1_dropout), they could subsequently drop out while off treatment due to "additional drop-out" guided by prob_dropout_c or prob_dropout_t (simulated completely independently).
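
The drop-out rule agreed above could be sketched in R roughly as follows. The helper `sim_dropout()` is hypothetical and not part of rbmi; it assumes a subjects-by-visits outcome matrix with baseline in column 1, that baseline is never affected by drop-out, and that `ice1_visit` holds the last visit at which the subject was still on treatment (`NA` if ICE1 never occurred).

```r
# Hypothetical helper: uninformative drop-out with the same probability at
# every post-baseline visit, simulated independently of the outcome.
sim_dropout <- function(y, ice1_visit, prob_dropout) {
    n <- nrow(y)
    n_visits <- ncol(y)

    # Independent Bernoulli draw per subject and post-baseline visit;
    # the first "hit" is the first visit affected by drop-out.
    hit <- matrix(runif(n * (n_visits - 1)) < prob_dropout, nrow = n)
    dropout_visit <- apply(hit, 1, function(h) {
        if (any(h)) which(h)[1] + 1L else NA_integer_
    })

    ice2 <- rep(FALSE, n)
    for (i in which(!is.na(dropout_visit))) {
        # Data from the drop-out visit onwards are set to missing.
        y[i, dropout_visit[i]:n_visits] <- NA
        # Drop-out triggers ICE2 only if the subject is still on treatment,
        # i.e. ICE1 has not occurred before the drop-out visit.
        ice2[i] <- is.na(ice1_visit[i]) || ice1_visit[i] >= dropout_visit[i]
    }
    list(y = y, dropout_visit = dropout_visit, ice2 = ice2)
}
```

Because the draws ignore both the outcome and ICE1, a subject who had ICE1 but stayed in the study can still drop out later without generating ICE2.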

nociale commented 2 years ago

I see. Thanks for the answers, everything seems clear to me now.

insightsengineering / rbmi

Better simulation function #225