sambrilleman / simsurv

Simulate Survival Data
GNU General Public License v3.0
22 stars 8 forks source link

Potential issue with large sample size and mixture-Weibull model #21

Closed ABullement closed 5 months ago

ABullement commented 5 months ago

Hi there,

I'm new to using simsurv, and I was hoping you might be able to help me with an issue I've encountered. I originally posted this on Stack Overflow, as I first thought it was user error, but I haven't had a reply so I thought best to raise this here.

In short, I am attempting to replicate a simulation study described in NICE DSU TSD 21 (link: https://www.sheffield.ac.uk/nice-dsu/tsds/flexible-methods-survival-analysis). The authors originally did this using Stata, whereas I'm attempting to run it in R. To do this, I'm attempting to re-produce the same results, and one of these results relies on the specification of a mixture-Weibull model.

When I run this code:

`# Load relevant packages
library(survival)
library(simsurv)

# Set sample size for the truth (i.e., a large number)
ss_truth <- 100000

# Sample frailty
z <- rnorm(ss_truth, 0, 1)

# Specify time limit
t_lim_truth <- 80

# Other parameters
dist <- "weib"
lambda1 <- 0.6
lambda2 <- 0.7
gamma1 <- 1.8
gamma2 <- 0.5
pmix <- 0.3
frailty_beta <- 0.5

# Set seed for reproducibility
set.seed(1)

# Adjust for mixture versus non-mixture models
pmixopt <- pmix
mixopt <- TRUE

# Adjust length of lambda for mixture versus non-mixture models
if(is.na(lambda2)==TRUE){
  lambda_in <- lambda1
} else {
  lambda_in <- c(lambda1, lambda2)
}

# Adjust length of gamma for mixture versus non-mixture models
if(is.na(gamma2)==TRUE){
  gamma_in <- gamma1
} else {
  gamma_in <- c(gamma1, gamma2)
}

# Generate "truth"
td <- simsurv(dist = dist, lambda = lambda_in, gamma = gamma_in,
              pmix = pmixopt, maxt = t_lim_truth, x = data.frame(z, frailty_beta), mixture = mixopt)

I obtain this error:

Error in stats::uniroot(rootfn_surv, survival = survival, x = x_i, betas = betas_i,  : 
  f() values at end points not of opposite sign

From testing some different values of gamma, and changing the sample size, I think it's due to a combination of what I have selected. However, I'm struggling to understand why this is happening, particularly as it doesn't seem to be an issue in the Stata version for survsim.

Any help would be hugely appreciated!

With best wishes, Ash

sambrilleman commented 5 months ago

Hi Ash

I can't remember exactly without reading the docs to remind myself of the details, but I think from memory this error occurs when you can't invert the survival function. I think this happens when your survival curve is either flat (i.e. stays at 1), or drops to zero within an instant (i.e. 0 from almost time zero) relative to the time range over which you are looking for a solution. I think you need to either increase or decrease the range over which you are searching for the uniroot solution. I think there is a documented argument to control that, I think it is interval from memory. Essentially your curve probably isn't behaving well over the default interval where it searches for a solution.

Hope that helps...

On Fri, 12 Apr 2024, 8:04 pm Ash Bullement, @.***> wrote:

Hi there,

I'm new to using simsurv, and I was hoping you might be able to help me with an issue I've encountered. I originally posted this on Stack Overflow, as I first thought it was user error, but I haven't had a reply so I thought best to raise this here.

In short, I am attempting to replicate a simulation study described in NICE DSU TSD 21 (link: https://www.sheffield.ac.uk/nice-dsu/tsds/flexible-methods-survival-analysis). The authors originally did this using Stata, whereas I'm attempting to run it in R. To do this, I'm attempting to re-produce the same results, and one of these results relies on the specification of a mixture-Weibull model.

When I run this code:

`# Load relevant packages library(survival) library(simsurv)

Set sample size for the truth (i.e., a large number)

ss_truth <- 100000

Sample frailty

z <- rnorm(ss_truth, 0, 1)

Specify time limit

t_lim_truth <- 80

Other parameters

dist <- "weib" lambda1 <- 0.6 lambda2 <- 0.7 gamma1 <- 1.8 gamma2 <- 0.5 pmix <- 0.3 frailty_beta <- 0.5

Set seed for reproducibility

set.seed(1)

Adjust for mixture versus non-mixture models

pmixopt <- pmix mixopt <- TRUE

Adjust length of lambda for mixture versus non-mixture models

if(is.na(lambda2)==TRUE){ lambda_in <- lambda1 } else { lambda_in <- c(lambda1, lambda2) }

Adjust length of gamma for mixture versus non-mixture models

if(is.na(gamma2)==TRUE){ gamma_in <- gamma1 } else { gamma_in <- c(gamma1, gamma2) }

Generate "truth"

td <- simsurv(dist = dist, lambda = lambda_in, gamma = gamma_in, pmix = pmixopt, maxt = t_lim_truth, x = data.frame(z, frailty_beta), mixture = mixopt)

I obtain this error:

Error in stats::uniroot(rootfn_surv, survival = survival, x = x_i, betas = betas_i, : f() values at end points not of opposite sign

From testing some different values of gamma, and changing the sample size, I think it's due to a combination of what I have selected. However, I'm struggling to understand why this is happening, particularly as it doesn't seem to be an issue in the Stata version for survsim.

Any help would be hugely appreciated!

With best wishes, Ash

— Reply to this email directly, view it on GitHub https://github.com/sambrilleman/simsurv/issues/21, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEVECTGNGOP74556N4BGLV3Y46WUHAVCNFSM6AAAAABGD3IDAWVHI2DSMVQWIX3LMV43ASLTON2WKOZSGIZTSNRVHAZDONI . You are receiving this because you are subscribed to this thread.Message ID: @.***>

ABullement commented 5 months ago

Hi Sam,

Thank you so much for this - changing the interval input seems to have done the trick. Just in case helpful for anyone else who encounters this thread, here's the line I ran which now works:

td <- simsurv(dist = dist, lambda = lambda_in, gamma = gamma_in, pmix = pmixopt, maxt = t_lim_truth, x = data.frame(z, frailty_beta), mixture = mixopt, interval = c(1E-20, 1E20))

I'll copy this over to the Stack Overflow post too.

Thanks again, it's hugely appreciated! Ash

sambrilleman commented 5 months ago

All good! Glad it worked.

On Mon, 15 Apr 2024, 5:51 am Ash Bullement, @.***> wrote:

Hi Sam,

Thank you so much for this - changing the interval input seems to have done the trick. Just in case helpful for anyone else who encounters this thread, here's the line I ran which now works:

td <- simsurv(dist = dist, lambda = lambda_in, gamma = gamma_in, pmix = pmixopt, maxt = t_lim_truth, x = data.frame(z, frailty_beta), mixture = mixopt, interval = c(1E-20, 1E20))

I'll copy this over to the Stack Overflow post too.

Thanks again, it's hugely appreciated! Ash

— Reply to this email directly, view it on GitHub https://github.com/sambrilleman/simsurv/issues/21#issuecomment-2054167087, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEVECTDMJ7FKX72TVW5VSYDY5LM3VAVCNFSM6AAAAABGD3IDAWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANJUGE3DOMBYG4 . You are receiving this because you commented.Message ID: @.***>