Stress Test - Githubissues

rcarragh commented 3 years ago

I stress tested the package on data sets with a large number of AEs, and in some cases the computations slowed to a crawl, especially when using the Bayesian methods. You may want to mention this in the paper, since in clinical trial settings the number of AEs can be very large, especially in certain therapeutic areas or integrated data settings. As an FYI, we have implemented the package here: https://visual-analytics.shinyapps.io/index/ (see AE Line Plot tab) as part of the ASA Biopharm Safety WG ongoing work.

rcarragh commented 3 years ago

It is true that the Bayesian models are particularly expensive in terms of both computation and memory. The main issue is the sheer number of parameters in the model: the larger the model, the more computation and memory are needed. A fast CPU is also required.

One of the trials we looked at for the Berry and Berry model consisted of 23 SOCs and 497 AEs, giving a total of 1115 parameters: 497 (theta) + 23 (pi.theta) + 23 (mu.theta) + 23 (sigma.theta) + 4 (mu.theta.0, tau.theta.0, alpha.pi, beta.pi) + 497 (gamma) + 23 (mu.gamma) + 23 (sigma.gamma) + 2 (mu.gamma.0, tau.gamma.0).

Fitting with 5 parallel chains and 60000 iterations, of which 20000 are burn-in, requires storage for 5 x 1115 x 40000 = 2.23e+08 double-precision numbers (roughly 1.8 GB at 8 bytes each), plus other memory allocated within the package.
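For anyone wanting to size their own trial before fitting, the arithmetic above can be sketched as below. This is a back-of-the-envelope helper, not part of c212: the function names are hypothetical, and it assumes only post-burn-in samples are stored and that each value is an 8-byte double.

```python
def berry_and_berry_param_count(n_soc: int, n_ae: int) -> int:
    """Count parameters using the itemisation given in this thread."""
    per_ae = 2 * n_ae     # theta and gamma: one of each per AE
    per_soc = 5 * n_soc   # pi.theta, mu.theta, sigma.theta, mu.gamma, sigma.gamma
    top_level = 6         # the 4 + 2 top-level hyperparameters
    return per_ae + per_soc + top_level


def sample_storage(n_chains: int, n_params: int, n_iter: int,
                   burn_in: int, bytes_per_value: int = 8):
    """Return (number of stored values, bytes) for the retained samples.

    Assumption: burn-in draws are discarded and every parameter is stored.
    """
    retained = n_iter - burn_in
    n_values = n_chains * n_params * retained
    return n_values, n_values * bytes_per_value


n_params = berry_and_berry_param_count(n_soc=23, n_ae=497)
n_values, n_bytes = sample_storage(5, n_params, 60000, 20000)
print(n_params)                           # 1115
print(n_values)                           # 223000000
print(f"about {n_bytes / 1e9:.2f} GB")    # about 1.78 GB
```

This covers the sample storage only; the package's other allocations come on top of it.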

On a Linux machine with 64GB of memory and an Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz, the model fit took approximately 3 minutes, the convergence diagnostics about 40 seconds, the summary statistics approximately 4 minutes, and determining the ptheta posterior values approximately 40 seconds. Overall time was about 8 to 9 minutes.

We considered this acceptable for the type of analysis we were looking to perform, i.e. fitting the model wasn’t a time-critical task for us, since we did not need the results very quickly.

Obviously, this changes if you use a web interface where the user is looking for a quick response.

For the interim models, I added in a “monitor” parameter which allowed the fitting functions to not store the samples for certain families of parameters. Unfortunately, I did not implement this in the Berry and Berry model.
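To illustrate the idea behind such a parameter, here is a toy sketch in Python. It is not the c212 implementation: the function, the argument names and the dict-based `monitor` format are assumptions made for illustration, and the "sampler" just draws random noise in place of real MCMC updates. The point is only that storage is allocated for the monitored families and nothing else.

```python
import random


def run_chain(n_iter, burn_in, families, monitor, seed=0):
    """Toy chain that stores samples only for monitored parameter families.

    families: dict mapping family name -> number of parameters in it
    monitor:  dict mapping family name -> True/False (store samples or not);
              families missing from the dict are stored by default
    """
    rng = random.Random(seed)
    # Allocate storage only for the families we were asked to monitor.
    stored = {name: [] for name in families if monitor.get(name, True)}
    for it in range(n_iter):
        for name, size in families.items():
            # Placeholder for a real parameter update.
            draw = [rng.gauss(0.0, 1.0) for _ in range(size)]
            if it >= burn_in and name in stored:
                stored[name].append(draw)
    return stored


families = {"theta": 497, "gamma": 497, "mu.theta": 23}
monitor = {"theta": True, "gamma": False, "mu.theta": False}
samples = run_chain(n_iter=20, burn_in=5, families=families, monitor=monitor)
print(sorted(samples))        # ['theta']
print(len(samples["theta"]))  # 15
```

In this toy example, monitoring only theta stores 497 of the 1017 values per iteration, so roughly half the sample storage is avoided.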

So, as suggested, I’ve added a section on performance to the paper so that users can gauge what resources they need to run the models.

A couple of ideas for the future would be:

Adding the monitor parameter to all the Bayesian models
Parallelising the model fitting process using something like the R parallel package.
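As a rough illustration of the second idea, the sketch below runs independent chains in separate worker processes. It is written in Python with `concurrent.futures` rather than the R parallel package, and the sampler is a toy random-walk Metropolis chain targeting a standard normal, not one of the c212 models. All function names are hypothetical.

```python
import math
import random
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def run_chain(seed, n_iter=1000, burn_in=200):
    """Toy random-walk Metropolis chain targeting N(0, 1)."""
    rng = random.Random(seed)  # each chain gets its own seeded generator
    x, kept = 0.0, []
    for it in range(n_iter):
        proposal = x + rng.gauss(0.0, 1.0)
        # Log of the density ratio p(proposal) / p(x) for a standard normal.
        log_ratio = 0.5 * (x * x - proposal * proposal)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        if it >= burn_in:
            kept.append(x)
    return kept


def fit_parallel(seeds, executor_cls=ProcessPoolExecutor, **chain_kwargs):
    """Run one chain per seed, each in its own worker, and collect the results."""
    with executor_cls(max_workers=len(seeds)) as pool:
        futures = [pool.submit(run_chain, s, **chain_kwargs) for s in seeds]
        return [f.result() for f in futures]


# Example (call from a script inside an `if __name__ == "__main__":` guard):
#     chains = fit_parallel(seeds=[1, 2, 3, 4, 5])
# Pass executor_cls=ThreadPoolExecutor to debug without spawning processes.
```

Because the chains are independent, wall-clock time for the fit should approach that of a single chain when enough cores are available. Memory use is not reduced, since every chain's samples still have to be returned and stored.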

rcarragh commented 3 years ago

I've also added a sentence (and link) about the ASA Biopharm Safety WG ongoing work.

MelvinSMunsaka commented 3 years ago

@rcarragh I have reviewed the response and the suggested future resolution to my comment, and it is acceptable.

rcarragh commented 3 years ago

@MelvinSMunsaka - great thanks. I'll close the issue if that's OK with you.

MelvinSMunsaka commented 3 years ago

@rcarragh That is OK with me.

rcarragh commented 3 years ago

@MelvinSMunsaka - great, thanks

rcarragh commented 3 years ago

Closing with the agreement of @MelvinSMunsaka

rcarragh / c212

Stress Test #9