joshuaschwab / ltmle

Longitudinal Targeted Maximum Likelihood Estimation package
http://joshuaschwab.github.io/ltmle/

Very long running time for survival on a static regime #9

Open osofr opened 10 years ago

osofr commented 10 years ago

ltmle() takes too much time to run for end-of-follow-up survival on a static regime (no SuperLearner). A full-data run takes about 80 min and 20 GB of RAM. Most of the time appears to be spent in the ConvertCensoringNodesToBinary(), CleanData() and XMatch() functions, and the running time is approximately linear in N (a 5K subsample takes 8 min). My goal is to eventually run MSMs on survival at each of 17 time points with SuperLearner, which would take far too long at current run times. Coding the censoring variables as factors (per the documentation) or as binaries has no effect on performance.

Any advice on what could be causing this slowdown and how to fix it? Unfortunately, I can't share the data, but I would be happy to run any tests or give more details. See below for a detailed description of the problem and some ltmle.R profiling results.

Thanks, Oleg

Data:

50K observations (N), 17 time points (t), 60 baseline covariates (W), 35 time-dep covariates (L_t), time-dep treatment (A_t), 3 types of censoring (Ct_1,Ct_2,Ct_3), survival outcome (Y_t)

Modeling:

Models Q_t and g_t are specified for each time point (one Q_t model per LY block); both depend only on baseline and previous-time-point covariates.
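For concreteness, a sketch of what such per-time-point specifications might look like (variable names are illustrative, not from the actual data; in ltmle, Qform entries use Q.kplus1 as the left-hand side and are named by the first node of each LY block, while gform covers each A and C node):

```r
# Illustrative fragment for the first two time points (names hypothetical).
# One Qform entry per LY block, named by the block's first node:
Qform_vec <- c(
  L_1 = "Q.kplus1 ~ W1 + W2",
  L_2 = "Q.kplus1 ~ W1 + W2 + L_1 + A_1"
)
# One gform entry per treatment and censoring node:
gform_vec <- c(
  A_1 = "A_1 ~ W1 + W2",
  A_2 = "A_2 ~ W1 + W2 + L_1 + A_1"
)
```

In the real problem there would also be entries for the three censoring nodes per time point; this fragment only shows the shape of the vectors.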

Running ltmle:

Running the ltmle() function with a static regime, abar=(1,...,1), and no stratification (setstrat=FALSE).

ltmle_out <- ltmle(data=DataWide_subsamp, Anodes=Anodes, Cnodes=Cnodes, Lnodes=Lnodes_1_tmax, Ynodes=Ynodes, abar=rep(1, 17), estimate.time=FALSE, survivalOutcome=TRUE, stratify=setstrat, iptw.only=FALSE, gform=gform_vec, Qform=Qform_vec)

Profiling ltmle.R by line on a subsample of 5K observations:

(8 min run time)

Rprof("ltmle_run_memuse_line.out", line.profiling=TRUE)
source("./ltmle/R/ltmle.R")
ltmle_out <- ltmle(....)
Rprof(NULL)
summaryRprof("ltmle_run_memuse_line.out", lines = "show")

summaryRprof("ltmle_run_memuse_line.out", lines = "show")$by.self

line self.time self.pct total.time total.pct
ltmle.R#1525 123.70 26.60 130.76 28.12
ltmle.R#1212 121.32 26.09 121.32 26.09
ltmle.R#1206 83.28 17.91 84.40 18.15
ltmle.R#997 46.66 10.03 46.66 10.03
ltmle.R#893 39.58 8.51 39.58 8.51
ltmle.R#1026 15.74 3.38 15.74 3.38
# Noam Ross proftable function
source("proftable.R")
proftable("ltmle_run_memuse_line.out", lines=40)

ConvertCensoringNodesToBinary > 1#1525 > [<- > [<-.data.frame
CleanData > 1#1212 > [ > [.data.frame
CleanData > 1#1212 > [ > [.data.frame
CleanData > 1#1222 > is.na.strict > 1#1206 > [ > [.data.frame
EstimateG > 1#797 > Estimate > 1#851 > SuppressGivenWarnings > 3#20 > withCallingHandlers > 1#852 > > 1#893 > glm > eval > glm.fit
XMatch > 1#997 > apply
IsDeterministic > 1#925 > XMatch > 1#997 > apply
Estimate > 1#845 > ConvertCensoringNodesToBinary > 1#1525
EstimateG > 1#771 > SetA > 1#1026 > [<- > [<-.data.frame
> split > split.default > as.factor > factor > unique > unique.matrix > apply > FUN > paste
CleanData > 1#1212 > [ > [.data.frame > [.factor > NextMethod
CleanData > 1#1212 > is.na > is.na.data.frame > do.call > cbind

joshuaschwab commented 10 years ago

Hi Oleg,

Sorry it's taking a long time to run. I'm not sure how much I can help right now. Even if we speed up ltmle itself (without SuperLearner), once you start calling SuperLearner, I'd guess SuperLearner will dominate the run time. You're going to call SuperLearner at least 5 times per time point (3 C nodes, 1 A node, 1 or more LY nodes), so I would think that 85+ calls to SuperLearner with n=50k and 95 columns will be much slower than all the rest of the ltmle code.

Nonetheless, if you want to try to speed up ltmle, you can just take out CleanData (assuming your data already conforms; it does if you're not seeing the "Note: for internal purposes, all nodes after a censoring event..." message). CleanData takes about 16 minutes on my (old) computer. ConvertCensoringNodesToBinary looks to me like it takes less than one second, so I don't know why the profiler says it's taking a lot of time. XMatch gets called a lot, so I'm not sure there's an easy fix there.
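Since ltmle.R is being source()'d directly in the profiling run above, one way to try this is to shadow CleanData with a pass-through. This is a sketch under the assumption that CleanData returns the data frame and that your data already conforms (all nodes after a censoring event or death are already NA), in which case the expensive recoding pass is a no-op anyway:

```r
# Defined AFTER source("./ltmle/R/ltmle.R") so it shadows the original.
# Only safe if the data already conforms; otherwise results will be wrong.
CleanData <- function(data, nodes, deterministic.Q.function,
                      survivalOutcome, showMessage = TRUE) {
  data  # skip the per-column recoding entirely
}
```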

Here's the code I used to try to replicate the speed issues:

set.seed(1)
max.time <- 17
n <- 50000
num.w <- 60
num.l <- 35
W <- matrix(rnorm(n * num.w), nrow=n)
data <- data.frame(W=W)
Y <- rep(0, n)
cens <- rep(F, n)
died <- rep(F, n)
for (t in 1:max.time) {
  L <- matrix(rnorm(n * num.l), nrow=n)
  L[Y==1 | cens, ] <- NA
  C <- matrix(rbinom(n * 3, size=1, prob=0.99), nrow=n)
  C[Y==1 | cens, ] <- NA
  for (i in 1:3) {
    C[cens, i] <- NA
    cens <- cens | (!C[, i] & !is.na(C[, i]))
  }
  A <- rbinom(n, size=1, prob=0.5)
  C[Y==1] <- NA
  A[Y==1 | cens] <- NA
  Y <- as.numeric(rbinom(n, size=1, prob=0.05) | died) # problem when Y is already NA - probably similar problem with C, maybe use ua
  died <- Y==1
  Y[cens] <- NA
  data <- data.frame(data, data.frame(L=L, C=C, A=A, Y=Y))
}

Anodes <- grep("^A", names(data))
Cnodes <- grep("^C", names(data))
Lnodes <- grep("^L", names(data))
Ynodes <- grep("^Y", names(data))
nodes <- ltmle:::CreateNodes(data, Anodes, Cnodes, Lnodes, Ynodes)

data <- ltmle:::ConvertCensoringNodes(data, Cnodes, has.deterministic.functions=F)
print(system.time(temp <- ltmle:::ConvertCensoringNodesToBinary(data, Cnodes)))
   user  system elapsed
  0.696   0.080   0.775
print(system.time(temp <- ltmle:::CleanData(data, nodes, deterministic.Q.function=NULL, survivalOutcome=T, showMessage=T)))
   user  system elapsed
829.209 134.643 963.671

Josh


From: osofr notifications@github.com To: joshuaschwab/ltmle ltmle@noreply.github.com Sent: Thursday, May 1, 2014 1:01 PM Subject: [ltmle] Very long running time for survival on a static regime (#9)


osofr commented 10 years ago

Hi Josh,

Thanks for a thoughtful reply. ConvertCensoringNodesToBinary is definitely a big bottleneck on my dataset, so it's clearly something specific to the data I'm working with. I will try to simulate this scenario to see if I can replicate the profiler results from the actual data.

A somewhat unrelated note: memoise is applied to a glm object, which stores a lot of unnecessary information (including the entire dataset). If I understand correctly, the only thing that is needed is the result of predict.glm for a given design matrix, which is just a vector. Wouldn't it be possible to wrap glm and predict into one function that returns the prediction vector and memoise that instead?
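The idea could look something like this (a sketch with hypothetical helper names, not code from the ltmle source; assumes the memoise package):

```r
library(memoise)

# Fit a glm and return only the prediction vector, so the cached value is
# small; a cached glm object would otherwise carry the model frame, the
# qr decomposition, etc.
FitAndPredict <- function(formula, family, data, newdata) {
  fit <- glm(formula, family = family, data = data)
  as.vector(predict(fit, newdata = newdata, type = "response"))
}

# Repeated calls with identical arguments return the stored vector
# instead of refitting.
MemoisedFitAndPredict <- memoise(FitAndPredict)
```

One caveat: memoise still has to hash the arguments on every call, so large data frames are hashed repeatedly even though only the small prediction vector is stored.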

Also, a question: how easy is it to parallelize SuperLearner? And likewise for the ltmle package; in MSM estimation, for example, how hard would it be to parallelize the estimation for each survival time point? I have access to a server with many cores, so parallelizing could give a big performance boost.

Thanks, Oleg

joshuaschwab commented 10 years ago

Hi Oleg,

Memoise doesn't apply to ltmle, only to ltmleMSM. But if you're using ltmleMSM, I agree that the memoise section is not well written - it's just a temporary hack. I'm planning on removing memoise entirely in a future release - it shouldn't be needed if I rewrite a few other functions to reuse g.

SuperLearner has some parallelized versions - mcSuperLearner and snowSuperLearner - see ?SuperLearner. I haven't used them, but it looks like you could make a minor change to ltmle:::Estimate to have them called. Parallelizing ltmleMSM would take a little work, but it's doable. You'd want to parallelize the final.Ynodes loop in MainCalcs (if using the pooled MSM) or NonpooledMSM (if not). But I would guess that if you get all of the available cores working on SuperLearner, that's going to be 90% of the speed benefit.
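While the internal loop is serial, a coarse-grained workaround is possible because the per-time-point fits are independent: run one ltmle() call per final time point on separate cores. This is only a sketch using the object names from the original post; FitOneTimePoint is hypothetical, and the details of truncating the node vectors and data columns to time t depend on the column layout and are glossed over here:

```r
library(parallel)

# Hypothetical wrapper: fit survival at final time point t only.
# Cnodes[1:(3 * t)] assumes 3 censoring nodes per time point; the Lnodes
# and data-column truncation would need the same treatment.
FitOneTimePoint <- function(t) {
  ltmle(data = DataWide_subsamp,
        Anodes = Anodes[1:t],
        Cnodes = Cnodes[1:(3 * t)],
        Lnodes = Lnodes_1_tmax,
        Ynodes = Ynodes[1:t],
        abar = rep(1, t),
        survivalOutcome = TRUE,
        estimate.time = FALSE)
}

# One fork per time point; mc.cores caps the number of concurrent fits.
fits <- mclapply(1:17, FitOneTimePoint, mc.cores = 8)
```

Note mclapply forks the R process, so with n=50k each worker gets its own copy-on-write view of the data; memory use should be watched with 20 GB fits.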

I haven't used the ltmle package on datasets as large as yours, so I'm glad you're trying it out and identifying things to improve.

thanks, Josh


From: osofr notifications@github.com To: joshuaschwab/ltmle ltmle@noreply.github.com Cc: joshuaschwab joshuaschwab@yahoo.com Sent: Friday, May 2, 2014 12:44 PM Subject: Re: [ltmle] Very long running time for survival on a static regime (#9)

Hi Josh, Thanks for a thoughtful reply. ConvertCensoringNodesToBinary is definitely a big bottleneck on my dataset, so its clearly something specific to the data I am working with. I will try to simulate this scenario to see if I can replicate the profiler results from the actual data. A somewhat unrelated note. Memoise function is applied to a glm object, which stores tons of unnecessary information (including the entire dataset). The only thing that is needed, if I understand it correctly, is the result of predict.glm for a given design matrix, which is just a vector. Isn't it possible to wrap glm and predict into one function that returns prediction vector and memoise that instead? Also, a question. How easy is it to parallelize the SuperLearner? Same thing for the ltmle package, for example in MSM estimation, how easy do you think it would be to parallelize estimation for each survival time point? I have access to a server with a lot of cores so parallelizing could give a big boost in performance. Thanks, Oleg — Reply to this email directly or view it on GitHub.