stephenslab / susieR

R package for "sum of single effects" regression.

`refine` takes long time to run for "small" datasets #222

Open gaow opened 3 months ago

gaow commented 3 months ago

I have been using susie() with refine=TRUE for various analyses. I noticed that for smaller sample sizes it can take a very long time to run. For example, even with the simulated data shown in ?susieR::susie,

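     library(susieR)
     # Simulate n samples and p variables; only the first 4 have an effect
     n = 1000
     p = 1000
     beta = rep(0,p)
     beta[1:4] = 1
     X = matrix(rnorm(n*p),nrow = n,ncol = p)
     X = scale(X,center = TRUE,scale = TRUE)
     y = drop(X %*% beta + rnorm(n))
     # Time the fit with the refinement step enabled
     st = proc.time()
     res1 = susie(X,y,L = 10, refine=TRUE)
     proc.time() - st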

It takes more than two minutes,

    user   system  elapsed 
2208.592 2871.796  130.205 

but without refine it takes two seconds. @zouyuxin perhaps we should evaluate and improve the behavior of refine. Did you notice this when you developed that feature?

pcarbo commented 3 months ago

@gaow With refine = TRUE, susie is being called an additional 16 times, so this much longer running time isn't surprising. (However, it would be helpful if the refinement step provided more updates on its progress.)

One workaround would be to set max_iter to a smaller value.
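For reference, a minimal sketch of what that workaround could look like. The data here are a smaller version of the simulation above so it runs quickly, and `max_iter = 20` is only an illustrative cap, not a recommended value. It also assumes the cap is passed on to the fits made during refinement, which is worth checking against the installed version.

```r
# Sketch: cap the number of IBSS iterations when calling susie() with refine = TRUE.
# The sizes below (n = 200, p = 300) are assumptions chosen for a quick run.
set.seed(1)
n <- 200
p <- 300
beta <- rep(0, p)
beta[1:4] <- 1
X <- scale(matrix(rnorm(n * p), nrow = n, ncol = p), center = TRUE, scale = TRUE)
y <- drop(X %*% beta + rnorm(n))

# Only fit if susieR is installed, so the script still runs without it.
if (requireNamespace("susieR", quietly = TRUE)) {
  timing <- system.time(
    fit <- susieR::susie(X, y, L = 10, refine = TRUE, max_iter = 20)
  )
  print(timing)
  # Check whether the final fit converged within the cap.
  print(fit$converged)
}
```

If `fit$converged` comes back `FALSE`, the cap is too tight for that data set and the results should not be trusted as is.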

gaow commented 3 months ago

Thanks @pcarbo

> One workaround would be to set max_iter to a smaller value

You mean in the "refine" code? I think SuSiE converges in < 20 iterations most of the time anyway. It's the 16 extra calls that seem a bit too much. In many other examples, especially with larger sample sizes, it is called far fewer than 16 times. I wonder if there is a way to fundamentally improve it ...
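A quick way to check the iteration count for a given data set is to look at the `niter` and `converged` fields of a fit without refinement. This is only a sketch on a smaller simulated data set (the sizes are assumptions chosen for a quick run), so the number it reports will not necessarily match the n = 1000, p = 1000 example above.

```r
# Sketch: report how many IBSS iterations a single susie() fit needs.
set.seed(1)
n <- 200
p <- 300
beta <- rep(0, p)
beta[1:4] <- 1
X <- scale(matrix(rnorm(n * p), nrow = n, ncol = p), center = TRUE, scale = TRUE)
y <- drop(X %*% beta + rnorm(n))

# Only fit if susieR is installed, so the script still runs without it.
if (requireNamespace("susieR", quietly = TRUE)) {
  fit <- susieR::susie(X, y, L = 10)  # refine is left at its default (FALSE)
  cat("IBSS iterations:", fit$niter, "converged:", fit$converged, "\n")
}
```

If the count is well below the default `max_iter`, lowering `max_iter` would not save much, and the cost is dominated by the number of extra fits.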

pcarbo commented 3 months ago

Yes, there is quite possibly room for improvement in the refinement step, but I don't have any clever ideas at the moment. Suggestions are welcome.