stephenslab / susieR

R package for "sum of single effects" regression.
https://stephenslab.github.io/susieR
Other
172 stars 43 forks source link

Reduce memory usage #143

Closed pcarbo closed 2 years ago

pcarbo commented 2 years ago

@zouyuxin @gaow I'm giving you a chance to review the changes if you'd like. If it is okay with you, I will merge with master. This PR contains improvements in speed and memory usage for susie, susie_suff_stat and susie_rss, and some other miscellaneous improvements. Note also that the aim is to submit this new version to CRAN.

pcarbo commented 2 years ago

@gaow @zouyuxin Are the tests in test_susie_rss.R still relevant since we aren't using susie_rss_lambda? Should I skip them?

zouyuxin commented 2 years ago

They are irrelevant. The tests are skipped.

pcarbo commented 2 years ago

Also, FYI:

library(Rfast)
set.seed(1)
n   <- 8000
x   <- rnorm(n)
A   <- tcrossprod(x)
print(system.time(out1 <- isSymmetric(A)))  # 2.8 GB
print(system.time(out2 <- is.symmetric(A))) # 0.6 GB
#   user  system elapsed
#  2.203   0.687   3.053
#   user  system elapsed
#  0.331   0.002   0.344
stephens999 commented 2 years ago

I was also under impression we had found tcrossprod faster than regular matrix multiplication, but to add another datapoint, on my M1 macbook pro:

print(system.time(

  • y1 <- as.matrix(tcrossprod(M,sweep(X,2,csd,"/")) -
  • drop(tcrossprod(M,t(cm/csd)))))) user system elapsed 2.084 0.200 2.349 print(system.time(
  • y2 <- as.matrix(t(X %% (t(M)/csd)) - drop(M %% (cm/csd))))) user system elapsed 0.228 0.001 0.229

(aiui: sweep causes large memory usage so better avoided anyway)

On Mon, Nov 1, 2021 at 1:34 PM Peter Carbonetto @.***> wrote:

Also, FYI:

library(Rfast) set.seed(1)n <- 8000x <- rnorm(n)A <- tcrossprod(x) print(system.time(out1 <- isSymmetric(A))) # 2.8 GB print(system.time(out2 <- is.symmetric(A))) # 0.6 GB# user system elapsed# 2.203 0.687 3.053# user system elapsed# 0.331 0.002 0.344

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/stephenslab/susieR/pull/143#issuecomment-956485807, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANXRRNDJEMFKNPVH5EQMX3UJ3MTNANCNFSM5HDBZGXA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

pcarbo commented 2 years ago

@gaow Just checking to see if you need more time to review this PR, or if it is acceptable enough to merge. Thanks!

gaow commented 2 years ago

@pcarbo I think this round of edits is great. But the use of Rfast might be a bit concerning? One of my team member was not able to install Rfast in her Mac though she figured it out eventually. I think @zouyuxin had the same issue at some point? It works fine on my Linux desktop though. I wonder if this would be a concern to release with Rfast requirement. I looked into the code that we use Rfast for. It would be a bit of effort to separate them from Rfast and into susieR code. I'm open to that possibility, if we get frequent complaints about installation.

pcarbo commented 2 years ago

@gaow The main constraints on using Rfast will be, I believe, (1) availability of a C++11-compatible compiler, and (2) R >= 3.5. But (1) will not be needed when using install.packages to install Rfast (which the vast majority of users will use). So it seems fine to depend on Rfast. (Also Rfast seems quite stable and widely used; >200,000 CRAN downloads.) However if you think it might be an issue we can move Rfast to "Suggests" and provide an alternative for those that don't have Rfast. If we do that, what I would suggest is that we display a message encouraging users who don't have Rfast to install it, because it does provide considerable runtime benefits.

It is also possible that some of the earlier installation issues you encountered have since been resolved by the Rfast developers. Looks like they made a lot of improvements in 2019.

zouyuxin commented 2 years ago

@gaow I didn't have issue installing Rfast.

gaow commented 2 years ago

@zouyuxin thanks. I must remembered it wrongly. @pcarbo I just asked 3 of my teammates to try this branch installation on their new Linux OS, iMac and Macbook (with persumablly new R). None had an issue. Rfast does introduce packages that we don't need that can take long to install from source code, eg RcppZiggurat. But I dont think people should complain about waiting for 15min to install an R package and do this maybe every 6 months when there is an update. I am cool with having Rfast in Imports. So let's do that?

pcarbo commented 2 years ago

Rfast only depends on 3 packages. But, yes, I see one of these packages, RcppZiggurat, as potentially being an issue because it depends on RcppGSL, and we know from experience that RcppGSL is often not easy to install. I think it should be relatively straightforward to keep Rfast in "Suggests" which will make susieR easier to install. Let me push a few changes and you can review.

pcarbo commented 2 years ago

I've added a messaget to susie_suff_stat when Rfast is not installed and R or XtX is large: For large R or XtX, consider installing the Rfast package for better performance. I have a similar message for susie.