stephenslab / mashr

An R package for multivariate adaptive shrinkage.
https://stephenslab.github.io/mashr

speed up sampling from the posterior #95

Closed: jakejh closed this issue 3 years ago

jakejh commented 3 years ago

In my application of mashr, Bhat and Shat are matrices with ~20,000 rows and 5-10 columns. At this size, generating just 2 samples from the posterior -- using mash_compute_posterior_matrices -- takes ~10 minutes. Reducing Bhat and Shat to ~4,000 rows reduces the time to ~2 minutes.

When I try to set algorithm.version = 'Rcpp', I get an error message that the sampling method is not implemented in C++. What would it take to implement such a feature? Let me know if you need the specific matrices I'm using. Thanks a lot.

jakejh commented 3 years ago

If it helps, I have been using both the canonical and data-driven covariances, but I am considering switching to only data-driven, as the canonical covariances don't have such a clear interpretation in my case, and using only data-driven makes the mash step considerably faster.

pcarbo commented 3 years ago

@jakejh Are you using the latest version of mashr (0.2.45)?

jakejh commented 3 years ago

I'm using 0.2.38 from CRAN. Should I try the version on GitHub?

pcarbo commented 3 years ago

Yes, please try the GitHub version; we have made several major improvements.

gaow commented 3 years ago

@jakejh We made some improvements in 0.2.45 to the speed of posterior computation with the Rcpp version. But indeed, the posterior sampling feature is only available in the R version.

> What would it take to implement such a feature?

Frankly, it's a matter of one of us (@pcarbo or myself) working on it, but I'm not sure we have the energy to get to it soon. May I ask what application you have for sampling from the posterior (to make sure it is the best way to do it)?

jakejh commented 3 years ago

OK, I'll try the GitHub version.

In my application, the coefficients are for a spline fit. We care more about the properties of the spline curve (and their corresponding uncertainties) than about the posterior S.D.s for the individual coefficients. I talked with Matthew last week, and he suggested sampling from the posterior.
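To make the use case concrete, here is a minimal sketch of turning posterior samples of spline coefficients into a posterior summary of a curve property. Everything specific in it is an assumption for illustration: the 5-coefficient cubic B-spline basis, the [0, 24] grid, the choice of "time of the peak" as the property, and the simulated `coef_samples`, which only stand in for samples that would really come from mashr.

```r
library(splines)

set.seed(1)

# Hypothetical setup: one feature whose curve is a cubic B-spline with
# 5 coefficients, evaluated on a grid over [0, 24].
time_grid <- seq(0, 24, length.out = 200)
basis <- bs(time_grid, df = 5, intercept = TRUE)  # 200 x 5 basis matrix

# Stand-in for posterior samples of the 5 coefficients (n_samples x 5).
# In a real analysis these would come from the sampler, not from rnorm().
n_samples <- 500
post_mean <- c(0, 2, -1, 3, 0.5)
coef_samples <- matrix(
  rnorm(n_samples * 5, mean = rep(post_mean, each = n_samples), sd = 0.3),
  nrow = n_samples, ncol = 5
)

# Evaluate every sampled curve on the grid: 200 x n_samples matrix.
curves <- basis %*% t(coef_samples)

# Derived property (assumed for illustration): time at which each curve peaks.
peak_time <- time_grid[apply(curves, 2, which.max)]

# Posterior summary of the derived quantity: mean and 95% interval.
peak_summary <- c(mean = mean(peak_time),
                  quantile(peak_time, c(0.025, 0.975)))
```

The point of the sketch is that the uncertainty of a nonlinear curve property falls out of the samples directly, which the per-coefficient posterior S.D.s alone would not give.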

stephens999 commented 3 years ago

Before going to non-R solutions, I'm puzzled about the performance. I would have expected the computation to be doing a bunch of matrix inversions etc. Once those were done, I would have expected generating 10k samples to take basically the same time as 1 sample. Am I missing something?
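The expectation above can be illustrated with a small base-R sketch. It is not mashr's implementation: it assumes a single multivariate normal posterior for one effect vector (mash posteriors are mixtures, so this corresponds to one mixture component), and all names in it are hypothetical. The covariance is factored once, and only a cheap matrix product scales with the number of samples.

```r
set.seed(1)

# Hypothetical posterior for one effect vector across 5 conditions.
n_cond <- 5
A <- matrix(rnorm(n_cond * n_cond), n_cond, n_cond)
post_cov <- crossprod(A) + diag(n_cond)  # symmetric positive definite
post_mean <- rnorm(n_cond)

# One-off cost: Cholesky factor U with post_cov = t(U) %*% U.
U <- chol(post_cov)

# Draw M samples as post_mean + z %*% U, where z has i.i.d. N(0, 1) entries.
# Only this step depends on M.
draw_samples <- function(M) {
  z <- matrix(rnorm(M * n_cond), nrow = M, ncol = n_cond)
  sweep(z %*% U, 2, post_mean, "+")
}

samples_small <- draw_samples(1)      # 1 x 5
samples_large <- draw_samples(10000)  # 10000 x 5
```

Wrapping the two `draw_samples()` calls in `system.time()` is a quick way to check how little the sample count matters once `U` is available.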


jakejh commented 3 years ago

Whoops, I hadn't actually tried generating different numbers of posterior samples yet. Using mashr 0.2.45 on a Bhat with 20,000 rows and 4 columns, generating 100 posterior samples and 1,000 posterior samples took about the same amount of time, 6.5 minutes. That's actually not terrible in my case, since calculating the properties of the spline fit for all 20,000 features currently takes ~30 sec per posterior sample, so almost 1 hour for 100 posterior samples.

jakejh commented 3 years ago

Just as an update, our typical use case will now involve calculating the properties of the spline fit for only a small subset (<100) of features, which makes the time to generate the posterior samples a larger share of the total runtime. mashr works great as is, but if you ever get to implement the posterior sampling in C++, I'd love it even more. 😁
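For the small-subset case, one possible approach is to fit mash on all features and then compute posteriors, including samples, only for the rows of interest by reusing the fitted prior. The sketch below is a suggestion rather than something tested in this thread. `sample_subset_posterior` and its argument names are hypothetical, and it assumes `fit` is a mash object already fitted to the full data and that the data were set up with default `mash_set_data` options (a non-default correlation matrix or `alpha` would need to be passed through as well).

```r
# Sketch only: requires mashr to be installed when the function is called.
sample_subset_posterior <- function(fit, Bhat, Shat, rows, n_samples = 100) {
  # Build a mash data object containing only the features of interest.
  data_subset <- mashr::mash_set_data(Bhat[rows, , drop = FALSE],
                                      Shat[rows, , drop = FALSE])
  # Reuse the mixture prior learned from all features (fixg = TRUE) and
  # compute posteriors, with samples, for the subset only.
  mashr::mash(data_subset,
              g = mashr::get_fitted_g(fit),
              fixg = TRUE,
              algorithm.version = "R",  # sampling is R-only, per this thread
              posterior_samples = n_samples)
}

# Hypothetical usage, with `fit`, `Bhat`, `Shat` and `rows_of_interest`
# defined elsewhere:
# res <- sample_subset_posterior(fit, Bhat, Shat, rows_of_interest)
# dim(res$result$PosteriorSamples)
```

Because the prior is held fixed, the work in this call should scale with the number of selected rows rather than with all 20,000 features.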