stephenslab / susieR

R package for "sum of single effects" regression.
https://stephenslab.github.io/susieR
Other
176 stars 45 forks source link

Sample from SuSiE approximate posterior #66

Closed gaow closed 3 years ago

gaow commented 5 years ago

As discussed offline earlier, we'd like to sample from the posterior distributions for gamma, eg, susie_sample_posterior where we input SuSiE fit and output posterior samples. Optionally it may output "configuration probabilities" (those obtained by other BVSR methods) recovered from the sample. Documenting it here as a new feature request that should be easy to fulfill.

eg

gamma <- zeros(1,p)
for (l in 1:L) {
  i <- rmultinom(alpha[[l]],1)
  gamma[i] <- 1
}
pcarbo commented 5 years ago

@gaow You might as well make this more general and draw ns posterior samples of all model parameters (beta, gamma, etc).

gaow commented 5 years ago

@pcarbo thanks for the tip -- yes that'd be very helpful to have. But what do you mean by ns posterior samples?

pcarbo commented 5 years ago

ns >= 1 is the number of samples to draw from the posterior.

stephens999 commented 5 years ago

Not sure how useful that is. Weren't we trying to get away from the configurations?

On Fri, Oct 26, 2018, 16:05 gaow notifications@github.com wrote:

As discussed offline earlier, we'd like to sample from the posterior distributions for gamma, eg, susie_sample_posterior where we input SuSiE fit and output posterior samples. Optionally it may output "configuration probabilities" (those obtained by other BVSR methods) recovered from the sample. Documenting it here as a new feature request that should be easy to fulfill.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/stephenslab/susieR/issues/66, or mute the thread https://github.com/notifications/unsubscribe-auth/ABt4xe2TkjZPAF0RokDMAN7uQmwYUZZ-ks5uo2skgaJpZM4X85C6 .

gaow commented 5 years ago

@stephens999 indeed. But this is a question raised by Xin whose collaborator has a pipeline to run enloc which does colocalization in light of enrichment analysis. At leas the software (pipeline) needs configurations as input, if we want to play along with existing stuff.

KangchengHou commented 4 years ago

Hi,

Thanks for the great software! I am interested in implementing this feature. I attach a draft implementation based on the following:

image

There is one thing I am uncertain of this draft implementation. I notice that in susie_get_posterior_mean, susie_get_posterior_sd, there is a division by X_column_scale_factors. I imagine we need that here too. But it would be nice if you could help check.

Also, there could be some performance speedup, I'll try some implementation tricks and find quicker implementation.

I think this should be put into susieR/susie_utils.R? Let me know what you think and i'll send a pull request later.

Best, Kangcheng

# library
library(susieR)
library(RcppCNPy)

# simulation from https://stephenslab.github.io/susie-paper/manuscript_results/motivating_example.html
set.seed(1)
n = 500
p = 1000
b = rep(0,p)
b[200] = 1
b[800] = 1
X = matrix(rnorm(n*p),nrow=n,ncol=p)
X[,200] = X[,400]
X[,600] = X[,800]
y = X %*% b + rnorm(n)

# fit susie
susie_fit = susieR::susie(X, y, L=5,
               estimate_residual_variance=TRUE, 
               scaled_prior_variance=0.2,
               tol=1e-3, track_fit=TRUE, min_abs_corr=0.1)

# sampling code
susie_get_posterior_sample <- function(susie_fit, num_samples, seed=1234){
    set.seed(seed)
    # outputs a (num_snps, num_samples) matrix, each column is a sample from posterior distribution
    posterior_mean <- susie_fit$mu / susie_fit$X_column_scale_factors
    posterior_sd <- sqrt(susie_fit$mu2 - (susie_fit$mu)^2) / susie_fit$X_column_scale_factors
    pip <- susie_fit$alpha
    L = nrow(pip)
    num_snps <- ncol(pip)
    b_samples <- matrix(NA, num_snps, num_samples)
    for (sample_i in 1 : num_samples){
        b <- 0
        for (l in 1 : L){
            gamma <- rmultinom(1, 1, pip[l, ])
            effect_size <- rnorm(1, mean=posterior_mean[l, which(gamma != 0)], sd=posterior_sd[l, gamma])
            b_l <- gamma * effect_size
            b <- b + b_l
        }
        b_samples[, sample_i] <- b
    }
    return(b_samples)
}

# get posterior samples
num_samples <- 2000
posterior_samples <- susie_get_posterior_sample(susie_fit, num_samples)

# plot effect sizes
pdf('figures/susie_fit.pdf', width=5, height =5, pointsize=16)
plot(coef(susie_fit), pch=16, ylab='SuSiE effect sizes')
dev.off()

# plot mean of SuSiE samples
pdf('figures/posterior_means.pdf', width =5, height =5, pointsize=16)
plot(rowMeans(posterior_samples), pch=16, ylab='Mean effect sizes of posterior samples')
dev.off()
gaow commented 4 years ago

Thanks @KangchengHou for taking the initiative on it!

First thing first, would you mind telling us your motivation for having this feature in susieR? You see the discussion on this thread was paused because of lack of motivation. It would be helpful if you could give us an example why the feature is desirable.

Regarding the implementation:

  1. Here you output samples for beta but could you also output samples for gamma the "configurations"? Even though it is easy to obtain that from beta samples, it seems conceptually clear to provide gamma samples instead.
  2. I see you already took into consideration the scaling factors for the effects so we should be good there.
  3. Should we use seed=NULL as default and only set.seed if it is provided?
  4. If you look into susie_get_pip you see we removed those effects having estimated prior variance equals zero. I think we should do the same here excluding those effects, to be consistent.
  5. Typically we need some sort of unit test for features implemented. Would you mind computing for example the PIP from gamma samples and see how similar they are to the result of susie_get_pip ?

It is perhaps better to use a separate R script file call it sample_posterior.R instead of putting it in susie_utils.R.

stephens999 commented 4 years ago

i suggest entirely remove seed as parameter - the user can control the seed as they like outside of the function. (and this way the function has no side effects, which is good practice i think.)

On Tue, Mar 3, 2020 at 2:30 PM gaow notifications@github.com wrote:

Thanks @KangchengHou https://github.com/KangchengHou for taking the initiative on it!

First thing first, would you mind telling us your motivation for having this feature in susieR? You see the discussion on this thread was paused because of lack of motivation. It would be helpful if you could give us an example why the feature is desirable.

Regarding the implementation:

  1. Here you output samples for beta but could you also output samples for gamma the "configurations"? Even though it is easy to obtain that from beta samples, it seems conceptually clear to provide gamma samples instead.
  2. I see you already took into consideration the scaling factors for the effects so we should be good there.
  3. Should we use seed=NULL as default and only set.seed if it is provided?
  4. If you look into susie_get_pip you see we removed those effects having estimated prior variance equals zero. I think we should do the same here excluding those effects, to be consistent.
  5. Typically we need some sort of unit test for features implemented. Would you mind computing for example the PIP from gamma samples and see how similar they are to the result of susie_get_pip ?

It is perhaps better to use a separate R script file call it sample_posterior.R instead of putting it in susie_utils.R.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stephenslab/susieR/issues/66?email_source=notifications&email_token=AANXRRJYLAXIFPEXNF7YA53RFVSGFA5CNFSM4F7TSC5KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOENVA4WA#issuecomment-594153048, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANXRRJF2XHVSCZ43RBBTFTRFVSGFANCNFSM4F7TSC5A .

gaow commented 4 years ago

@stephens999 I completely agree for reasons you stated above, until I recall we had this in mashr ... shall we remove that as well?

stephens999 commented 4 years ago

maybe... i don't really see the need for it! matthew

On Tue, Mar 3, 2020 at 3:02 PM gaow notifications@github.com wrote:

@stephens999 https://github.com/stephens999 I completely agree for reasons you stated above, until I recall we had this in mashr https://github.com/stephenslab/mashr/blob/abe577386d93391413fe94772095431971466082/R/posterior.R#L82 ... shall we remove that as well?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stephenslab/susieR/issues/66?email_source=notifications&email_token=AANXRRKZMGCB4OI24CSN233RFVV7FA5CNFSM4F7TSC5KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOENVEIZI#issuecomment-594166885, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANXRRKPJ6DOSSXPNONXF6LRFVV7FANCNFSM4F7TSC5A .

KangchengHou commented 4 years ago

@gaow These make sense to me. one motivation for implementing this feature is that you can calculate the corresponding distribution of variance component estimates explained from the posterior samples, which could potentially be a faster approach to one described from https://www.biorxiv.org/content/10.1101/318618v1

For the unit test, should I start with some simulation and then testing whether difference between pip and mean of gamma samples is small enough?