garyzhubc opened this issue 12 months ago
Not just sets from the susie_get_cs()
Sorry, I don't think we understand the question.
Can I test a hypothesis like "beta_2, beta_3 ≠ 0 and beta_1, beta_4 = 0"?
Perhaps the simplest way would be to sample from the (approximate) posterior distribution of beta, and then use that sample to estimate the probability of any particular combination of zero/non-zero beta values.
Something like this (testing the hypothesis that the significant variants are in the middle)?
posterior<-unlist(lapply(1:1000, function(i) (max(samp$b[40:60,i])>0 | min(samp$b[40:60,i])<0) & (samp$b[1,1:39] == 0 & samp$b[1,61:101] == 0)))
which results in:
> mean(posterior)
[1] 0.9756098
I think the code doesn't look quite right, but something like that, yes
I made an edit to the code. Could you tell me why it doesn't look right?
You have samp$b[1,1:39] but samp$b[40:60,i], so the indices of b don't look consistent through your code.
Thanks for spotting this mistake. I'm now using:
> posterior<-unlist(lapply(1:num_samples, function(i) (max(samp$b[40:60,i])>0 | min(samp$b[40:60,i])<0) & all(c(samp$b[1:39,i] == 0, samp$b[61:101,i] == 0))))
> mean(posterior)
[1] 0
> posterior<-unlist(lapply(1:num_samples, function(i) (max(samp$b[1:39,i])>0 | min(samp$b[1:39,i])<0) & all(samp$b[40:101,i] == 0)))
> mean(posterior)
[1] 0
> posterior<-unlist(lapply(1:num_samples, function(i) (max(samp$b[61:101,i])>0 | min(samp$b[61:101,i])<0) & all(samp$b[1:60,i] == 0)))
> mean(posterior)
[1] 0
which I believe is consistent. However, it looks like these all have very small probabilities. Instead of testing all(samp$b[1:60,i] == 0), do you recommend trying something like all(abs(samp$b[1:60,i]) < epsilon), or do you think what I'm doing is okay?
Can anyone respond to this issue please? @pcarbo
The issue of working with very small probabilities is a common issue and there are some ways to help with this. If you can share with us a reproducible example illustrating exactly what you are trying to do, I might be able to help you.
As a general piece of advice, I recommend starting with an example that is as simple as possible, e.g., an example with exactly 4 variables (columns of X).
An example with four variables b1, b2, b3, b4: suppose I want to compute P(b1 = 0 and b4 = 0 and (b2 ≠ 0 or b3 ≠ 0)), the probability that the causal variant is in {b2, b3}. Should I run the program below to get this probability?
samp<-susie_get_posterior_samples(res_, num_samples)
posterior23<-unlist(lapply(1:num_samples, function(i) (max(samp$b[2:3,i])>0 | min(samp$b[2:3,i])<0) & all(c(samp$b[1,i] == 0, samp$b[4,i] == 0))))
mean(posterior23)
If I rank posterior1, posterior2, posterior3, posterior12, posterior23, posterior13, posterior123 and select the ones with probability greater than 0.95, will I get the same credible sets (default coverage = 0.95) as:
susie_get_cs(res_)
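For concreteness, here is one way the seven configuration probabilities above could all be computed from a single set of posterior samples. This is only a sketch, assuming res_ is a fitted susie object and num_samples is defined as in the earlier comments; the helper prob_config is hypothetical, not part of susieR:

```r
library(susieR)

# Hypothetical helper: estimate P(b[j] != 0 for some j in S, and
# b[j] = 0 for all j outside S) from posterior samples of b.
prob_config <- function(samp, S) {
  p <- nrow(samp$b)
  mean(sapply(seq_len(ncol(samp$b)), function(i)
    any(samp$b[S, i] != 0) && all(samp$b[setdiff(1:p, S), i] == 0)))
}

samp    <- susie_get_posterior_samples(res_, num_samples)
configs <- list(1, 2, 3, c(1,2), c(2,3), c(1,3), c(1,2,3))
probs   <- sapply(configs, prob_config, samp = samp)
names(probs) <- sapply(configs, paste, collapse = ",")
sort(probs, decreasing = TRUE)
```

Note these seven events are disjoint but not exhaustive (configurations involving b4 are omitted), so the probabilities need not sum to one.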
@garyzhubc This is not a reproducible example because some variables in your code (e.g., res_) are not defined.
However, it looks like these all have very small probabilities. Do you recommend instead of testing all(samp$b[1:60,i] == 0) trying something like all(abs(samp$b[1:60,i]) < epsilon), or do you think what I'm doing is okay?
I think what you are doing looks OK, and you are just getting the answer that there is a very small probability of the event you are looking at. You could also try the epsilon approach you suggested.
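As a sketch of the epsilon variant (assuming samp and num_samples from the code above; eps is an arbitrary tolerance, not a susieR setting):

```r
eps <- 1e-6  # tolerance: treat |b| below this as effectively zero
posterior <- sapply(1:num_samples, function(i)
  any(abs(samp$b[40:60, i]) > eps) &&
  all(abs(samp$b[c(1:39, 61:101), i]) <= eps))
mean(posterior)
```

Since samples of b are exactly zero for variants excluded from every effect, this will usually agree with the == 0 version; it differs only when nonzero draws happen to be numerically tiny.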
I also tried using the PIPs directly instead of sampling:
prod(1-res$pip[1:34])*(1-prod(1-res$pip[35:68]))*prod(1-res$pip[69:101])
This still gives a probability of zero. See https://github.com/stephenslab/susieR/issues/203#issuecomment-1773921285
@garyzhubc This is not a reproducible example because some variables in your code (e.g., res_) are not defined.
I could do the same with this example from https://stephenslab.github.io/susieR/articles/sparse_susie_eval.html:
library(susieR)
library(Matrix)
create_sparsity_mat <- function(sparsity, n, p) {
  nonzero <- round(n*p*(1-sparsity))
  nonzero.idx <- sample(n*p, nonzero)
  mat <- numeric(n*p)
  mat[nonzero.idx] <- 1
  mat <- matrix(mat, nrow=n, ncol=p)
  mat  # return the matrix explicitly
}
n <- 1000
p <- 1000
beta <- rep(0,p)
beta[c(1,300,400,1000)] <- 10
X.dense <- create_sparsity_mat(0.99,n,p)
X.sparse <- as(X.dense,"CsparseMatrix")
y <- c(X.dense %*% beta + rnorm(n))
susie.sparse <- susie(X.sparse,y)
Using a Monte Carlo sample from the posterior:
num_samples<-10000
samp<-susie_get_posterior_samples(susie.sparse, num_samples)
posterior<-unlist(lapply(1:num_samples, function(i) (max(samp$b[341:680,i])>0 | min(samp$b[341:680,i])<0) & all(c(samp$b[1:340,i] == 0, samp$b[681:1000,i] == 0))))
mean(posterior)
Using PIP:
prod(1-susie.sparse$pip[1:340])*(1-prod(1-susie.sparse$pip[341:680]))*prod(1-susie.sparse$pip[681:1000])
Both give a probability of zero.
@garyzhubc I think the issue is that in your example all the inclusion probabilities are either 1 or very, very small, so it may be tricky to use a naive Monte Carlo sampling approach:
hist(log10(susie.sparse$alpha),n = 64)
One idea that comes to mind is importance sampling, but you might want to start with an example where the probabilities are less extreme.
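One simple mitigation for the underflow in the PIP-product calculation is to work on the log scale. This keeps the same independence approximation across variants, but reports a finite log-probability instead of an exact zero where possible; a sketch, assuming susie.sparse from the example above:

```r
# Log of prod(1-pip[outside]) * (1 - prod(1-pip[inside])), using log1p
# to avoid underflow when some PIPs are very close to 0 or 1.
pip <- susie.sparse$pip
log_p_none_outside <- sum(log1p(-pip[c(1:340, 681:1000)]))
log_p_none_inside  <- sum(log1p(-pip[341:680]))
log_prob <- log_p_none_outside + log1p(-exp(log_p_none_inside))
log_prob / log(10)  # log10 of the approximate probability
```

Here two of the simulated causal variants (1 and 1000) lie outside 341:680, so the result is genuinely very negative, or -Inf when a PIP outside the set is numerically 1: the event really does have essentially zero posterior probability.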
Is it possible to test the hypothesis that the significant variants are in a set of SNPs?