@aksarkar yes, though PIP for each effect makes sense.
> res$alpha
[,1] [,2] [,3] [,4] [,5]
[1,] 0.1875131 0.2290305 0.2017719 0.1762895 0.2053949
[2,] 0.1870604 0.2310171 0.2012803 0.1760361 0.2046060
[3,] 0.1865364 0.2326830 0.2009511 0.1756878 0.2041418
[4,] 0.1863538 0.2328922 0.2009752 0.1755439 0.2042350
[5,] 0.1866494 0.2315677 0.2012928 0.1757278 0.2047623
This is a case where the purity filter should have worked, that is:
> res = susieR::susie(matrix(rnorm(500 * 5), nrow=500), rnorm(500), L=5)
> res$sets
$cs
NULL
$coverage
[1] 0.95
In other words, the reported PIPs should all be zero ... Previously susie_get_pip was a separate function that took only the indices from the CS to compute PIP. This is a new interface artifact from when I combined susie_get_pip into the susie call. I'll fix it.
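(For context, a rough sketch of what that CS-restricted computation amounts to; get_pip_from_cs and its arguments are hypothetical names, not susieR's actual internals:)
# Hypothetical sketch (not susieR's code): combine per-effect inclusion
# probabilities only over the effects whose credible sets survive filtering.
get_pip_from_cs <- function(alpha, keep_idx) {
  # alpha: L x p matrix of single-effect inclusion probabilities
  # keep_idx: indices of effects retained after the CS/purity filter
  if (length(keep_idx) == 0)
    return(rep(0, ncol(alpha)))  # no surviving CS => all PIPs are zero
  1 - apply(1 - alpha[keep_idx, , drop = FALSE], 2, prod)
}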
BTW, other approaches that "filter" L will also help, e.g. the null_weight will help, even for a very small null weight:
> res = susieR::susie(matrix(rnorm(500 * 5), nrow=500), rnorm(500), L=5, null_weight=0.5)
> res$pip
[1] 0.2828823 0.3588216 0.3345083 0.3973354 0.2260887 0.9926927
> res$null_index
[1] 6
(the last PIP is from the NULL component.)
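(A minimal sketch of dropping that entry, assuming the fit stores the null column's position in null_index as shown above:)
# Sketch: drop the PIP entry for the null component using res$null_index
pip_no_null <- res$pip[-res$null_index]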
Or simply let it estimate the prior variance:
> res = susieR::susie(matrix(rnorm(500 * 5), nrow=500), rnorm(500), L=5, estimate_prior_variance=T)
> res$V
[1] 0 0 0 0 0
But we have not formally implemented anything to prune effects based on these V estimates. Our paper recommends the "purity" (minimum absolute correlation) filter, whose default threshold is 0.5 in the software implementation.
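(A rough sketch of what such pruning might look like, hypothetical and not part of susieR: keep only effects with numerically nonzero estimated prior variance before combining the rows of alpha:)
# Hypothetical V-based pruning (not implemented in susieR):
keep <- which(res$V > 1e-9)   # effects with nonzero estimated prior variance
pip_pruned <- if (length(keep) == 0) rep(0, ncol(res$alpha)) else
  1 - apply(1 - res$alpha[keep, , drop = FALSE], 2, prod)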
I'll fix that interface bug now.
@aksarkar So with the patch above,
> set.seed(1)
> susieR::susie(matrix(rnorm(500 * 5), nrow=500), rnorm(500), L=5)$pip
[1] 0 0 0 0 0
Does this make sense? This is what I did for the numerical studies in the paper anyway.
@stephens999 in the PIP calculation section of the manuscript we did not factor in the CS consideration -- our summation runs over l = 1 to L. Maybe we want to change that.
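(Concretely, as I read it: the manuscript computes PIP_j = 1 - prod_{l=1}^L (1 - alpha_lj) over all L effects, whereas a CS-aware version would restrict the product to the effects l whose credible sets survive the purity filter.)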
Under the spike-and-slab prior, I would expect the PIP to be approximately 1 / p. But under the susie variational approximation, the PIP depends on the number of assumed effects L (i.e. number of variational parameters for the causal indicator variable).
@aksarkar This is what I would expect. What is the issue you are reporting?
@gaow Would it be possible to compute a likelihood or Bayes factor for different values of L in order to select L?
@pcarbo yes we do report that. For the example above,
> exp(res$lbf)
[1] 0.2180514 0.2180175 0.2181433 0.2182756 0.2182854
but we do not use it for selecting L. At least in my view the purity filter is more intuitive and the threshold there is easier to interpret than BF cutoffs.
@pcarbo In the following case, susieR does not report 1 / p:
> susieR::susie(matrix(rnorm(500 * 5), nrow=500), rnorm(500), L=5)$pip
[1] 0.5864162 0.5968658 0.7780974 0.6788985 0.6898738
AIUI, this comes from the fact that we compute PIP_j = 1 - prod_{l=1}^L (1 - alpha_lj)
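(That is, column-wise over the alpha matrix shown at the top of the thread:)
# PIP_j = 1 - prod_{l=1}^L (1 - alpha_lj), applied column-wise to res$alpha
pip <- 1 - apply(1 - res$alpha, 2, prod)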
I just installed susieR version 0.6.2.0395 from GitHub and I got all zeros:
susieR::susie(matrix(rnorm(500 * 5), nrow=500), rnorm(500), L=5)$pip
# [1] 0 0 0 0 0
Did I do something wrong?
@pcarbo I was on 0.6.2.0394. I think @gaow already pushed the fix he described above.
I don't want to prune before PIP calculation by default.
The PIPs reported by susie are reasonable under the susie model, I think. They should not be 0 under the model.
1/p would only be approximately correct under the assumption L=1. For spike and slab the PIPs will depend on the prior hyperparameters. We could discuss further in person. But the key point is that we have a model, and the PIPs we report are computed under that model. If we prune the CSs before reporting, it is less clear what the PIPs represent.
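(To make the dependence on L concrete: if every alpha_lj is flat at 1/p, the combined PIP is 1 - (1 - 1/p)^L, which reduces to roughly 1/p only when L = 1. For the p = 5, L = 5 example in this thread:)
p <- 5; L <- 5
1 - (1 - 1/p)^L  # ~0.67, close to the unpruned PIPs reported above
1 - (1 - 1/p)^1  # 0.2 = 1/p when L = 1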
I don't see any problem with pips depending on L. They similarly depend on pi0 in spike and slab...
I think the distinction may be that L is not a prior, but rather an instrument we put in for the computation, and the question is how we interpret it afterwards. Previously susie_get_pip was not built into the SuSiE model fitting function (susieR::susie), and the model fitting part correctly reports those alpha values as 1/p (shown in my first post). susie_get_pip used to be a separate function that reported PIPs in light of our belief about how many effects L we should report. To me, PIP = 0 (which implies our "estimated" L = 0) is informative if I understand how it happened ... We can discuss this behavior for the software, in other words how best to communicate this to users.
@stephens999 just to make it clear that in our numerical studies I did prune before computing PIP.
Can you redo without pruning?
Sure, although in our settings I doubt it will make any difference.
I agree. That's the reason: to keep things simple.
Okay I've reverted the change for now -- we can discuss more later:
> set.seed(1); dat = susieR::susie(matrix(rnorm(500 * 5), nrow=500), rnorm(500), L=5)
> dat$pip
[1] 0.6955438 0.7137817 0.6906197 0.6219080 0.6316382
> dat$sets
$cs
NULL
$coverage
[1] 0.95
Notice that:
> set.seed(1); dat = susieR::susie(matrix(rnorm(500 * 5), nrow=500), rnorm(500), L=5, min_abs_cor = 0)
> dat$sets
$cs
$cs$L1
[1] 1 2 3 4 5
$purity
min.abs.corr mean.abs.corr median.abs.corr
L1 0.006593154 0.226734 0.03793143
$cs_index
[1] 1
$coverage
[1] 0.95
(as discussed offline, we can imagine overlapping CSs will be a bit trickier).
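(For reference, a sketch of how that purity number is defined; X and cs here are placeholder names for the design matrix and the CS member indices:)
# Sketch: purity of a CS = minimum absolute pairwise correlation among members
R_cs <- abs(cor(X[, cs]))            # X: n x p design matrix, cs: CS indices
purity <- min(R_cs[upper.tri(R_cs)])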
null_weight will in fact change the resulting PIP without pruning:
> set.seed(1); susieR::susie(matrix(rnorm(500 * 5), nrow=500), rnorm(500), L=5, null_weight=0.1)$pip
[1] 0.4650484 0.4883830 0.4563233 0.3698489 0.3813200
> set.seed(1); susieR::susie(matrix(rnorm(500 * 5), nrow=500), rnorm(500), L=5, null_weight=0.5)$pip
[1] 0.14815783 0.16344121 0.13957722 0.09120890 0.09645865
> set.seed(1); susieR::susie(matrix(rnorm(500 * 5), nrow=500), rnorm(500), L=5, null_weight=0.8)$pip
[1] 0.04789028 0.05426298 0.04385576 0.02624462 0.02796934
> set.seed(1); susieR::susie(matrix(rnorm(500 * 5), nrow=500), rnorm(500), L=5, null_weight=0.99)$pip
[1] 0.002172408 0.002503407 0.001952651 0.001119151 0.001196372
If none of the predictors have a detectable effect, then the PIP reported by susieR does not match what we might expect for L > 1. Under the spike-and-slab prior, I would expect the PIP to be approximately 1 / p. But under the susie variational approximation, the PIP depends on the number of assumed effects L (i.e. number of variational parameters for the causal indicator variable).
We found this applying susieR to a variable selection problem where we didn't pre-screen using marginal association statistics (as in genetic fine mapping). We just wanted to solve multiple regression with possibly correlated predictors. Minimal example:
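(presumably along the lines of the call quoted earlier in the thread:)
susieR::susie(matrix(rnorm(500 * 5), nrow=500), rnorm(500), L=5)$pip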
Revision 8a4f717