stephenslab / susieR

R package for "sum of single effects" regression.
https://stephenslab.github.io/susieR

Notion of "inferential statement" in the paper #202

Open garyzhubc opened 9 months ago

garyzhubc commented 9 months ago

The paper suggests:

"However, given sufficient data it should be possible to conclude that there are (at least) two effect variables, and that

$$(b_1\ne 0\text{ or }b_2\ne0)\text{ and }(b_3\ne0\text{ or }b_4\ne0)$$

Our goal, in short, is to provide methods that directly produce this kind of inferential statement."

Are we providing such a statement without treating it as a hypothesis to be tested? If not, what is the uncertainty of such a statement? Is it given by the algorithm?

garyzhubc commented 9 months ago

Another question: can we obtain such a statement by constructing tests based on samples from the joint posterior? For example, given $N$ samples $b^{(1)},\dots,b^{(N)}$, calculating

$$\frac{1}{N}\sum_{n=1}^N\mathbb{1}\left[\left(b_1^{(n)}\ne 0\text{ or }b_2^{(n)}\ne 0\right)\text{ and }\left(b_3^{(n)}\ne 0\text{ or }b_4^{(n)}\ne 0\right)\right]$$

If so, why do we still want to introduce the notion of credible sets?
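
The estimator in the question can be sketched directly. The posterior draws below are fabricated (in practice they would come from the fitted model's joint posterior); the sample size and the 60% nonzero rate are purely illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fabricated posterior draws of b = (b1, b2, b3, b4): one row per sample.
# In practice these would come from the fitted model's joint posterior;
# here each coefficient is nonzero with probability 0.6, purely for
# illustration.
N = 10_000
b = rng.normal(size=(N, 4)) * (rng.random((N, 4)) < 0.6)

# Indicator of the compound event (b1 != 0 or b2 != 0) and (b3 != 0 or b4 != 0).
event = ((b[:, 0] != 0) | (b[:, 1] != 0)) & ((b[:, 2] != 0) | (b[:, 3] != 0))

# Monte Carlo estimate of the posterior probability of the event.
prob = event.mean()
print(prob)
```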

pcarbo commented 9 months ago

@garyzhubc The idea is that the "credible set" (CS) corresponds to the event that a single variable X has an effect on the response variable Y. So with that constraint, the posterior inclusion probabilities (PIPs) are sufficient to quantify uncertainty in which variables affect Y. No other posterior statistic is needed. I hope that helps.
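
For concreteness, the SuSiE PIP combines the per-effect inclusion probabilities as $\mathrm{PIP}_j = 1 - \prod_l (1 - \alpha_{lj})$. A minimal sketch with a made-up $\alpha$ matrix (the numbers are illustrative, not from any real fit):

```python
import numpy as np

# Hypothetical single-effect inclusion probabilities: alpha[l, j] is the
# posterior probability that effect l is driven by variable j, so each
# row sums to 1. (A fitted susieR object stores a matrix like this; the
# values below are made up.)
alpha = np.array([
    [0.70, 0.20, 0.05, 0.05],  # effect 1 mostly points at variable 1
    [0.05, 0.05, 0.60, 0.30],  # effect 2 points at variable 3 or 4
])

# PIP_j = 1 - prod_l (1 - alpha[l, j]): the probability that variable j
# is picked up by at least one of the single effects.
pip = 1 - np.prod(1 - alpha, axis=0)
print(pip)
```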

garyzhubc commented 9 months ago

So you're saying the uncertainty is quantified via the PIPs, given the CS as a fixed subset. But what about uncertainty in the CS itself?

I'm thinking that maybe this is explicitly tackled by the CS itself, so I'm looking at the software right now as well as the paper. `susie_get_cs` has a parameter `coverage = 0.95` by default. Is `coverage` the same as $\rho$ in Definition 1 (Section 2.2) of the paper?

If so, does it mean the uncertainty in the CS itself is by default 0.95? I noticed that a smaller `coverage` gives smaller subset sizes, but I was expecting bigger subset sizes for lower confidence of containment, so can you explain why the subset sizes actually got smaller?

pcarbo commented 9 months ago

"A level-ρ Credible Set is defined to be a subset of variables that has probability >ρ of containing at least one effect variable."

So if ρ goes down, you will need fewer variables in your CS to satisfy the condition.
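
A toy illustration of this: build the set by adding variables in decreasing order of posterior probability until the cumulative probability reaches ρ. This is a simplified sketch of CS construction for a single effect; the probabilities and the ≥ ρ stopping rule are assumptions for illustration, not the exact susieR implementation:

```python
import numpy as np

def credible_set(alpha, rho):
    """Smallest set of variables whose probabilities sum to at least rho,
    built by adding variables in decreasing probability order."""
    order = np.argsort(alpha)[::-1]
    csum = np.cumsum(alpha[order])
    k = int(np.searchsorted(csum, rho)) + 1
    return order[:k]

# Made-up posterior probabilities for a single effect over five variables.
alpha = np.array([0.50, 0.30, 0.15, 0.04, 0.01])

print(len(credible_set(alpha, 0.95)))  # more variables needed at high coverage
print(len(credible_set(alpha, 0.50)))  # fewer needed at low coverage
```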

stephens999 commented 8 months ago

@garyzhubc it's great that you have so many questions, but this venue is primarily for questions about the software and its usage. Also, you will generally get better answers if you make your questions more precise. If you have specific questions about the software, please post them here; for the more open-ended methods questions, I suggest finding someone local who is also interested in these methods, so you can discuss them together and see if you can find the answers yourselves.

garyzhubc commented 8 months ago

> So you're saying the uncertainty is quantified via the PIPs, given the CS as a fixed subset. But what about uncertainty in the CS itself?
>
> I'm thinking that maybe this is explicitly tackled by the CS itself, so I'm looking at the software right now as well as the paper. `susie_get_cs` has a parameter `coverage = 0.95` by default. Is `coverage` the same as $\rho$ in Definition 1 (Section 2.2) of the paper?
>
> If so, does it mean the uncertainty in the CS itself is by default 0.95? I noticed that a smaller `coverage` gives smaller subset sizes, but I was expecting bigger subset sizes for lower confidence of containment, so can you explain why the subset sizes actually got smaller?

Still a bit counterintuitive to me. In my understanding, higher $\rho$ means lower specificity: you are unsure which SNP among a set is causal, versus knowing that one specific SNP is causal.

garyzhubc commented 8 months ago

> @garyzhubc it's great that you have so many questions, but this venue is primarily for questions about the software and its usage. Also, you will generally get better answers if you make your questions more precise. If you have specific questions about the software, please post them here; for the more open-ended methods questions, I suggest finding someone local who is also interested in these methods, so you can discuss them together and see if you can find the answers yourselves.

Cool I'll ask around.

stephens999 commented 8 months ago

@garyzhubc a 95% confidence interval will be bigger than an 80% confidence interval. It is the same idea with credible sets. (I'm not sure what you mean by uncertainty in the CS.)
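
The interval analogy can be checked numerically: for a standard normal posterior, the central credible interval at level $\rho$ has half-width $\Phi^{-1}((1+\rho)/2)$, so higher coverage gives a wider interval. This is a standalone illustration, not susieR output:

```python
from statistics import NormalDist

# Central interval for a standard normal at level rho is [-z, z] with
# z = Phi^{-1}((1 + rho) / 2): higher rho, wider interval.
z95 = NormalDist().inv_cdf((1 + 0.95) / 2)
z80 = NormalDist().inv_cdf((1 + 0.80) / 2)
print(z95, z80)
```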