jean997 / cause

R package for CAUSE
https://jean997.github.io/cause/
52 stars 15 forks source link

parsing results #28

Closed rajanil closed 2 years ago

rajanil commented 2 years ago

Hi Jean, Thanks for developing this MR tool and the excellent documentation! I found it incredibly easy to setup and use.

Based on my understanding of the model, given V variants as input, the key results are posterior estimate of gamma (a scalar), q (a vector of size V) and a test statistic comparing the causal model vs the sharing model. I'm a bit lost: which objects in the result should I use to get the posteriors and statistics of interest? (The tutorial explained how to get these via plots and summaries, but I would like to programmatically access and store these values, to scale over many pairs of traits.)

thanks! -Anil

jean997 commented 2 years ago

Hi Anil, The easiest way to get it is via "res <- summary(obj)" where obj is a CAUSE object. If you do this, res will now contain objects res$p which gives you the p-value comparing the sharing and causal models and res$tab which gives a table of posterior means and credible intervals. You can change the size of the credible intervals with an argument ci.size to the summary command.

The other way to get the p-value is directly from the CAUSE object. obj$elpd gives the elpd comparisons between models. In this table, the sign of the z-score and delta elpd matters. The convention we use is that delta elpd is the difference elpd(model1) - elpd(model 2) so a negative value is in favor of model 2 and a positive value is in favor of model 1. Here model 1 and model 2 refer to columns of obj$elpd. The z-score on the sharing vs causal line of obj$elpd is used to generate the p-value in the summary object.

rajanil commented 2 years ago

Thanks, Jean! This was quite helpful! In the first approach, which variables do the entries in res$tab correspond to? I see 5 entries of posterior means and credible intervals. (I'm not too familiar with R and might have missed something obvious.)

rajanil commented 2 years ago

Hi Jean, Just bumping up this question again. In the first approach, which variables do the entries in res$tab correspond to? I see 5 entries of posterior means and credible intervals. (I'm not too familiar with R and might have missed something obvious.) Thanks!

jean997 commented 2 years ago

summary(obj)$tab should give you something like

summary(obj)$tab
     model     gamma               eta                   q                  
[1,] "Sharing" NA                  "0.39 (0.29, 0.5)"    "0.63 (0.43, 0.79)"
[2,] "Causal"  "0.32 (0.24, 0.39)" "-0.03 (-0.89, 0.66)" "0.03 (0, 0.25)"  

Where the top line is for the sharing model and the bottom line is for the causal model. The values in each cell give the posterior mean and credible interval. You can change the width of the CI with the argument to summary I mentioned above.

jean997 commented 2 years ago

You can get the p-value either from

summary(obj)$p
 [1] 9.98088e-06

or

obj$elpd
   model1  model2 delta_elpd se_delta_elpd         z
1    null sharing -30.135161      5.575439 -5.404985
2    null  causal -37.222330      7.112215 -5.233578
3 sharing  causal  -7.087169      1.661580 -4.265318

In the elpd table, the p-value comes from the one-sided z-score. The row we want is the third one because that compares the causal and the sharing models.

pnorm(obj$elpd$z[3])
[1] 9.98088e-06