jean997 / cause

R package for CAUSE
https://jean997.github.io/cause/

Some Pareto k diagnostic values are too high #16

Closed: tom-a-bond closed this issue 3 years ago

tom-a-bond commented 3 years ago

Hi, I get the warning below. The help page it points to implies this is an issue with Pareto smoothed importance sampling from the loo package, but I'm unclear what this means for my CAUSE results. Any ideas?

> res <- cause(X = x, variants = keep_snps, param_ests = params)
Estimating CAUSE posteriors using  1169  variants.
Warning message:
Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.

> summary(res)
p-value testing that causal model is a better fit:  1
Posterior medians and  95 % credible intervals:
     model     gamma             eta               q
[1,] "Sharing" NA                "0 (-0.04, 0.04)" "0.05 (0, 0.25)"
[2,] "Causal"  "0 (-0.01, 0.01)" "0 (-0.04, 0.04)" "0.06 (0, 0.28)"
> res$elpd
   model1  model2 delta_elpd se_delta_elpd        z
1    null sharing  0.2397902     0.2093987 1.145137
2    null  causal  1.5840173     0.5214589 3.037665
3 sharing  causal  1.3442272     0.4088632 3.287718

jean997 commented 3 years ago

Hi, can you share the output of res$loos[[2]] and res$loos[[3]]? This happens sometimes as part of the estimation of the elpd. I think that generally you don't have to worry about it unless it really looks like something has gone wrong.
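
For readers following along, here is a minimal sketch of how the `z` column in the `res$elpd` table from the first post relates to the other two columns. The numbers are copied from that table. The last step is an assumption, not something confirmed in this thread: it treats the reported p-value as the one-sided normal probability `pnorm(z)` for the sharing vs. causal row, which is at least consistent with the printed value of 1.

```r
# Values copied from the res$elpd table in the first post
# (rows: null vs sharing, null vs causal, sharing vs causal)
delta_elpd    <- c(0.2397902, 1.5840173, 1.3442272)
se_delta_elpd <- c(0.2093987, 0.5214589, 0.4088632)

# z is the elpd difference divided by its standard error
z <- delta_elpd / se_delta_elpd
print(round(z, 3))

# ASSUMPTION: the summary p-value is a one-sided normal probability
# computed from the sharing vs causal z-score (third row).
p_causal <- pnorm(z[3])
print(signif(p_causal, 2))
```

Under that assumption, a positive z-score of this size gives a p-value close to 1, meaning there is no evidence here that the causal model fits better than the sharing model.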

jean997 commented 3 years ago

I added a little section in the tutorial about the Pareto k warning. Hopefully that helps.

tom-a-bond commented 3 years ago

Thanks, this is really helpful. The loos output is below. Only 9 of 1169 variants are in the ok/bad range, so am I right that we are probably fine in this case? Also, are you able to comment on roughly what percentage of variants would need to be in the ok/bad range before you would say we should start worrying?

print(res$loos[[2]])
Computed from 1000 by 1169 log-likelihood matrix

         Estimate    SE
elpd_loo   2626.0  61.7
p_loo         0.3   0.2
looic     -5252.0 123.3
------
Monte Carlo SE of elpd_loo is 0.0.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.

print(res$loos[[3]])
Computed from 1000 by 1169 log-likelihood matrix

         Estimate    SE
elpd_loo   2624.8  61.7
p_loo         1.6   0.5
looic     -5249.6 123.4
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     1160  99.2%   600
 (0.5, 0.7]   (ok)          8   0.7%   544
   (0.7, 1]   (bad)         1   0.1%   819
   (1, Inf)   (very bad)    0   0.0%   <NA>
See help('pareto-k-diagnostic') for details.
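
As a quick check of the "9/1169" figure, the sketch below recomputes the percentages from the counts printed in the table above. It uses only base R and the numbers shown; the bin names are labels chosen for this example, not identifiers from the loo package.

```r
# Counts per Pareto k bin, copied from print(res$loos[[3]]) above.
# The names are illustrative labels for this sketch only.
counts <- c(good = 1160, ok = 8, bad = 1, very_bad = 0)

n_variants <- sum(counts)                      # total number of variants
pct <- round(100 * counts / n_variants, 1)     # percentage per bin
print(pct)

# Variants outside the "good" bin (k > 0.5)
n_flagged <- sum(counts[c("ok", "bad", "very_bad")])
print(n_flagged)
print(round(100 * n_flagged / n_variants, 2))  # share of flagged variants
```

To see which variant falls in the "bad" bin, the loo package provides `pareto_k_ids()`, so something like `loo::pareto_k_ids(res$loos[[3]], threshold = 0.7)` should return its index, assuming `res$loos[[3]]` is a standard loo object as the printed output suggests.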