MansMeg closed this pull request 4 years ago
Merging #79 into master will decrease coverage by 5.05%. The diff coverage is n/a.
@@ Coverage Diff @@
## master #79 +/- ##
==========================================
- Coverage 99.31% 94.25% -5.06%
==========================================
Files 5 17 +12
Lines 145 418 +273
==========================================
+ Hits 144 394 +250
- Misses 1 24 +23
| Impacted Files | Coverage Δ | |
|---|---|---|
| python/src/posteriordb/posterior.py | 100% <0%> (ø) | :arrow_up: |
| python/src/posteriordb/posterior_database.py | 100% <0%> (ø) | :arrow_up: |
| python/src/posteriordb/model.py | 100% <0%> (ø) | :arrow_up: |
| python/src/posteriordb/__init__.py | 100% <0%> (ø) | :arrow_up: |
| rpackage/R/gold_standard.R | 100% <0%> (ø) | |
| rpackage/R/data_info.R | 100% <0%> (ø) | |
| rpackage/R/utils.R | 88.88% <0%> (ø) | |
| rpackage/R/posterior_fit.R | 100% <0%> (ø) | |
| rpackage/R/utils_tests.R | 80% <0%> (ø) | |
| rpackage/R/posterior.R | 100% <0%> (ø) | |
| ... and 7 more | | |
Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6b2adfa...6718709. Read the comment docs.
@paul-buerkner and @avehtari It would be great to get your comments on the documentation.
Thanks, Paul. Any thoughts on the gold standard definition?
I have looked at the gold standard doc again and have a few comments:
For what purpose do we have an upper bound on the effective sample size? Wouldn't it be sufficient to provide a lower bound?
We want almost independent draws. If the effective sample size is larger than the nominal sample size, there is dependency (negative autocorrelation), which makes further analysis more difficult.
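The two-sided check discussed above could be sketched as follows. This is a hypothetical helper, not part of posteriordb, and the bound fractions are illustrative assumptions rather than the project's actual thresholds:

```python
def check_ess_bounds(ess_by_param, n_draws, lower_frac=0.5, upper_frac=1.0):
    """Flag parameters whose estimated ESS falls outside [lower, upper] bounds.

    lower_frac and upper_frac are illustrative, not posteriordb's values.
    ESS far below the nominal sample size signals strong positive
    autocorrelation; ESS above it signals negative autocorrelation
    (dependency), which is what the upper bound guards against.
    """
    flags = {}
    for name, ess in ess_by_param.items():
        if ess < lower_frac * n_draws:
            flags[name] = "ESS below lower bound (strong positive autocorrelation)"
        elif ess > upper_frac * n_draws:
            flags[name] = "ESS above nominal sample size (negative autocorrelation)"
    return flags
```

For example, with 10 000 nominal draws, a parameter with ESS 11 000 would be flagged just like one with ESS 3 000, while ESS 9 000 passes.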
In addition to requiring no divergent transitions, there should be no draws exceeding the maximum treedepth.
No. Exceeding max treedepth doesn't invalidate the Markov chain; it just indicates potential performance issues, and even then, limiting max treedepth to lower values can give us improved ESS per log density evaluation.
Ok thanks for the clarifications!
Great! This is great! I'm currently fixing the gold standards to conform to this, by including both chains when stan_sampling has been used and
Also. We would like to flag funnel posteriors and multimodal posteriors, and maybe even convex posteriors. But the best would be to just flag this from the posterior samples rather than manually annotating them. My guess is that it should be possible to estimate funnels based on 10 000 samples? The same with multimodal posteriors? What do you say?
If we have no divergent transitions, then chances are we have captured the funnel, provided the chains ever came remotely close to the funnel.
For multimodality there is no guarantee that we capture all the modes; it may well be that we simply missed some. Ideally, we should work with multimodal distributions where we know how many modes there are, so that we can check whether we found all of them. But in many cases multimodality indicates some form of model misspecification (for instance, if we forgot to identify mixture components).
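As a rough sanity check along these lines, one could count local maxima of a smoothed histogram of the draws. This is a heuristic sketch only, not a rigorous multimodality test and not part of posteriordb; the bin count, smoothing window, and thresholds are arbitrary assumptions, and the check can miss overlapping modes or miscount with too few draws:

```python
import numpy as np

def count_modes(draws, bins=40, smooth=5, min_rel_height=0.1, min_rel_dip=0.8):
    """Heuristic mode count: local maxima of a smoothed histogram.

    All tuning constants are illustrative assumptions. Peaks whose
    separating valley is shallow (deeper than min_rel_dip of the lower
    peak) are merged, so histogram noise is not counted as extra modes.
    """
    counts, _ = np.histogram(draws, bins=bins)
    kernel = np.ones(smooth) / smooth
    s = np.convolve(counts, kernel, mode="same")
    thresh = min_rel_height * s.max()
    # raw local maxima above the height threshold
    peaks = [i for i in range(1, len(s) - 1)
             if s[i] > s[i - 1] and s[i] >= s[i + 1] and s[i] > thresh]
    # merge peaks that are not separated by a sufficiently deep valley
    modes = []
    for p in peaks:
        if modes:
            valley = s[modes[-1]:p + 1].min()
            if valley > min_rel_dip * min(s[modes[-1]], s[p]):
                # shallow dip: same mode, keep the taller peak
                if s[p] > s[modes[-1]]:
                    modes[-1] = p
                continue
        modes.append(p)
    return len(modes)
```

On roughly 10 000 well-separated draws this distinguishes a clearly bimodal posterior from a unimodal one, but it says nothing about modes the chains never visited, which is exactly the limitation noted above.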
Alright, changes added. I'll merge later today if no one has any comments.
> We would like to flag funnel posteriors and multimodal posteriors, and maybe even convex posteriors. But the best would be to just flag this from the posterior samples rather than manually annotating them. My guess is that it should be possible to estimate funnels based on 10 000 samples? The same with multimodal posteriors? What do you say?
Could we use the multimodality/funnel information to help validate the gold standards? For example, suppose we know that some posterior should have 3 distinct modes, but the estimate says there are only two; this might mean that the samples are not a valid gold standard (or that the estimate is wrong). Let's say we have 100 manual annotations; if we can catch 1 bad gold standard with them, it would probably be worth the effort.
Of course this requires that we actually know the true number/location of modes etc so I don't know how useful this would be in practice.
No, we do not need this. I spoke with Aki and he thought it would be good to have this functionality, although only as suggestions: we want humans to add all keywords manually.
Here is the documentation after today's discussion. It would be great to get your comments, @avehtari and @paul-buerkner, especially regarding the gold standard definition.