Open mjskay opened 10 months ago
Attention: Patch coverage is 98.89299%
with 3 lines
in your changes are missing coverage. Please review.
Project coverage is 95.80%. Comparing base (
c312846
) to head (88fa83d
). Report is 12 commits behind head on master.:exclamation: Current head 88fa83d differs from pull request most recent head 1079cef. Consider uploading reports for the commit 1079cef to get more accurate results
Files | Patch % | Lines |
---|---|---|
R/rvar-.R | 94.87% | 2 Missing :warning: |
R/weighted.R | 98.07% | 1 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
This is how benchmark results would change (along with a 95% confidence interval in relative change) if bdef35c34867a28c7e956da43881bffde8bda583 is merged into master:
This is how benchmark results would change (along with a 95% confidence interval in relative change) if 8c803b8d6eb92d3c521547cd90917d32fd9aa5de is merged into master:
This is how benchmark results would change (along with a 95% confidence interval in relative change) if cc2cb23ca0f0065665b2a0b1f2279d020df642cf is merged into master:
This is how benchmark results would change (along with a 95% confidence interval in relative change) if 1c1c4b7645fa4e56332a23089e9741764f8406cb is merged into master:
I don't know if any of the other functions in R/convergence.R should be modified for weighted rvars.
Currently, everything else than pareto_
functions assume non-weighted MCMC. I have so far assumed that MCMC and weighting are independent of each other (there might be some less common algorithms that jointly sample parameter values and weights).
ess_
and mcse_
functionsrhat_
and rstar
could do the MCMC convergence check without weights (until we are aware of an algorithm that would have the weighting inside the MCMC already)pareto_
functions are checking the tail(s) of a given argument, and it has been used to check tails of raw weights/ratios (r or r(theta) in PSIS paper notation), function of a variable (h or h(theta)), or the product (hr). With the weight support, they could automatically make the diagnostics for r and hr (and if no weights then just h). Here I'm assuming that we almost always use self-normalization so that we need to check the normalization (E[r]) and the quantity of interest (E[hr])Pinging @n-kall , too
- If there are both (assumed) Markov dependency and weights, we could follow the approach presented in PSIS paper for
ess_
andmcse_
functions
For reference: Equations 6 (MCSE) and 7 (ESS) in preprint v6
pareto_
functions are checking the tail(s) of a given argument, and it has been used to check tails of raw weights/ratios (r or r(theta) in PSIS paper notation), function of a variable (h or h(theta)), or the product (hr). With the weight support, they could automatically make the diagnostics for r and hr (and if no weights then just h). Here I'm assuming that we almost always use self-normalization so that we need to check the normalization (E[r]) and the quantity of interest (E[hr])
Any thoughts on how the two sets of diagnostics should be presented in summarise_draws?
Would it make sense to have separate e.g. pareto_khat_quantity
, pareto_khat_weights
columns?
I'm currently working on updating the pareto_
, ess_
and mcse_
functions for weighted rvars in a fork
This is how benchmark results would change (along with a 95% confidence interval in relative change) if e52a6f93d1dc05bcc37a1826395ca3c85a9e410d is merged into master:
This is how benchmark results would change (along with a 95% confidence interval in relative change) if 052bb735cac2064426b1a645557cf2f6b29d4155 is merged into master:
This is how benchmark results would change (along with a 95% confidence interval in relative change) if b59df5e21583e55e99a438c8e9d09c6780af23ec is merged into master:
This is how benchmark results would change (along with a 95% confidence interval in relative change) if facc5864e908be28fc8ddebee86e9f1fd150358f is merged into master:
This is how benchmark results would change (along with a 95% confidence interval in relative change) if a8ac96e340419a71418ea5303fd4d4a0f97e5f23 is merged into master:
Okay, I think this is ready for review, pending two things:
This is how benchmark results would change (along with a 95% confidence interval in relative change) if 88fa83dacdb4039da9d18a0be57cced099fd92b2 is merged into master:
Re: option 2, I'll still need some more time to finish up the weighted mcse and ess. So I think it's better to merge without waiting for me
I am at a conference and then on vacation for the next two weeks. Can someone else review this PR?
Summary
This PR aims to address (at least part of) #184 by implementing weighted
rvar
s.Currently,
rvar
s cannot contain weights, and weighting of them can only be done by putting them in adraws_rvars
object that itself contains a".log_weight"
rvar
containing the weights. This leads to counterintuitive behavior, like the default output of thervar
(showing mean and sd) using unweighted versions of those statistics.This PR addresses that issue in the following ways:
rvar
weights as a"log_weights"
attribute directly on thervar
, just like the"nchains"
attribute is used to store chain count.draws_rvars
no longer use a".log_weight"
variable to store weights, instead storing them directly on eachrvar
they contain, and requiring allrvar
s they contain to have the same weights (the same way it handles"nchains"
).log_weights()
function fordraws
andrvar
s is added, which is a lower-level version ofweights(x, log = TRUE, normalize = FALSE)
that just returns the log weights stored in the object without transformation. I initially did not have this, but found it greatly eased programming with weights.weight_draws(x, NULL)
is now allowed as the canonical way to remove weights from adraws
object, sinceremove_variables(x, ".log_weight")
does not work ondraws_rvars
objects anymore.rvar
s have been updated to incorporate weights (with a couple of exceptions I haven't gotten to yet, see TODOs and Questions below).rvar
internals are becoming (even more) complicated, I have added an "rvar
Internals" section to?rvar
that hopefully will help in case others need to touch the code ;).Demo
You can't combine two rvars with different weights:
The check for equality of weights is done on the internal weights using
identical()
, which should be fast, especially in cases where the two weight vectors are actually pointers to the same vector in memory (in which case the comparison is constant time). This does mean the weights vectors must be exactly the same (no tolerance for floating point error), but I suspect in most cases when weighting happens the exact same weight vector is being applied to many objects. In any case, if someone did encounter this issue they could simply assign the log weights from object to the other.If one rvar is weighted and another is not, the weights of the weighted rvar are inherited, which I believe covers the use case of (weighted draws from some model) + (unweighted draws, e.g. used to simulate predictions):
If you install the dev version of {ggdist}:
It supports weighted rvars in all functions (densities, CDFs, quantiles, all interval types and all point summaries):
Without weights:
With weights:
Weights should work basically everywhere:
TODOs and Questions
TODOs:
density(<rvar>)
,cdf(<rvar>)
, andquantile(<rvar>)
/quantile2(<rvar>)
. The first two are straightforward. For weighted quantiles, I have an implementation in {ggdist} that I can port over, but I may want to update it first; some thoughts on weighted quantiles are here and feedback is welcome. (In fact, since writing that document my thinking has changed a bit---I originally thought the way I suggested implementing weighted quantiles in that document is an improvement on ggdist's current implementation, but after further investigation I might be leaning back towards how I did it in ggdist originally...).vignette("rvar")
Questions:
R/convergence.R
should be modified for weighted rvars. @avehtari?Would love for folks to kick the tires. I think once this is in we could also start thinking about what a successor to
summarise_draws()
might look like that supports weights (and solves the various other open issues onsummarise_draws()
).Copyright and Licensing
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses: