Open hyunjimoon opened 3 years ago
@martinmodrak Do you have any intuition for why `lp__` is the quantity most sensitive to the diagnostics? I wish to justify this, since then we could primarily inspect the ECDF plot of this one quantity instead of going through p SBC ECDF plots, where p is the number of parameters (>2000, for instance).
I don't have a strong intuition - just that `lp__` is a non-linear function of all model parameters, so even small changes to one or a couple of parameters can add up to big changes in `lp__`. I think the examples I am building for the paper (currently not in the paper) provide some limited empirical evidence that this is the case.
However, if we are just worried about the number of plots to inspect, I think being able to filter and see only the "most non-uniform" plots is probably the way to go, rather than relying on a single quantity. Here "most non-uniform" could mean the p-value from a one-sample Kolmogorov-Smirnov test, the gamma value from Teemu's paper on visualizations (Equation 13 in https://link.springer.com/content/pdf/10.1007/s11222-022-10090-6.pdf), or something similar.
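A minimal sketch of that kind of filter, assuming SBC ranks are available as an array of shape `(n_sims, n_params)`; the function name and parameter names are hypothetical, and the KS test against a continuous uniform is used as a simple stand-in for the discrete rank distribution:

```python
import numpy as np
from scipy import stats

def rank_by_nonuniformity(ranks, max_rank, names):
    """Sort parameters by the p-value of a one-sample KS test of their
    SBC ranks against uniformity. Smallest p-value = most non-uniform,
    so those rank plots are the ones to inspect first."""
    results = []
    for j, name in enumerate(names):
        # Map integer ranks in {0, ..., max_rank} into (0, 1) and
        # compare against a continuous uniform as an approximation.
        u = (ranks[:, j] + 1) / (max_rank + 2)
        pval = stats.kstest(u, "uniform").pvalue
        results.append((name, pval))
    return sorted(results, key=lambda t: t[1])

# Toy data: two well-calibrated parameters and one whose ranks are
# squeezed into the lower half (a classic miscalibration pattern).
rng = np.random.default_rng(0)
n_sims, max_rank = 200, 99
ranks = rng.integers(0, max_rank + 1, size=(n_sims, 3))
ranks[:, 2] = rng.integers(0, (max_rank + 1) // 2, size=n_sims)
print(rank_by_nonuniformity(ranks, max_rank, ["mu", "sigma", "tau"]))
```

The miscalibrated parameter surfaces at the top of the list, so only the first few plots need visual inspection.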
> the examples I am building for the paper (currently not in the paper) provide some limited empirical evidence that this is the case.
Could you share this in more detail, keywords at least? Are you saying `lp__`, i.e. log p(theta, y) + C, is either a non-monotone or a many-to-one mapping?
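For concreteness, any scalar summary of a multi-dimensional theta is necessarily many-to-one; a tiny sketch with a toy standard-normal log density (not from any package), dropping the constant C as Stan does for `lp__`:

```python
def lp(theta):
    # Unnormalized log density of a standard-normal "prior only" model:
    # log p(theta) + C with the constant dropped, mimicking Stan's lp__.
    return -0.5 * sum(t * t for t in theta)

a = [1.0, -2.0]
b = [-1.0, 2.0]   # a different point in parameter space...
print(lp(a), lp(b))  # ...mapping to exactly the same lp__ value
```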
- when `lp__` fails to detect a problem
- `lp__` values are not exchangeable w.r.t. data
Preprocessing tools that sort which parameter's rank plot to inspect first would be useful.
"high-dimensional models there are often only a few parameters/summaries of interest to the final application, and SBC is much more productive when those parameters are prioritized instead of trying to test every parameter at once." From here