Open hyunjimoon opened 3 years ago
@martinmodrak Do you have any intuition for why `lp__` is the quantity most sensitive to the diagnostics? I wish to justify this, since then we could primarily inspect the ECDF plot of this one quantity instead of going through p SBC ECDF plots, where p is the number of parameters (>2000, for instance).
I don't have a strong intuition - just that `lp__` is a non-linear function of all model parameters, so even small changes to one or a couple of parameters can add up to big changes in `lp__`. I think the examples I am building for the paper (currently not in the paper) provide some limited empirical evidence that this is the case.
However, if we are just worried about the number of plots to inspect, I think being able to filter and see only the "most non-uniform" plots is probably the way to go, rather than relying on a single quantity. Here "most non-uniform" could mean the p-value from a one-sample Kolmogorov-Smirnov test, the gamma value from Teemu's paper on visualizations (Equation 13 in https://link.springer.com/content/pdf/10.1007/s11222-022-10090-6.pdf), or something similar.
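A minimal sketch of that kind of filter, assuming SBC ranks are available as an array of shape `(n_sims, n_params)`; the function name and parameter names are hypothetical, and the KS test against a continuous uniform is used as a simple stand-in for the discrete rank distribution:

```python
import numpy as np
from scipy import stats

def rank_by_nonuniformity(ranks, max_rank, names):
    """Sort parameters by the p-value of a one-sample KS test of their
    SBC ranks against uniformity. Smallest p-value = most non-uniform,
    so those rank plots are the ones to inspect first."""
    results = []
    for j, name in enumerate(names):
        # Map integer ranks in {0, ..., max_rank} into (0, 1) and
        # compare against a continuous uniform as an approximation.
        u = (ranks[:, j] + 1) / (max_rank + 2)
        pval = stats.kstest(u, "uniform").pvalue
        results.append((name, pval))
    return sorted(results, key=lambda t: t[1])

# Toy data: two well-calibrated parameters and one whose ranks are
# squeezed into the lower half (a classic miscalibration pattern).
rng = np.random.default_rng(0)
n_sims, max_rank = 200, 99
ranks = rng.integers(0, max_rank + 1, size=(n_sims, 3))
ranks[:, 2] = rng.integers(0, (max_rank + 1) // 2, size=n_sims)
print(rank_by_nonuniformity(ranks, max_rank, ["mu", "sigma", "tau"]))
```

The miscalibrated parameter surfaces at the top of the list, so only the first few plots need visual inspection.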
> the examples I am building for the paper (currently not in the paper) provide some limited empirical evidence that this is the case.
Could you share this in more detail, keywords at least? Are you saying `lp__`, i.e. log p(theta, y) + C, is either a non-monotone or a many-to-one mapping?
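For concreteness, any scalar summary of a multi-dimensional theta is necessarily many-to-one; a tiny sketch with a toy standard-normal log density (not from any package), dropping the constant C as Stan does for `lp__`:

```python
def lp(theta):
    # Unnormalized log density of a standard-normal "prior only" model:
    # log p(theta) + C with the constant dropped, mimicking Stan's lp__.
    return -0.5 * sum(t * t for t in theta)

a = [1.0, -2.0]
b = [-1.0, 2.0]   # a different point in parameter space...
print(lp(a), lp(b))  # ...mapping to exactly the same lp__ value
```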
- when `lp__` fails to detect a problem
- `lp__` values are not exchangeable w.r.t. data
Preprocessing tools that sort which parameter's rank plot to inspect first would be useful.
"high-dimensional models there are often only a few parameters/summaries of interest to the final application, and SBC is much more productive when those parameters are prioritized instead of trying to test every parameter at once." From here