mjhajharia / transforms


Add more transforms for simplex #42

Closed sethaxen closed 1 year ago

sethaxen commented 2 years ago

This adds the following transforms for the simplex, described in #41:

sethaxen commented 2 years ago

For large $N$, e.g. $N=1000$, both of these transforms seem to have problems with initialization. My understanding is that Stan initializes uniformly on $[-2, 2]$ in unconstrained space. For $N=500$, the following compares the 99% marginal posterior intervals of the uniform distribution on the simplex, pushed forward to unconstrained space, against this initialization range. In general they don't overlap well. Maybe this isn't an issue, but the high-index parameters have different posterior scales than the low-index ones, and perhaps this is a challenge for initialization.

Hypersphere (where $y_i = \operatorname{logit}(2\phi_i/\pi)$):

[figure: tmp_hypersphere]
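A minimal NumPy sketch of the hyperspherical map described above (illustrative only, not the Stan code in this PR): invert $y_i = \operatorname{logit}(2\phi_i/\pi)$ to get angles $\phi_i \in (0, \pi/2)$, build a unit vector on the positive orthant of the sphere, and square it so the components sum to 1.

```python
import numpy as np

def hypersphere_simplex(y):
    """Map unconstrained y (length N-1) to a point on the N-simplex via
    hyperspherical coordinates (sketch; function name is hypothetical)."""
    phi = (np.pi / 2) / (1.0 + np.exp(-y))          # invert the logit map
    s, c = np.sin(phi), np.cos(phi)
    # z_1 = cos(phi_1), z_k = cos(phi_k) * prod_{j<k} sin(phi_j),
    # z_N = prod_j sin(phi_j); the squares telescope to sum to 1.
    z = np.append(c, 1.0) * np.concatenate(([1.0], np.cumprod(s)))
    return z ** 2

x = hypersphere_simplex(np.zeros(4))
print(x, x.sum())
```

Note that $y = 0$ does not map to the uniform simplex here, which is consistent with the misalignment with Stan's initialization discussed above.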

Logistic product:

[figure: tmp_logistic]

sethaxen commented 2 years ago

Comparing with stick-breaking and softmax: both align much better with Stan's initialization and have more uniform scales in unconstrained space.

Stick-breaking: [figure: tmp_stick]

Softmax: [figure: tmp_softmax]

Perhaps there's a simple reparameterization that improves the geometry here.

bob-carpenter commented 2 years ago

That's right. Stan uses uniform(-2, 2) inits in the unconstrained space. You can specify that bound. One of the things I've wanted to do is evaluate tail numerical stability. What if we move that to +/- 10 or +/- 100 or even 1000?
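The widening-bounds experiment suggested here could be sketched like this (a hypothetical check, using a pinned-zero softmax as the example transform rather than the repo's Stan code): draw inits uniformly on $(-B, B)$ for growing $B$ and see whether the resulting simplex points stay finite and strictly positive.

```python
import numpy as np

def softmax_simplex(y):
    """Augmented softmax with a pinned trailing zero."""
    z = np.append(y, 0.0)
    e = np.exp(z - z.max())     # max-shift keeps exp from overflowing
    return e / e.sum()

# Components that underflow to exact 0 would give a -inf log density
# downstream, so report the smallest component at each bound.
rng = np.random.default_rng(0)
for bound in (2, 10, 100, 1000):
    y = rng.uniform(-bound, bound, size=(1000, 9))
    x = np.apply_along_axis(softmax_simplex, 1, y)
    print(f"bound={bound:5d}  finite={np.isfinite(x).all()}  "
          f"min component={x.min():.3e}")
```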

I shifted Stan's stick-breaking prior so that a vector of zeros would initialize to the uniform distribution. Is there a way of doing that for the other parameterizations?

When you're talking about coverage, is that for the uniform distribution over simplexes? What about other simple Dirichlet priors like dirichlet(0.1) or dirichlet(10)? I really like the idea of measuring tail coverage like this. It will complement measuring the leapfrog steps needed to reach the bulk of the distribution, which is very sampler- and implementation-dependent. In retrospect, I really wish we'd just chosen normal(0, 1) initializations in Stan version 1.0; those would line up perfectly with standardized posteriors.

Edit: We can emphasize transform stability like this in the write-up. It's not even so much about Stan's initialization as about having something that's roughly standardized in unconstrained space for a uniform distribution. I don't know how to translate that into unbounded things like covariance matrices.
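The tail-coverage measurement discussed above could be sketched as follows (a hypothetical experiment, using the inverse of Stan-style stick-breaking as the example transform; the function name and alpha values are illustrative): push Dirichlet(alpha) samples to unconstrained space and ask how often each coordinate's 99% interval fits inside Stan's $(-2, 2)$ init box.

```python
import numpy as np

def unconstrain_stickbreak(x):
    """Inverse of Stan-style stick-breaking (sketch; offsets follow the
    Stan reference manual, not necessarily this repo's code)."""
    K = len(x)
    y = np.empty(K - 1)
    for k in range(K - 1):
        z = x[k] / x[k:].sum()            # break fraction at step k
        y[k] = np.log(z / (1 - z)) + np.log(K - k - 1)
    return y

rng = np.random.default_rng(1)
for alpha in (0.1, 1.0, 10.0):
    x = rng.dirichlet(np.full(50, alpha), size=2000)
    y = np.apply_along_axis(unconstrain_stickbreak, 1, x)
    lo, hi = np.quantile(y, [0.005, 0.995], axis=0)
    inside = np.mean((lo >= -2) & (hi <= 2))
    print(f"alpha={alpha:4}: fraction of 99% intervals inside (-2, 2) = {inside:.2f}")
```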

mjhajharia commented 1 year ago

> That's right. Stan uses uniform(-2, 2) inits in the unconstrained space. You can specify that bound. One of the things I've wanted to do is evaluate tail numerical stability. What if we move that to +/- 10 or +/- 100 or even 1000?

OK, even (-10, 10) fails; only inits very close to 0 seem to work.

sethaxen commented 1 year ago

With the "logistic product" implementation in this PR, it often failed for me for large N, but not after the update to "hyperspherical logit" in #55. Are you using the version in this PR or the one in #55?

mjhajharia commented 1 year ago

> With the "logistic product" implementation in this PR, for large N, it failed for me often. But not after the update to "hyperspherical logit" in #55. Are you using the version in this PR or in #55?

Yeah, HypersphericalLogit.stan does work; I was just trying different inits on the previous one before discarding it. And yeah, you were right about large N = 1000: all 6 of the other parametrization combinations do end up sampling 1000 times without failing.