Yep, that's the right part of the code.
Regarding the choices for how to implement each of these: I had been thinking that it would be best to match Stan's behavior and characteristics fairly closely. So it would be a good idea to find which Stan Math functions are called, look over the docs or code, and even run experiments comparing the underlying Stan Math function calls to TFP function calls to make sure. I haven't done the necessary legwork for these yet, so I can't promise that they are correct.
Let's take the LowerUpper as an example. You can see that the prefix for LowerUpper is "lub" here: https://github.com/stan-dev/stanc3/blob/master/src/frontend/Ast_to_Mir.ml#L173. This will get appended to "_constrain" and "_unconstrain" and looked up in Stan Math. So the constrain version is here: https://github.com/stan-dev/math/blob/develop/stan/math/prim/scal/fun/lub_constrain.hpp
So we can see there that it does a bunch of different, annoyingly complicated stuff depending on the lower and upper bounds and x.
Now I'm wondering - is it worth reproducing this? I assume most of that is in there for a reason and that doing something simpler would lose out on the painfully acquired years of knowledge wrapped up in Stan. Plus we'd like to run backend benchmarks and it would be nice if the models were computing the same thing.
I definitely agree that staying as close as possible to Stan's implementation is a good idea. I have no idea if there are any cases where the Stan Math library transforms variables in a way that's not currently implementable in TFP... I guess that at least in the simple cases it should be OK. For example, the lub case isn't really complicated; it simply separates into four cases - identity, just lower, just upper, and both - and in each of them the transformation is quite simple and probably "implementable" in TFP.
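For concreteness, here's a minimal Python sketch of that case split, assuming I'm reading the Stan Math code correctly (the function name and signature here are illustrative, not Stan Math's API, and it omits Stan Math's guards against returning exactly the bounds):

```python
import math

def lub_constrain(x, lb=-math.inf, ub=math.inf):
    """Map an unconstrained x into (lb, ub), mirroring the four-case split."""
    if lb == -math.inf and ub == math.inf:  # no bounds: identity
        return x
    if ub == math.inf:                      # lower bound only: lb + exp(x)
        return lb + math.exp(x)
    if lb == -math.inf:                     # upper bound only: ub - exp(x)
        return ub - math.exp(x)
    inv_logit = 1.0 / (1.0 + math.exp(-x))  # both: lb + (ub - lb) * inv_logit(x)
    return lb + (ub - lb) * inv_logit
```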
A somewhat more technical question - in `| Lower lb -> [("Exp", []); ("AffineScalar", [lb])]`, the type of `lb` is `Expr.Typed.t`. However, in order to implement `Upper ub` we'll need to pass two arguments to `tfb.AffineScalar` - the shift `ub` and the scale -1. My question is - how do I convert -1 (or any other arithmetic quantity, for that matter) to be of type `Expr.Typed.t`?
An interesting idea would be to implement these transforms in the Stan language itself and start creating the Stan-in-Stan standard library... then we could compile those to custom bijectors in TFP...
There's a helper function for ints specifically: `Expr.Helpers.int`.
Regarding the helper function - would it be OK to add

```ocaml
val float : float -> Typed.t
```

to Expr.mli, and

```ocaml
let float i =
  {Fixed.meta= Typed.Meta.empty; pattern= Lit (Real, string_of_float i)}
```

to Expr.ml? And then

```ocaml
| Upper ub -> [("Exp", []); ("AffineScalar", [ub; Expr.Helpers.float (-1.)])]
```

This should be identical to the Stan Math implementation, f(x) = U - \exp(x), and

```ocaml
| LowerUpper (lb, ub) -> [("Sigmoid", []);
    ("AffineScalar", [lb; Expr.Helpers.binop ub Operator.Minus lb])]
```

which should be almost identical to the Stan Math implementation, f(x) = L + (U - L) \mbox{logit}^{-1}(x). Almost identical, because the Stan Math code contains simple numerical checks (preventing f from returning exactly 1 or 0); I'll check how/if we can reproduce this behaviour with TFP.
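For reference, a sketch of what those two chains compute on the TFP side, using `tfb.Shift` and `tfb.Scale` in place of `tfb.AffineScalar` (which TFP has since deprecated); note that `tfb.Chain` composes right-to-left:

```python
import tensorflow_probability as tfp
tfb = tfp.bijectors

lb, ub = -1.0, 2.0  # illustrative bounds

# Upper ub: f(x) = ub - exp(x)
upper = tfb.Chain([tfb.Shift(ub), tfb.Scale(-1.0), tfb.Exp()])

# LowerUpper (lb, ub): f(x) = lb + (ub - lb) * inv_logit(x)
lower_upper = tfb.Chain([tfb.Shift(lb), tfb.Scale(ub - lb), tfb.Sigmoid()])
```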
Another thing I'm not sure how to implement (in TFP) is constraints of the form:

```stan
parameters {
  real a;
  real<lower = a> b; // enforces a < b
}
```

What do you think? I'll go over the Stan manual and look for other interesting cases such as this one...
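For what it's worth, one conceivable encoding of this on the TFP side (a sketch of a possibility, not what the backend currently emits, and the base distributions here are just placeholders) is to let the dependent parameter's bijector close over the value it depends on, e.g. in a joint distribution:

```python
import tensorflow_probability as tfp
tfd, tfb = tfp.distributions, tfp.bijectors

# b > a enforced via b = a + exp(y), where y is unconstrained.
model = tfd.JointDistributionSequential([
    tfd.Normal(loc=0., scale=1.),             # a
    lambda a: tfd.TransformedDistribution(    # b, with lower bound a
        distribution=tfd.Normal(loc=0., scale=1.),
        bijector=tfb.Chain([tfb.Shift(a), tfb.Exp()])),
])
```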
Yep, `float` sounds good. And yeah, some of those will be tricky; for now it's fine to stub them out and raise an exception when we see a case we haven't figured out how to support yet.
Chaining the affine bijectors would be awesome. The question then is how to reflect that in the language. If we want an affine transform followed by a positive-constraining transform (for example, for a lognormal hierarchical prior), we could add bijector specs left-to-right:

```stan
vector<offset = mu, multiplier = sigma><lower = 0>[N] alpha;
...
alpha ~ lognormal(mu, sigma);
```

Or we could do it the other way around:

```stan
vector<lower = 0><offset = mu, multiplier = sigma>[N] alpha;
```

In some ways this is more natural, thinking of `vector<lower = 0>` as a unit.
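Whichever spelling wins, the lowered transform would presumably be the same; here's a sketch of what alpha's constraining bijector could compute in TFP (illustrative values; `tfb.Chain` composes right-to-left, so the affine runs before the exp):

```python
import tensorflow_probability as tfp
tfb = tfp.bijectors

mu, sigma = 0.5, 1.3  # illustrative values

# alpha = exp(mu + sigma * z): offset/multiplier affine, then <lower = 0>.
alpha_bijector = tfb.Chain([tfb.Exp(), tfb.Shift(mu), tfb.Scale(sigma)])
```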
> Another thing I'm not sure how to implement (in TFP) is constraints of the form...

We allow constraints to depend on other parameters. It complicates extracting constrained values, because we need the values of all the parameters, as they might play a role in transforms (aka bijectors).

The other case that might not work with bijectors is `unit_vector`, as it has a constraint that's not bijective---it's controlled by a prior.
> Regarding the choices for how to implement each of these: I had been thinking that it would be best to match Stan behavior and characteristics fairly closely. So finding which Stan Math functions are called and looking over the doc

The reference manual documents all the transforms in a dedicated chapter. That might help in understanding the code, which is, of course, the final arbiter of how it's actually done.
> So the constrain version is here: https://github.com/stan-dev/math/blob/develop/stan/math/prim/scal/fun/lub_constrain.hpp

These should all have custom derivatives in rev to reduce all the virtual function calls, but I don't think they do.
> So we can see there that it does a bunch of different, annoyingly complicated stuff depending on the lower and upper bounds and x.
> Now I'm wondering - is it worth reproducing this?

Reproducing what? Allowing constraints to depend on parameters? Not implementing these things forces users who want them to write their own transforms and Jacobian adjustments.
> I assume most of that is in there for a reason

Feel free to ask on a case-by-case basis. I can probably explain why things are there, or at least what I was thinking when we added all the transforms.
> and that doing something simpler would lose out on the painfully acquired years of knowledge wrapped up in Stan. Plus we'd like to run backend benchmarks and it would be nice if the models were computing the same thing.

I think compile-time errors or run-time exceptions would be better than differing behavior.
> Reproducing what? Allowing constraints to depend on parameters?

I was wondering aloud whether the first version of the TFP backend needs to be coded to run the exact same transformations that Stan does with Stan Math, given that it will be numerically quite different in many other ways and that TF will have its own numerical issues to work around (e.g., Matt Hoffman recently found a mismatch between TF's log and its derivative that made him switch to another constraining function pair).
> Feel free to ask on a case-by-case basis. I can probably explain why things are there, or at least what I was thinking when we added all the transforms.

Sweet :D This will be very helpful indeed, thanks.
To implement transforms, we'd need to allow a `jacobian += ...` statement in transformed parameters. Then we could recode

```stan
parameters {
  real<lower = 0> alpha;
  ...
```

as

```stan
parameters {
  real alpha_unc;
  ...
transformed parameters {
  real alpha = exp(alpha_unc);
  log_jacobian += alpha_unc;
  ...
```

Without that, we'd have to use `target += alpha_unc;` in the model block, and then it wouldn't work the way the built-in type does w.r.t. optimization (no Jacobian) or MCMC (yes Jacobian).
I wouldn't think you'd need to use the exact same transforms as Stan. As long as the Jacobians are right, alternatives should be OK. I've thought it would be nice to explore some alternatives to the transforms we use now in some cases. There's been a lot of discussion around the difficult `unit_vector` case and also around `simplex`.
Not sure if this is the right issue to leave this note on, but I didn't think filing a separate one was necessary. I noticed that the bijectors generated for simple `<lower, upper>`-bounded parameters get `tfb.Chain`-ed together via shift, scale, then sigmoid bijectors. There's a note in the `tfb.Sigmoid` documentation saying it's more numerically stable to do such a bijection by passing the `low` and `high` params to that bijector directly, rather than doing this chaining. Was there a reason to do it manually?
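For comparison, a sketch of the two constructions (with illustrative bounds); the chained form is what's being generated, and the direct form is what the `tfb.Sigmoid` docs recommend:

```python
import tensorflow_probability as tfp
tfb = tfp.bijectors

low, high = 0.0, 1.5  # illustrative bounds

# Chained: shift and scale wrapped around a sigmoid.
chained = tfb.Chain([tfb.Shift(low), tfb.Scale(high - low), tfb.Sigmoid()])

# Direct: pass the bounds to the sigmoid bijector itself.
direct = tfb.Sigmoid(low=low, high=high)
```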
I probably missed the comment about numerical stability - will have a look. Thanks!
In Stan internally, the constraining transform for `lower = L, upper = U` from unconstrained `y` to constrained `x` is

```
x = L + (U - L) * inv_logit(y)
```

Is there a more stable way to compute that? The partials are just computed as follows (where `inv_logit(-y) = 1 - inv_logit(y)`):

```
d.x/d.L = inv_logit(-y)
d.x/d.U = inv_logit(y)
d.x/d.y = (U - L) * inv_logit(y) * inv_logit(-y)
```
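As a sanity check on those partials, a small autodiff sketch (using `tf.math.sigmoid` for `inv_logit` and illustrative values):

```python
import tensorflow as tf

L, U, y = tf.constant(-1.0), tf.constant(2.0), tf.constant(0.7)
with tf.GradientTape() as tape:
    tape.watch([L, U, y])
    x = L + (U - L) * tf.math.sigmoid(y)
dL, dU, dy = tape.gradient(x, [L, U, y])

# Matches the formulas above:
#   dL == sigmoid(-y)
#   dU == sigmoid(y)
#   dy == (U - L) * sigmoid(y) * sigmoid(-y)
```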
See @axch's answer in the issue above. AFAIU, it makes sense to leave Stan's bijector as it is, but change the code generated by the TFP backend to be more consistent with TFP's "best practices" (so low and high instead of chain+scale+shift).
I'll fix the sigmoid parameterisation; other than that, and considering the fact that adding more bijectors and distributions is more of an "ongoing effort", I think we can close this.
I agree with @adamhaber that we can close this and can open specific issues for additional bijectors when needed.
Hi,

I'm interested in adding support for more bijectors in the TFP backend; as far as I understand, the current code supports lower-bounded variables, right? Does that cover "automatically" variances and standard deviations?

I think the easiest bijectors to add are probably (sketched below):

- `tfb.CorrelationCholesky` for variables of type `cholesky_factor_corr`
- `tfb.Chain([tfb.CholeskyOuterProduct(), tfb.CorrelationCholesky()])` for variables of type `corr_matrix`

Other options:

- Would adding a `-1*x+0` affine bijector after the chained exp+affine for lower-bounded variables be a good option for upper-bounded ones?
- `tfb.Sigmoid` for both lower- and upper-bounded variables?

Code-wise, just to be sure - that means adding more patterns in this part of the code, right?