Closed spinkney closed 10 months ago
I'm not sure what you mean by transforms written on the log scale. Is what you provided another way to generate a simplex?
But yes, if there's a way to make the arithmetic more stable we should both do it and mention it in the paper.
Same as the probit transform, just written keeping everything on the log scale. If we want to output the simplex then we need to exponentiate log_x.
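To make "keeping everything on the log scale" concrete, here is a minimal sketch of a probit-style stick-breaking construction where the stick fractions are `Phi(y_k)` and the running remainder is accumulated in log space via `norm.logcdf`. This is an illustrative assumption, not necessarily the exact probit product transform from the paper; the function name is hypothetical.

```python
import numpy as np
from scipy.stats import norm


def probit_stick_breaking_log_simplex(y):
    """Map an unconstrained vector y (length K-1) to a log simplex (length K).

    Hypothetical sketch: probit stick-breaking kept entirely on the log scale.
    """
    log_z = norm.logcdf(y)      # log Phi(y_k): log of each stick fraction
    log_1m_z = norm.logcdf(-y)  # log(1 - Phi(y_k)) by symmetry of the normal
    log_x = np.empty(len(y) + 1)
    log_rem = 0.0               # log of the remaining stick length
    for k in range(len(y)):
        log_x[k] = log_rem + log_z[k]
        log_rem += log_1m_z[k]
    log_x[-1] = log_rem         # last component takes whatever stick is left
    return log_x
```

Because `log(1 - Phi(y))` is computed as `norm.logcdf(-y)` rather than `log1p(-Phi(y))`, nothing underflows even when the fractions are tiny; exponentiating `log_x` recovers an ordinary simplex.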
So is the idea to sample a log simplex rather than a simplex? That would be useful in practice because we take the log again right away when we use a simplex in a multinomial. We have a multinomial-logit, but that could be factored a different way if we take in a log simplex (we'd skip subtracting log-sum-exp to normalize).
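The refactoring described above (a multinomial density that consumes a log simplex directly, so the log-sum-exp normalization can be skipped) can be sketched as follows; the function names are hypothetical:

```python
import numpy as np
from scipy.special import gammaln, logsumexp


def multinomial_lpmf_log_simplex(counts, log_x):
    """Multinomial log-pmf taking an already-normalized log simplex.

    No log() of the probabilities and no log-sum-exp is needed here.
    """
    n = counts.sum()
    return gammaln(n + 1) - gammaln(counts + 1).sum() + counts @ log_x


def multinomial_logit_lpmf(counts, logits):
    """Multinomial-logit: unnormalized logits pay for a log-sum-exp."""
    return multinomial_lpmf_log_simplex(counts, logits - logsumexp(logits))
```

With a log simplex as input, the first function is the whole computation; the multinomial-logit version is the same thing plus the normalization step we would get to skip.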
I think this makes sense. Whether we use simplex or log_simplex, the gradients on the unconstrained parameters will be mathematically identical; the only difference is that with log_simplex they have the chance to be more numerically stable. This also allows us to separate out numerical issues coming from that particular Dirichlet just being a bad thing to sample (e.g. Dirichlet(alpha = ones(10^3) * 1e-3)) from issues due to the transformation itself.
Perhaps we could do this easily by having each model compute both x and logx for sampling, but then for evaluation we can swap out the target distribution to either be one that uses logx or one that uses x.
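One way the swap could look in practice: keep two interchangeable targets, one consuming x and one consuming logx, that compute the same Dirichlet density. This is a hedged sketch (function names are made up, not anything from the repo):

```python
import numpy as np
from scipy.special import gammaln


def dirichlet_lpdf_from_x(x, alpha):
    """Target that consumes the simplex directly (takes log internally)."""
    log_norm = gammaln(alpha.sum()) - gammaln(alpha).sum()
    return ((alpha - 1) * np.log(x)).sum() + log_norm


def dirichlet_lpdf_from_log_x(log_x, alpha):
    """Same target consuming the log simplex: no exp-then-log round trip."""
    log_norm = gammaln(alpha.sum()) - gammaln(alpha).sum()
    return ((alpha - 1) * log_x).sum() + log_norm
```

If a model emits both x and logx, evaluation can call either function and any difference in results isolates numerical error in the transform rather than in the density.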
@spinkney the probit product transform seems to be missing from the paper. Can you add it to at least the appendix?
I'll add it to the appendix as it's similar to the logit
I was doing a bit of testing, and for high-dimensional simplexes the sampler often fails. A separate issue is the dirichlet distribution itself with these high-dimensional vectors. For example, the probit product transform fails for dimensions > 1000 (maybe even > 800), but the log probit product doesn't.
Do we want to include tests with the transforms written on the log-scale?