mjhajharia / transforms

2 stars 1 forks source link

three more simplex transforms to evaluate #59

Open bob-carpenter opened 1 year ago

bob-carpenter commented 1 year ago

Suggestions from @mortonjt on three more transforms to consider with notes on who volunteered to look at them:

mortonjt commented 1 year ago

The weighted ilr transform was originally defined here : https://www.ajs.or.at/index.php/ajs/article/view/vol45-4-2 It can be useful when incorporating additional structured information (i.e. phylogeny) -- but I haven't seen benchmarks comparing weighted vs unweighted ilr transform yet.

sethaxen commented 1 year ago

So here's my initial impression of the "L2 projection" method. They call it sparsemax. It's a projection from N-dimensional vectors to the simplex of length N-vectors, so it's non-bijective. It's expressly designed to threshold many values of the vector to 0, i.e. to drive toward sub-simplices of lower dimension embedded within the target simplex. Let $s_i = 0$ if the element at index $i$ is thresholded to 0, and otherwise $s_i = 1$. Then the Jacobian is $$J = \operatorname{diag}(s) - \frac{s s^T}{s^T 1}$$ So the Jacobian does not continuously vary with the values of the unconstrained parameters. By Sylvester's determinant theorem, $$|J| = |\operatorname{diag}(s)||1-s^T\operatorname{diag}(s)^{-1}s/(s^T 1)|$$ The left term guarantees this is 0 whenever $s \ne 1$, and for $s = 1$, $$|J| = |I_N| |1 - 1^T I_N 1/N| = N |1 - 1| = 0,$$ so we never get a non-zero determinant.

The only way we could get this to work for HMC would be with an augmented version of the transform. But I'm not certain this is sensible, since with augmentation we are effectively using the map ℝᴺ → Δᵈ × ℝᴺ⁻ᵈ⁻¹, where d discretely jumps with the values of the inputs. So when it changes, the prior put on the augmented outputs is suddenly being applied to a space of different dimension.

In summary, this transform can't be used for HMC "as-is", and I'm not certain whether even an augmented form is applicable to HMC, since it involves the dimensionality of subsets discretely varying wrt unconstrained parameters.

bob-carpenter commented 1 year ago

The goal of the weighted ILR transform is to transform genomic data in a phylogenetically-sensitive way so that it may be fed into standard statistical tools. I don't want to transform data toward standard parametric models---I want to transform simple parameterizations toward data with models of the observation and noise process.

The weighted ILR transform is a generalized reweighting (p. 12), that in our notation derives a simplex $y$ from unconstrained $x$ as

$y = \textrm{ilr}_p(y) = \textrm{clr}_p(x) \ \textrm{diag}(p) \ \Psi^{\top},$

where $p$ is a vector of weights for each taxa and $\Psi$ is a constant weighting matrix derived from a phylogenetic tree (the details of which are very combinatorially involved). They then add another twist in reweighting, but I don't think we need to evaluate anything this phylogenetic-specific.

mjhajharia commented 1 year ago

went through the pivot one, it's basically these tuning things with some theoretical justifications that ends up being - scaling log ratios for the orthonormal basis aitchison defines. it is also a special case of ILR - the idea behind pivot coordinates being - you want to capture the relative information about p out of n parts in p parts themselves. there's a more recent paper on the same as well - https://link.springer.com/article/10.1007/s11004-020-09862-5 and the idea of pivot coordinates has been there for a bit I think it's already existing in the compositions package in R.

Will update with some notes soon.