neurodata / treeple

Scikit-learn compatible decision trees beyond those offered in scikit-learn
https://treeple.ai

Post Submission tasks #197

Open sampan501 opened 10 months ago

sampan501 commented 10 months ago
adam2392 commented 10 months ago

The general MVN approach can maybe be done as Jovo suggested (with some open questions):

$X_i \mid Y \sim \text{MVN}$, where for CoMIGHT, we generate two such instances that are either directly dependent or not.
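
For concreteness, here is a minimal sketch of what such a generator could look like (plain NumPy, not treeple's simulation API; the function name, dimensions, correlation, and the mean-shift class signal are all illustrative assumptions):

```python
import numpy as np


def sample_comight_like(n, d=2, dependent=True, rho=0.5, shift=1.0, seed=0):
    """Hypothetical sketch: draw (X1, X2, Y) with X_i | Y ~ MVN.

    When ``dependent`` is True, X1 and X2 share a nonzero cross-covariance;
    otherwise the joint covariance is block-diagonal (X1 independent of X2).
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)  # class label, equal mixture probability

    # Block covariance over the stacked vector (X1, X2).
    cross = rho * np.eye(d) if dependent else np.zeros((d, d))
    cov = np.block([[np.eye(d), cross],
                    [cross, np.eye(d)]])

    # Class-conditional means: here Y only shifts the mean of the X1 block.
    means = {0: np.zeros(2 * d),
             1: np.concatenate([shift * np.ones(d), np.zeros(d)])}

    X = np.empty((n, 2 * d))
    for label, mean in means.items():
        mask = y == label
        X[mask] = rng.multivariate_normal(mean, cov, size=mask.sum())
    return X[:, :d], X[:, d:], y


X1, X2, y = sample_comight_like(1000, dependent=True)
```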

$Y$ indexes a mixture of MVN Gaussians, so the MI term is then: $I(X_1, X_2; Y) = H(X_1, X_2) - H(X_1, X_2 \mid Y) = H(X_1 \mid X_2) + H(X_2) - H(X_1 \mid X_2, Y) - H(X_2 \mid Y)$
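
Written out, this is just the entropy chain rule $H(A, B) = H(A \mid B) + H(B)$ applied to the marginal term and, conditionally on $Y$, to the conditional term:

$$
\begin{aligned}
I(X_1, X_2; Y) &= H(X_1, X_2) - H(X_1, X_2 \mid Y) \\
&= \bigl[H(X_1 \mid X_2) + H(X_2)\bigr] - \bigl[H(X_1 \mid X_2, Y) + H(X_2 \mid Y)\bigr].
\end{aligned}
$$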

where the non-trivial parts to compute currently are:

Maybe we generate a huge MVN first, where we know $\Sigma_{X_1, X_2}$ for the subset of variables we denote $X_1, X_2$; that subset is still MVN, and therefore we know $H(X_1, X_2)$. Then, we use $Y$ as the mixture of Gaussians with varying mixture probability?
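
The "known" part of that route is straightforward: marginals of an MVN are MVN, so the sub-covariance over the $(X_1, X_2)$ coordinates gives $H(X_1, X_2)$ in closed form. A hedged sketch (the big covariance and the index subset below are made up for illustration):

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy of an MVN: 0.5 * log((2*pi*e)^d * det(Sigma))."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)


# Made-up "huge" MVN covariance; (X1, X2) is an index subset of its variables.
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))
big_cov = A @ A.T + 10 * np.eye(10)    # symmetric positive definite
idx = np.r_[0:3, 5:8]                  # the coordinates we call (X1, X2)

# Marginals of an MVN are MVN, so the sub-covariance gives H(X1, X2) exactly.
sub_cov = big_cov[np.ix_(idx, idx)]
print("analytical H(X1, X2):", gaussian_entropy(sub_cov))
```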

adam2392 commented 10 months ago

Structuring the covariance in blocks as such and then using $Y \in \{1, 2\}$ to select the corresponding multivariate normal should allow us to:

  1. arbitrarily apply feature-wise transformations for a specific class, so that there is a functional relationship between $X$ and $Y$.
  2. compute analytical CMI and MI, because we would have an analytical solution for $H(X)$ and $H(X \mid Y) = P(Y=1)\,H(X \mid Y=1) + P(Y=2)\,H(X \mid Y=2) = P(Y=1)\,H(X^{(1)}) + P(Y=2)\,H(X^{(2)})$ (see the sketch after this list).
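
A rough illustration of point 2: $H(X \mid Y)$ does come out in closed form as the mixture-weighted sum of per-class Gaussian entropies, but the marginal entropy $H(X)$ of the Gaussian mixture generally has no closed form, so one practical route is to estimate it by Monte Carlo. A sketch with made-up means, covariances, and mixture weight (not treeple code):

```python
import numpy as np
from scipy.stats import multivariate_normal


def gaussian_entropy(cov):
    d = cov.shape[0]
    return 0.5 * (d * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])


rng = np.random.default_rng(0)
d, p1 = 3, 0.5                                  # dimension and P(Y = 1), made up

# Per-class MVNs; Y selects which (mean, covariance) block is used.
mean = {1: np.zeros(d), 2: np.ones(d)}
cov = {1: np.eye(d), 2: 0.5 * np.eye(d) + 0.5}  # class 2: equicorrelated block

# H(X | Y): mixture-weighted sum of per-class Gaussian entropies (closed form).
H_cond = p1 * gaussian_entropy(cov[1]) + (1 - p1) * gaussian_entropy(cov[2])

# H(X) of the Gaussian mixture has no closed form; estimate it by Monte Carlo.
n = 200_000
y = rng.random(n) < p1
X = np.where(y[:, None],
             rng.multivariate_normal(mean[1], cov[1], size=n),
             rng.multivariate_normal(mean[2], cov[2], size=n))
mix_pdf = (p1 * multivariate_normal(mean[1], cov[1]).pdf(X)
           + (1 - p1) * multivariate_normal(mean[2], cov[2]).pdf(X))
H_marg = -np.mean(np.log(mix_pdf))

print("I(X; Y) ≈ H(X) - H(X | Y) =", H_marg - H_cond)
```

With enough Monte Carlo samples the printed value converges to the exact MI implied by whatever per-class means, covariances, and mixture weight are plugged in.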

Now I'm not 100% sure how to fit this in with the Marron/Wald

(Attached images: IMG_2247 and IMG_2248.)