neurodata / treeple

Scikit-learn compatible decision trees beyond those offered in scikit-learn
https://treeple.ai

Post Submission tasks #197

Open sampan501 opened 10 months ago

sampan501 commented 10 months ago
adam2392 commented 10 months ago

The general MVN approach can maybe be done as Jovo suggested (with some open questions):

$X_i \mid Y \sim \text{MVN}$, where for CoMIGHT, we generate two such instances that are either directly dependent or not.
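
For concreteness, here is a minimal sketch of what such a generator could look like (plain NumPy, not treeple's simulation API; the function name, dimensions, correlation, and the mean-shift class signal are all illustrative assumptions):

```python
import numpy as np


def sample_comight_like(n, d=2, dependent=True, rho=0.5, shift=1.0, seed=0):
    """Hypothetical sketch: draw (X1, X2, Y) with X_i | Y ~ MVN.

    When ``dependent`` is True, X1 and X2 share a nonzero cross-covariance;
    otherwise the joint covariance is block-diagonal (X1 independent of X2).
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)  # class label, equal mixture probability

    # Block covariance over the stacked vector (X1, X2).
    cross = rho * np.eye(d) if dependent else np.zeros((d, d))
    cov = np.block([[np.eye(d), cross],
                    [cross, np.eye(d)]])

    # Class-conditional means: here Y only shifts the mean of the X1 block.
    means = {0: np.zeros(2 * d),
             1: np.concatenate([shift * np.ones(d), np.zeros(d)])}

    X = np.empty((n, 2 * d))
    for label, mean in means.items():
        mask = y == label
        X[mask] = rng.multivariate_normal(mean, cov, size=mask.sum())
    return X[:, :d], X[:, d:], y


X1, X2, y = sample_comight_like(1000, dependent=True)
```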

$Y$ indexes a mixture of MVN Gaussians, so the MI term is then: $I(X_1, X_2; Y) = H(X_1, X_2) - H(X_1, X_2 \mid Y) = H(X_1 \mid X_2) + H(X_2) - H(X_1 \mid X_2, Y) - H(X_2 \mid Y)$
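
Written out, this is just the entropy chain rule $H(A, B) = H(A \mid B) + H(B)$ applied to the marginal term and, conditionally on $Y$, to the conditional term:

$$
\begin{aligned}
I(X_1, X_2; Y) &= H(X_1, X_2) - H(X_1, X_2 \mid Y) \\
&= \bigl[H(X_1 \mid X_2) + H(X_2)\bigr] - \bigl[H(X_1 \mid X_2, Y) + H(X_2 \mid Y)\bigr].
\end{aligned}
$$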

where the non-trivial parts to compute currently are:

Maybe we generate a huge MVN first, where we know $\Sigma_{X_1, X_2}$ for the subset of variables we denote $X_1, X_2$; that subset is still MVN, and therefore we know $H(X_1, X_2)$. Then, we use $Y$ as the mixture of Gaussians with varying mixture probability?
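
The "known" part of that route is straightforward: marginals of an MVN are MVN, so the sub-covariance over the $(X_1, X_2)$ coordinates gives $H(X_1, X_2)$ in closed form. A hedged sketch (the big covariance and the index subset below are made up for illustration):

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy of an MVN: 0.5 * log((2*pi*e)^d * det(Sigma))."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)


# Made-up "huge" MVN covariance; (X1, X2) is an index subset of its variables.
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))
big_cov = A @ A.T + 10 * np.eye(10)    # symmetric positive definite
idx = np.r_[0:3, 5:8]                  # the coordinates we call (X1, X2)

# Marginals of an MVN are MVN, so the sub-covariance gives H(X1, X2) exactly.
sub_cov = big_cov[np.ix_(idx, idx)]
print("analytical H(X1, X2):", gaussian_entropy(sub_cov))
```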

adam2392 commented 10 months ago

Structuring the covariance in blocks as such and then using $Y \in \{1, 2\}$ to select the corresponding multivariate normal should allow us to:

  1. arbitrarily apply feature-wise transformations for a specific class, so that there is a functional relationship between $X$ and $Y$.
  2. compute analytical CMI and MI, because we would have an analytical solution for $H(X)$ and $H(X \mid Y) = P(Y=1)\,H(X \mid Y=1) + P(Y=2)\,H(X \mid Y=2) = P(Y=1)\,H(X^{(1)}) + P(Y=2)\,H(X^{(2)})$ (see the sketch after this list).
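
A rough illustration of point 2: $H(X \mid Y)$ does come out in closed form as the mixture-weighted sum of per-class Gaussian entropies, but the marginal entropy $H(X)$ of the Gaussian mixture generally has no closed form, so one practical route is to estimate it by Monte Carlo. A sketch with made-up means, covariances, and mixture weight (not treeple code):

```python
import numpy as np
from scipy.stats import multivariate_normal


def gaussian_entropy(cov):
    d = cov.shape[0]
    return 0.5 * (d * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])


rng = np.random.default_rng(0)
d, p1 = 3, 0.5                                  # dimension and P(Y = 1), made up

# Per-class MVNs; Y selects which (mean, covariance) block is used.
mean = {1: np.zeros(d), 2: np.ones(d)}
cov = {1: np.eye(d), 2: 0.5 * np.eye(d) + 0.5}  # class 2: equicorrelated block

# H(X | Y): mixture-weighted sum of per-class Gaussian entropies (closed form).
H_cond = p1 * gaussian_entropy(cov[1]) + (1 - p1) * gaussian_entropy(cov[2])

# H(X) of the Gaussian mixture has no closed form; estimate it by Monte Carlo.
n = 200_000
y = rng.random(n) < p1
X = np.where(y[:, None],
             rng.multivariate_normal(mean[1], cov[1], size=n),
             rng.multivariate_normal(mean[2], cov[2], size=n))
mix_pdf = (p1 * multivariate_normal(mean[1], cov[1]).pdf(X)
           + (1 - p1) * multivariate_normal(mean[2], cov[2]).pdf(X))
H_marg = -np.mean(np.log(mix_pdf))

print("I(X; Y) ≈ H(X) - H(X | Y) =", H_marg - H_cond)
```

With enough Monte Carlo samples the printed value converges to the exact MI implied by whatever per-class means, covariances, and mixture weight are plugged in.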

Now I'm not 100% sure how to fit this in with the Marron/Wald

(Attached images: IMG_2247 and IMG_2248.)