neurodata / graph-stats-book

http://docs.neurodata.io/graph-stats-book/coverpage.html
GNU General Public License v3.0

Add info about dcorr and mgc to twosample testing section #90

Closed: ebridge2 closed this issue 2 months ago

ebridge2 commented 3 months ago

This is done. @loftusa, please review the annotated comment I added for you in "twosample.tex", and close this issue if you agree it's done.

loftusa commented 2 months ago

Done. Made minor edits for sentence structure; everything else seems fine. New section:

\begin{Conceptfloatingbox}\caption{Independence testing and two-sample testing}
\label{box:ch8:twosamp:ind_test}
A common scenario in machine learning involves two random quantities, $\mathbf x_i$ and $\mathbf y_i$, and we suspect that for each $i$, $\mathbf x_i$ and $\mathbf y_i$ are drawn in a manner that may be related. This idea is captured by a ``joint distribution'', meaning that the pair $(\mathbf x_i, \mathbf y_i)$ is sampled, for each $i$, from some distribution $F_{\mathbf x, \mathbf y}$. For example, the joint distribution could be the \textit{multivariate normal distribution}, under which each coordinate of the random vector is itself normally distributed. If, say, $\mathbf x_i$ and $\mathbf y_i$ are one-dimensional and the pair $(\mathbf x_i, \mathbf y_i)$ is multivariate normally distributed, then $\mathbf x_i$ is a $\mathcal N(\mu_x, \sigma_x)$ random variable, $\mathbf y_i$ is a $\mathcal N(\mu_y, \sigma_y)$ random variable, and $\operatorname{corr}(\mathbf x_i, \mathbf y_i) = \rho$ is their correlation. In this special case, $\mathbf x_i$ and $\mathbf y_i$ are independent exactly when $\rho = 0$.

The question of whether $\mathbf x_i$ and $\mathbf y_i$ are related can be formalized as the \textit{independence testing problem}, written:
\begin{align*}
    H_0 : F_{\mathbf x, \mathbf y} = F_{\mathbf x} F_{\mathbf y}\text{ against }H_A : F_{\mathbf x, \mathbf y} \neq F_{\mathbf x} F_{\mathbf y},
\end{align*}
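which tests whether $\mathbf x_i$ and $\mathbf y_i$ are independent (that is, whether the joint distribution factors into the product of the two marginal distributions) against the alternative that they are not. The independence testing problem assumes nothing about the distribution or nature of $\mathbf x_i$ or $\mathbf y_i$ (they could be either random variables or random vectors, for instance). In particular, a two-sample test can be reformulated as an independence test: pool the two samples into the $\mathbf x_i$, and let $\mathbf y_i$ take one of two possible values indicating which sample $\mathbf x_i$ came from. The two samples share a distribution exactly when $\mathbf x_i$ and $\mathbf y_i$ are independent. Thus, independence tests can serve as an alternative to two-sample tests.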
\end{Conceptfloatingbox}
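
For reviewers who want to sanity-check the last claim in the box, here is a minimal sketch of the two-sample-to-independence reformulation. It is only an illustration under stated assumptions, not the book's code: the function names are hypothetical, the test statistic is assumed to be the (biased) sample squared distance covariance, and the p-value comes from a plain permutation test written with NumPy alone.

```python
import numpy as np


def _centered_distances(z):
    """Pairwise Euclidean distance matrix of z, double-centered."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # treat a 1-D input as n samples of dimension 1
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    return (
        d
        - d.mean(axis=0, keepdims=True)
        - d.mean(axis=1, keepdims=True)
        + d.mean()
    )


def independence_permutation_test(x, y, n_perms=500, seed=0):
    """Permutation test of H_0: x and y are independent.

    Hypothetical helper. The statistic is the biased sample squared
    distance covariance (an assumption made for this sketch).
    Returns (statistic, p_value).
    """
    rng = np.random.default_rng(seed)
    a = _centered_distances(x)
    b = _centered_distances(y)
    observed = (a * b).mean()
    n = a.shape[0]
    exceed = 0
    for _ in range(n_perms):
        perm = rng.permutation(n)
        # Shuffling y breaks any pairing between x_i and y_i; permuting the
        # rows and columns of b is the same as recomputing it on shuffled y.
        if (a * b[np.ix_(perm, perm)]).mean() >= observed:
            exceed += 1
    p_value = (1 + exceed) / (1 + n_perms)
    return observed, p_value


def two_sample_as_independence(sample1, sample2, n_perms=500, seed=0):
    """Two-sample test recast as an independence test (hypothetical helper).

    The samples are pooled into x, and y is a 0/1 label recording which
    sample each pooled observation came from.
    """
    sample1 = np.asarray(sample1, dtype=float)
    sample2 = np.asarray(sample2, dtype=float)
    pooled = np.concatenate([sample1, sample2], axis=0)
    labels = np.concatenate([np.zeros(len(sample1)), np.ones(len(sample2))])
    return independence_permutation_test(pooled, labels, n_perms=n_perms, seed=seed)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two samples whose means differ by two standard deviations.
    stat, p = two_sample_as_independence(rng.normal(0, 1, 60), rng.normal(2, 1, 60))
    print(f"statistic = {stat:.4f}, p-value = {p:.4f}")
```

A small p-value here suggests the label carries information about the pooled observations, which is the same as saying the two samples differ in distribution. The permutation p-value uses the `(1 + exceed) / (1 + n_perms)` form so that it can never be exactly zero.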