friedman & schilling - Githubissues

jovo commented 7 years ago

rereading those papers,

friedman 1983 writes:

"...the size of the graph is determined by the choice of K. The best choice will depend upon the particular situation and there are, as yet, no specific guidelines. As the same size becomes large, it is unliklely that the best choice would involve having both graphs sparse; that is, both e_x & e_y growing linearly with sample size. Similarly, it is unlikely that both graphs should be dense, e_x and e_y growing at N^2. Choices between these two extremes are likely to be best. For example, the Wald-Wolfowitz run test takes one graph (sample identity) to be dense and the other (MST) to be sparse."

schilling 86 seems to prove asymptotic consistency for any K (theorem 3.4). they also point out:

"The primary computational task is the identification of the k nearest neighbors of each sample point. This can be accomplished in O(kn log n) steps....; it should be noted, however, that computational time also grows rather significantly with d"

of course, they also do not provide a means to estimate the optimal scales at all, much less do so in a computationally efficient and statistically significant way.

@cshen6 no action required, just fyi.

cshen6 commented 7 years ago

👍 yeah! thanks for the refreshing here! indeed our work aligns very well with those early works!

jovo commented 7 years ago

but does this mean that all scales are consistent? if so, isn't our proof for thm 1 done by them?

cshen6 commented 7 years ago

no, at least not for MGC based on dcorr.

For one thing, what they define (by friedman or schilling) are structurally different from global dcorr and local dcorr. So their proof cannot be carried over directly.

For another, although independence test can be used for two-sample test, the other way around is not. (i.e., the consistency of schilling is for two-sample test, but not the independence test)

But I will re-read them and the proofs, there are probably some new insights we can glean at this stage :-)

neurodata / MGC-paper

friedman & schilling #185