two-sample test & non-parametric "n-sample" test

neurodata / MGC-paper

MGC: multiscale graph correlation (pronounced "magic")

http://neurodata.io

Apache License 2.0

two-sample test & non-parametric "n-sample" test #120

Closed jovo closed 8 years ago

jovo commented 8 years ago

if we remove the noise from $x$ in the "uncorrelated bernoulli" and remove the noise from the "shape vs disease"

then MGC implements a "two-sample test" and "k-sample test", right?

what is a "k-sample test" called?

jovo commented 8 years ago

also, you call it "uncorrelated binomial", but isn't it "uncorrelated bernoulli"? and shouldn't we actually call it "two sample test"?

jovo commented 8 years ago

ah, these are all "tests of equality"!

but, i'm a bit confused. the "step function" is just a noisy version of this, and the power map looks completely different. it is not obvious to me why that would be the case.

nonetheless, i think i'll add a few sentences about equality tests, if you think it is ok to remove the noise from sim 19 and shape vs. disease?

cshen6 commented 8 years ago

it should be called Bernoulli, yes; but it is not wrong to call it binomial either, and the example I took from Wikipedia calls it a binomial example, hence the naming... I think it doesn't matter either way.

But no, we are not testing equality of distributions, no. It is kind of, if we think linear dependency implies same distributions with scalar change; but again, no, we are not testing equality of distributions, not yet.

Conceptually, a two-sample test does not require corresponding (x_i, y_i) pairs, only samples from each marginal distribution; the dependence test, by contrast, is all about testing the joint distribution of the pair.
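To make the pairing point concrete, here is a minimal sketch. It assumes 1-D data and uses the Pearson correlation purely as a stand-in dependence measure (it is not MGC); the variable names are illustrative only. Shuffling y leaves its marginal distribution untouched, so anything that looks only at the marginals cannot tell the difference, while the dependence between the pairs is destroyed.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation (a stand-in dependence measure, not MGC)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [2.0 * a + 1.0 for a in x]      # pairs (x_i, y_i) are linearly dependent

y_shuffled = y[:]
rng.shuffle(y_shuffled)             # break the pairing, keep the marginal

same_marginal = sorted(y) == sorted(y_shuffled)
r_paired = pearson(x, y)            # uses the joint distribution of the pairs
r_shuffled = pearson(x, y_shuffled) # pairing destroyed, marginals identical
```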

We can talk about it tomorrow to settle the difference & spark new thoughts...

cshen6 commented 8 years ago

you are right, Joshua, MGC ESSENTIALLY IMPLEMENTS THE TWO/K-SAMPLE TEST!

Essentially, assuming $y \in \{0, 1\}$, the hypothesis $f_{x|y=0} = f_{x|y=1}$ is equivalent to the null hypothesis of the independence test.

If $y$ can take $k$ discrete values, the null of the independence test is equivalent to the hypothesis that $f_{x|y=0} = f_{x|y=1} = \dots = f_{x|y=k-1}$.
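The equivalence can be spelled out in a few lines. This is a sketch under the assumption that every class has positive probability; the symbol $p_j = P(y = j) > 0$ is introduced here for illustration and does not appear in the thread.

```latex
\begin{align*}
x \perp y
&\iff f_{x,y}(x, j) = f_x(x)\, p_j \quad \text{for all } j \in \{0, \dots, k-1\} \\
&\iff f_{x|y=j}(x) = f_x(x) \quad \text{for all } j \\
&\iff f_{x|y=0} = f_{x|y=1} = \dots = f_{x|y=k-1}.
\end{align*}
```

The second line divides by $p_j$, using $f_{x,y}(x, j) = f_{x|y=j}(x)\, p_j$. The last line uses the mixture identity $f_x = \sum_j p_j f_{x|y=j}$: if all the conditionals are equal, they must equal $f_x$.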

Generally, for any two-sample or k-sample test, say we want to test whether two data matrices X and Y have the same distribution, we can let X'={X,Y} (concatenate the old X and Y together), and let Y'={0,...,0,1,...,1} (so samples from X are of class 0 and samples from Y are of class 1).

Then the independence test between X' and Y' is the same as the two-sample test between X and Y.
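The construction above can be sketched as follows. This is a minimal illustration, not the MGC code: it assumes 1-D samples, uses the biased sample distance covariance as a stand-in dependence statistic, and gets a p-value by permuting the labels. The function names (`centered_distances`, `dcov_sq`, `two_sample_via_independence`) are hypothetical.

```python
import random

def centered_distances(values):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row = [sum(r) / n for r in d]          # row means (= column means, d is symmetric)
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]

def dcov_sq(xs, ys):
    """Biased sample squared distance covariance (stand-in for MGC)."""
    a, b = centered_distances(xs), centered_distances(ys)
    n = len(xs)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2

def two_sample_via_independence(x, y, reps=200, seed=0):
    """Pool the two samples, label them 0/1, and permutation-test dependence."""
    pooled = list(x) + list(y)              # X' = {X, Y}
    labels = [0] * len(x) + [1] * len(y)    # Y' = {0,...,0,1,...,1}
    observed = dcov_sq(pooled, labels)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        perm = labels[:]
        rng.shuffle(perm)                   # shuffling labels simulates the null
        if dcov_sq(pooled, perm) >= observed:
            exceed += 1
    return pooled, labels, (exceed + 1) / (reps + 1)

rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(20)]
y = [rng.gauss(3.0, 1.0) for _ in range(20)]   # clearly shifted second sample
pooled, labels, p_value = two_sample_via_independence(x, y)
```

For a k-sample version the same idea applies with labels 0, ..., k-1; any dependence statistic that accepts the pooled data and the label vector could replace `dcov_sq`.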

So, Gabor did it the other way around (from two sample to independence), while Heller did it this way.

I am not sure we want to make it a big deal in this paper, as more careful investigations are needed for MGC performance for that, and we are concentrating on independence test here, not much on categorical data and two-sample test; plus HHG and dcorr can do two-sample testing directly too.

But yes, our method can do two-sample / K-sample testing directly, and I think it suffices to rephrase the discussion paragraph on extension to two-sample test.

cshen6 commented 8 years ago

once you are done with this issue, and feel it can be posted to arxiv, I can take the overleaf version and post it.

jovo commented 8 years ago

so, just to be clear, when Y is categorical, for what we have done already, we have not actually implemented a two-sample test, just something very close to it?

is it not a two-sample test conditioned on the prior?

cshen6 commented 8 years ago

I think it directly works for the two sample test,

because the independence test coincides with the K-sample test, for Y taking K discrete values.

cshen6 commented 8 years ago

but to say MGC is really useful for 2-sample test, we need to compare with HHG / energy distance / KS test, etc.

jovo commented 8 years ago

i see. do you think we should re-run the "independent bernoulli" without noise? and the same for the disease one?

and then just make it explicit (without making it a big deal)?

jovo commented 8 years ago

don't we already compare with HHG and dcorr?

cshen6 commented 8 years ago

I can quickly run the real data one and show you.

But for both the simulation and the real data, if we no longer add noise, the p-value / power map is going to have a very limited y axis (2 nearest neighbors at most), which limits the structure discovery part of MGC.
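One way to read the "2 nearest neighbors at most" remark: without noise, the pairwise distances between binary labels take only two distinct values, so a rank-based neighborhood scale on the label side has at most two levels. The sketch below only illustrates that counting argument; the helper name and the noise level 0.1 are assumptions, not taken from the MGC code.

```python
import random

def distinct_pairwise_distances(values, ndigits=12):
    """Sorted list of the distinct pairwise absolute differences."""
    return sorted({round(abs(a - b), ndigits) for a in values for b in values})

rng = random.Random(0)
labels = [0] * 25 + [1] * 25                        # noiseless binary labels
noisy = [v + rng.gauss(0.0, 0.1) for v in labels]   # same labels plus small noise

levels_clean = distinct_pairwise_distances(labels)  # only 0 and 1 can occur
levels_noisy = distinct_pairwise_distances(noisy)   # many distinct distances
```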

I am also afraid that saying too much about the two-sample test outside of the discussion section may convey a different message from our main theme. It does not add much value after all, and could be confusing. (For example, in Heller's JMLR paper they essentially repeat almost the same thing for the K-sample test and the independence test, with very small differences.)

cshen6 commented 8 years ago

yes, but for the 2-sample problem, dcov and hhg are not the gold-standard benchmarks.

Methods such as the energy statistic (which is different from dcov), the KS test (the most traditional benchmark), and maybe Heller's new JMLR binning scheme would make a complete set of benchmarks.
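For reference, two of the benchmark statistics named above can be sketched in a few lines for 1-D samples. These are the standard textbook definitions (the two-sample Kolmogorov-Smirnov statistic and the V-statistic form of the energy distance), written here only as an illustration; they return the statistic, not a p-value, and the function names are hypothetical.

```python
from bisect import bisect_right

def ks_statistic(x, y):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    return max(
        abs(bisect_right(xs, t) / len(xs) - bisect_right(ys, t) / len(ys))
        for t in xs + ys
    )

def energy_distance(x, y):
    """Energy distance estimate: 2 E|X-Y| - E|X-X'| - E|Y-Y'| (V-statistic)."""
    def mean_abs(a, b):
        return sum(abs(u - v) for u in a for v in b) / (len(a) * len(b))
    return 2 * mean_abs(x, y) - mean_abs(x, x) - mean_abs(y, y)

x = [0, 1, 2, 3]
y = [10, 11, 12, 13]
ks = ks_statistic(x, y)
energy = energy_distance(x, y)
```

Either statistic could be turned into a test by permuting the pooled sample, which would put it on the same footing as a permutation-based independence test on the pooled data and labels.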

jovo commented 8 years ago

ah, i see. ok, i think you are correct that it muddies the story. i'll mention in the discussion. and i'll comment so you can fix whatever i say that is wrong :)

jovo commented 8 years ago

but let's run the stuff anyway to see what happens? i'm sure we can beat them, and provide insight into the nature of the "equality", whatever that means :)

On Thu, Sep 15, 2016 at 8:54 PM, joshua vogelstein jovo@jhu.edu wrote:

ah, i see. ok, i think you are correct that it muddies the story. i'll mention in the discussion. and i'll comment so you can fix whatever i say that is wrong :)

On Thu, Sep 15, 2016 at 8:52 PM, cshen6 notifications@github.com wrote:

yes, but for the 2-sample problem, dcov and hhg are not the gold-standard benchmarks?

Others, such as the energy statistic (which is different from dcov), the KS test (the most traditional benchmark), and maybe Heller's new JMLR binning scheme, would make a complete set of benchmarks.

On Thu, Sep 15, 2016 at 8:48 PM, Cencheng Shen cshen6@jhu.edu wrote:

I can quickly run the real data one and show you.

But for both the simulation and the real data, if we no longer add noise, the p-value / power map is going to have a very limited y axis (2 nearest neighbors at most), which limits the structure discovery part of MGC.

I am also afraid that speaking too much about the two-sample test outside of the discussion may convey a different message from our main theme. It does not add much value after all, and could be confusing. (In Heller's JMLR paper, for example, they essentially repeat almost the same thing for the K-sample test and the independence test, with very small differences.)

On Thu, Sep 15, 2016 at 8:35 PM, joshua vogelstein < notifications@github.com> wrote:

i see. do you think we should re-run the "independent bernoulli" without noise? and the same for the disease one?

and then just make it explicit (without making it a big deal)?

On Thu, Sep 15, 2016 at 8:32 PM, cshen6 notifications@github.com wrote:

I think it directly works for the two sample test,

because the independence test coincides with the K-sample test, for Y taking K discrete values.

On Thu, Sep 15, 2016 at 8:25 PM, joshua vogelstein <notifications@github.com> wrote:

so, just to be clear, when Y is categorical, for what we have done already, we have not actually implemented a two-sample test, just something very close to it?

is it not a two-sample test conditioned on the prior?

On Thu, Sep 15, 2016 at 3:21 PM, cshen6 notifications@github.com wrote:

you are right, Joshua, MGC ESSENTIALLY IMPLEMENTS THE TWO/K-SAMPLE TEST!

Essentially, assuming y=0 or 1, the hypothesis f_{x|y=0} = f_{x|y=1} is equivalent to the independence test.

If y can take k discrete values, the independence test is equivalent to the hypothesis that f_{x|y=0} = f_{x|y=1} = ... = f_{x|y=k-1}.

Generally, for any two-sample or k-sample test, say we want to test whether two data matrices X and Y have the same distribution, we can let X' = [X, Y] (concatenate old X and Y together), and let Y' = [0, 1] (so samples from X are of class 0, samples from Y are of class 1).

Then the independence test between X' and Y', is the same as the two sample test between X and Y.
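
The construction described above can be sketched in code. This is only an illustration under stated assumptions, not the MGC implementation: it uses plain (global) distance covariance with a permutation test as a stand-in for the independence test, handles 1-D data only, and all function names are hypothetical.

```python
import random

def pairwise_dist(values):
    """Absolute-difference distance matrix for 1-D data."""
    return [[abs(a - b) for b in values] for a in values]

def double_center(d):
    """Subtract row and column means and add back the grand mean."""
    n = len(d)
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    # d is symmetric, so column means equal row means.
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]

def dcov_stat(a, b, perm=None):
    """Squared sample distance covariance of centered matrices a and b.

    If perm is given, the samples behind b are relabelled by perm.
    """
    n = len(a)
    idx = perm if perm is not None else range(n)
    total = sum(a[i][j] * b[idx[i]][idx[j]]
                for i in range(n) for j in range(n))
    return total / n ** 2

def two_sample_via_independence(x, y, n_perm=200, seed=0):
    """Two-sample test of x vs y, run as an independence test."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)              # X' = [X, Y]
    labels = [0] * len(x) + [1] * len(y)    # Y' = class labels
    a = double_center(pairwise_dist(pooled))
    b = double_center(pairwise_dist(labels))
    observed = dcov_stat(a, b)
    idx = list(range(len(pooled)))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(idx)                    # permute the class labels
        if dcov_stat(a, b, idx) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Usage: two groups that differ by a large mean shift.
demo_rng = random.Random(1)
x = [demo_rng.gauss(0, 1) for _ in range(30)]
y = [demo_rng.gauss(3, 1) for _ in range(30)]
stat, p_value = two_sample_via_independence(x, y)
```

Permuting the labels breaks any association between the pooled samples and their group membership, so the permutation null corresponds to the two groups sharing one distribution. Swapping in a local (multiscale) statistic for `dcov_stat` would be the analogous step for MGC, which is not attempted here.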

cshen6 commented 8 years ago

ok, sounds good to me!

I quickly ran the brain vs disease data, yes, there are many local p-values that are significant, while HHG and other global corr do not have significant p-values, just like the noisy version.

The only problem is that, due to the very limited number of neighbors and local correlations, the tailored sample MGC is not working well enough, i.e., the sample MGC has the same p-value as global mcorr, which is not significant. (Breaking ties offers many more local correlations for sample MGC to be more accurate, but then it is no longer the 2-sample test...)
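
The tie issue mentioned here can be illustrated with a small sketch. It assumes neighbor ranks are derived from pairwise distances on the labels, and counts how many distinct distance levels each sample sees; the helper name and the noise scale of 0.01 are hypothetical choices for illustration only.

```python
import random

def distinct_rank_levels(values):
    """Largest number of distinct distances any one sample has to all samples."""
    return max(len({abs(a - b) for b in values}) for a in values)

rng = random.Random(0)
labels = [0] * 10 + [1] * 10                      # noise-free binary labels
noisy = [v + rng.gauss(0, 0.01) for v in labels]  # ties broken by small noise

levels_clean = distinct_rank_levels(labels)  # only distances 0 and 1 occur
levels_noisy = distinct_rank_levels(noisy)   # many more distinct distances
```

With noise-free binary labels every sample sees just two distance levels, so only two neighborhood scales exist on that axis; adding noise creates more scales, at the cost of no longer being a pure two-sample setup.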

So yes, you are right, (oracle) MGC works well & improves on the K-sample test as well! But we still need to investigate sample MGC......and put more benchmarks there.

I will operate on a separate branch of github MGC from now on, and start investigating it.

But let us not delay our draft here? Once you have commented, I will look at the comments.

jovo commented 8 years ago

sounds great!

somehow, i think adding noise is a really great idea, even for 2-sample testing. we will ponder.

jovo commented 8 years ago

i added a sentence to discussion.

jovo commented 8 years ago

tell me why it is called "binomial" instead of "bernoulli"?

On Fri, Sep 16, 2016 at 12:11 AM, joshua vogelstein jovo@jhu.edu wrote:

i added a sentence to discussion.

On Thu, Sep 15, 2016 at 9:49 PM, joshua vogelstein jovo@jhu.edu wrote:

sounds great!

somehow, i think adding noise is a really great idea, even for 2-sample testing, we will ponder.

On Thu, Sep 15, 2016 at 9:29 PM, cshen6 notifications@github.com wrote:

ok, sounds good to me!

I quickly ran the brain vs disease data, yes, there are many local p-values that are significant, while HHG and other global corr do not have significant p-values, just like the noisy version.

The only problem is that due to very limited neighbor and very limited local corr, the tailored sample MGC is not working well enough, i.e., the sample MGC has same p-value as global mcorr, which are not significant. (breaking ties offers much more local corrs for sample MGC to be more accurate, but then it is no longer the 2-sample test...)

So yes, you are right, (oracle) MGC works well & improves on the K-sample test as well! But we still need to investigate sample MGC......and put more benchmarks there.

I will operate on a separate branch of github MGC from now on, and start investigating it.

But let us not delay our draft here? Once you commented I will look at them

On Thu, Sep 15, 2016 at 8:57 PM, joshua vogelstein < notifications@github.com

wrote:

but let's run the stuff anyway to see what happens? i'm sure we can beat them, and provide insight into the nature of the "equality", whatever that means :)

On Thu, Sep 15, 2016 at 8:54 PM, joshua vogelstein jovo@jhu.edu wrote:

ah, i see. ok, i think you are correct that it muddies the story. i'll mention in the discussion. and i'll comment so you can fix whatever i say that is wrong :)

On Thu, Sep 15, 2016 at 8:52 PM, cshen6 notifications@github.com wrote:

yes, but for 2-sample problem, dcov and hhg are not the golden benchmarks?

Such as energy stat (which is different from dcov), ks test (the most traditional benchmark), and maybe Heller's new JMLR binning scheme, those will make a complete set of benchmarks.

On Thu, Sep 15, 2016 at 8:48 PM, Cencheng Shen cshen6@jhu.edu wrote:

I can quickly run the real data one and show you.

But for both the simulation and real data, if we no longer add noise, the p-value / power map is going to have very limited y axis (2 nearest neighbor at most), which deviates /limits the structure discovery part of MGC.

I am also afraid that speaking too much on two-sample test out of the discussion, may convey a different message from our main theme. It does not add much value after all, and could be potentially confusing. (like in Heller's JMLR paper, essentially they repeat almost the same thing for K-sample test and independence test, with very small difference)

On Thu, Sep 15, 2016 at 8:35 PM, joshua vogelstein < notifications@github.com> wrote:

i see. do you think we should re-run the "independent bernoulli" without noise? and the same for the disease one?

and then just make it explicit (without making it a big deal)?

On Thu, Sep 15, 2016 at 8:32 PM, cshen6 < notifications@github.com> wrote:

I think it directly works for the two sample test,

because the independence test coincides with the K-sample test, for Y taking K discrete values.
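A short sketch of why the two coincide, assuming each of the $K$ values of $y$ has positive probability. The symbols $\pi_j = P(y = j)$ and $f_x$ (the marginal density of $x$) are notation introduced here, not taken from the paper draft.

$$
f_{x,y}(x, j) = \pi_j \, f_{x|y=j}(x), \qquad f_x(x) = \sum_{j=0}^{K-1} \pi_j \, f_{x|y=j}(x).
$$

Independence means $f_{x,y}(x, j) = f_x(x)\,\pi_j$ for every $j$, which after dividing by $\pi_j > 0$ gives $f_{x|y=j} = f_x$ for all $j$, so all $K$ conditionals are equal. Conversely, if $f_{x|y=0} = \dots = f_{x|y=K-1} = f$, then $f_x = \sum_j \pi_j f = f$ and the joint density factorizes as $f_{x,y}(x, j) = f_x(x)\,\pi_j$.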

On Thu, Sep 15, 2016 at 8:25 PM, joshua vogelstein <notifications@github.com> wrote:

so, just to be clear, when Y is categorical, for what we have done already, we have not actually implemented a two-sample test, just something very close to it?

is it not a two-sample test conditioned on the prior?

On Thu, Sep 15, 2016 at 3:21 PM, cshen6 < notifications@github.com> wrote:

you are right, Joshua, MGC ESSENTIALLY IMPLEMENTS THE TWO/K-SAMPLE TEST!

Essentially, assuming $y = 0$ or $1$, the hypothesis $f_{x|y=0} = f_{x|y=1}$ is equivalent to the null hypothesis of the independence test.

If $y$ can take $k$ discrete values, the independence test is equivalent to the hypothesis that $f_{x|y=0} = f_{x|y=1} = \dots = f_{x|y=k-1}$.

Generally, for any two-sample or k-sample test, say we want to test whether two data matrices X and Y have the same distribution, we can let X' = [X, Y] (concatenate old X and Y together), and let Y' = [0, 1] (so samples from X are of class 0, samples from Y are of class 1).

Then the independence test between X' and Y' is the same as the two-sample test between X and Y.
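The construction above can be sketched in a few lines of Python. This is a minimal illustration, not the MGC implementation: it swaps in a plain squared sample distance covariance as the dependence statistic, handles scalar observations only, and uses a label-permutation test for the p-value. All function names (`centered_distance_matrix`, `dcov_squared`, `two_sample_pvalue`) and the defaults `n_perm=200`, `seed=0` are hypothetical choices made for this example.

```python
import random


def centered_distance_matrix(values):
    """Double-centered matrix of pairwise |a - b| distances."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_mean = [sum(row) / n for row in dist]
    grand_mean = sum(row_mean) / n
    return [
        [dist[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def dcov_squared(centered_a, centered_b):
    """Squared sample distance covariance from two centered matrices."""
    n = len(centered_a)
    total = sum(
        centered_a[i][j] * centered_b[i][j] for i in range(n) for j in range(n)
    )
    return total / n**2


def two_sample_pvalue(sample_x, sample_y, n_perm=200, seed=0):
    """Permutation p-value for 'sample_x and sample_y share a distribution'."""
    rng = random.Random(seed)
    pooled = list(sample_x) + list(sample_y)            # X' = [X, Y]
    labels = [0] * len(sample_x) + [1] * len(sample_y)  # Y' = class labels
    centered_pooled = centered_distance_matrix(pooled)
    observed = dcov_squared(centered_pooled, centered_distance_matrix(labels))
    exceed = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # shuffling labels mimics independence of X' and Y'
        permuted = dcov_squared(centered_pooled, centered_distance_matrix(shuffled))
        if permuted >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


rng = random.Random(1)
same = two_sample_pvalue([rng.gauss(0, 1) for _ in range(20)],
                         [rng.gauss(0, 1) for _ in range(20)])
shifted = two_sample_pvalue([rng.gauss(0, 1) for _ in range(20)],
                            [rng.gauss(3, 1) for _ in range(20)])
print(same, shifted)
```

Any other dependence statistic could be dropped in place of `dcov_squared`; only the pooling and labeling steps are specific to the two-sample reduction, and extending `labels` to values 0..k-1 gives the k-sample version.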

So, Gabor did it the other way around (from two sample to independence), while Heller did it this way.

I am not sure we want to make it a big deal in this paper, as more careful investigations are needed for MGC performance for that, and we are concentrating on independence test here, not much on categorical data and two-sample test; plus HHG and dcorr can do two-sample testing directly too.

But yes, our method can do two-sample / K-sample testing directly, and I think it suffices to rephrase the discussion paragraph on extension to two-sample test.


cshen6 commented 8 years ago

Bernoulli is a special case of binomial; I guess that is why they call it that on Wikipedia, since the uncorrelated "Bernoulli" can be extended to uncorrelated binomial.

If you prefer Bernoulli, I can quickly change the tex and figure title?

On Fri, Sep 16, 2016 at 12:11 AM, joshua vogelstein < notifications@github.com> wrote:

i added a sentence to discussion.

On Thu, Sep 15, 2016 at 9:49 PM, joshua vogelstein jovo@jhu.edu wrote:

sounds great!

somehow, i think adding noise is a really great idea, even for 2-sample testing, we will ponder.

On Thu, Sep 15, 2016 at 9:29 PM, cshen6 notifications@github.com wrote:

ok, sounds good to me!

I quickly ran the brain vs disease data, yes, there are many local p-values that are significant, while HHG and other global corr do not have significant p-values, just like the noisy version.

The only problem is that, due to the very limited number of neighbors and local correlations, the tailored sample MGC is not working well enough, i.e., the sample MGC has the same p-value as global mcorr, which is not significant. (Breaking ties offers many more local corrs for sample MGC to be more accurate, but then it is no longer the 2-sample test...)


jovo commented 8 years ago

Yes please.


cshen6 commented 8 years ago

ok changed all binomials to Bernoulli, in tex and figure


cshen6 commented 8 years ago

I also looked at your additions in discussion, and adjusted accordingly.

In particular, I deleted "whenever X or Y is categorical" for the two-sample test, because I feel it could be misleading: yes, the two-sample test is equivalent to the independence test by taking Y categorical and X=[X1 X2], but the two-sample test itself is about testing distribution equality between X1 and X2, which do not need to be categorical.

So I think deleting this one is more appropriate.

Otherwise it is cool with me.

On Fri, Sep 16, 2016 at 7:14 AM, Cencheng Shen cshen6@jhu.edu wrote:

ok changed all binomials to Bernoulli, in tex and figure

On Fri, Sep 16, 2016 at 6:42 AM, joshua vogelstein <notifications@github.com> wrote:

Yes please.

On Friday, September 16, 2016, cshen6 notifications@github.com wrote:

Bernoulli is a special case of binomial, which I guess is why they call it that on Wikipedia, since the uncorrelated "Bernoulli" can be extended to uncorrelated binomial.

If you prefer Bernoulli, I can quickly change the tex and figure title?

On Fri, Sep 16, 2016 at 12:11 AM, joshua vogelstein <notifications@github.com> wrote:

i added a sentence to discussion.

On Thu, Sep 15, 2016 at 9:49 PM, joshua vogelstein <jovo@jhu.edu> wrote:

sounds great!

somehow, i think adding noise is a really great idea, even for 2-sample testing, we will ponder.

On Thu, Sep 15, 2016 at 9:29 PM, cshen6 <notifications@github.com> wrote:

ok, sounds good to me!

I quickly ran the brain vs disease data, yes, there are many local p-values that are significant, while HHG and other global corr do not have significant p-values, just like the noisy version.

The only problem is that, because there are very few neighbors and very few local correlations, the tailored sample MGC is not working well enough, i.e., the sample MGC has the same p-value as global mcorr, which is not significant. (Breaking ties offers many more local correlations, which makes sample MGC more accurate, but then it is no longer the 2-sample test...)

So yes, you are right, (oracle) MGC works well & improves on the K-sample test as well! But we still need to investigate sample MGC......and put more benchmarks there.

I will operate on a separate branch of github MGC from now on, and start investigating it.

But let us not delay our draft here? Once you have commented, I will look at the comments.

On Thu, Sep 15, 2016 at 8:57 PM, joshua vogelstein <notifications@github.com> wrote:

but let's run the stuff anyway to see what happens? i'm sure we can beat them, and provide insight into the nature of the "equality", whatever that means :)

On Thu, Sep 15, 2016 at 8:54 PM, joshua vogelstein <jovo@jhu.edu> wrote:

ah, i see. ok, i think you are correct that it muddies the story. i'll mention in the discussion. and i'll comment so you can fix whatever i say that is wrong :)

On Thu, Sep 15, 2016 at 8:52 PM, cshen6 <notifications@github.com> wrote:

yes, but for 2-sample problem, dcov and hhg are not the golden benchmarks?

Benchmarks such as the energy statistic (which is different from dcov), the KS test (the most traditional benchmark), and maybe Heller's new JMLR binning scheme would make a complete set.

On Thu, Sep 15, 2016 at 8:48 PM, Cencheng Shen <cshen6@jhu.edu> wrote:

I can quickly run the real data one and show you.

But for both the simulation and real data, if we no longer add noise, the p-value / power map is going to have a very limited y axis (2 nearest neighbors at most), which limits the structure discovery part of MGC.
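A quick way to see the "2 nearest neighbors at most" point is to count how many distinct distance levels each sample has. This is only a sketch: the helper name, the sample size, and the Gaussian noise level of 0.1 are assumptions for illustration, not the settings used in the paper.

```python
import random


def distinct_distance_levels(values):
    """For each point, count the distinct distances from it to all points."""
    return [len({abs(v - w) for w in values}) for v in values]


rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(20)]   # noiseless Bernoulli y
noisy = [y + rng.gauss(0, 0.1) for y in labels]   # assumed noise level, illustration only

# Without noise every pairwise distance in y is 0 or 1, so there are at most 2 levels.
print(max(distinct_distance_levels(labels)))
# With noise the ties are broken, so each point sees many more neighbor levels.
print(max(distinct_distance_levels(noisy)))
```

With only two levels on the y side, there are very few local scales to choose from, which is why the map collapses along that axis.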

I am also afraid that saying too much about the two-sample test outside of the discussion may convey a different message from our main theme. It does not add much value after all, and could be potentially confusing. (like in Heller's JMLR paper, essentially they repeat almost the same thing for K-sample test and independence test, with very small difference)

On Thu, Sep 15, 2016 at 8:35 PM, joshua vogelstein <notifications@github.com> wrote:

i see. do you think we should re-run the "independent bernoulli" without noise? and the same for the disease one?

and then just make it explicit (without making it a big deal)?

On Thu, Sep 15, 2016 at 8:32 PM, cshen6 <notifications@github.com> wrote:

I think it directly works for the two sample test,

because the independence test coincides with the K-sample test, for Y taking K discrete values.

On Thu, Sep 15, 2016 at 8:25 PM, joshua vogelstein <notifications@github.com> wrote:

so, just to be clear, when Y is categorical, for what we have done already, we have not actually implemented a two-sample test, just something very close to it?

is it not a two-sample test conditioned on the prior?

On Thu, Sep 15, 2016 at 3:21 PM, cshen6 <notifications@github.com> wrote:

you are right, Joshua, MGC ESSENTIALLY IMPLEMENTS THE TWO/K-SAMPLE TEST!

Essentially, assuming y=0 or 1, the hypothesis f_{x|y=0} = f_{x|y=1} is equivalent to the null hypothesis of the independence test.

If y can take k discrete values, the null hypothesis of the independence test is equivalent to the hypothesis that f_{x|y=0} = f_{x|y=1} = ... = f_{x|y=k-1}.
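The equivalence can be written out in one short chain. This is a sketch under the assumptions that y takes values in {0, ..., k-1}, each with positive probability \pi_j (a symbol introduced here, not used in the thread), and that the conditional densities exist:

```latex
\begin{align*}
x \perp y
  &\iff f_{x,y}(x, j) = f_x(x)\,\pi_j
        \quad \text{for all } x \text{ and all } j \in \{0,\dots,k-1\} \\
  &\iff f_{x|y=j}(x) = f_x(x)
        \quad \text{for all } j
        \qquad \text{(divide by } \pi_j > 0\text{)} \\
  &\iff f_{x|y=0} = f_{x|y=1} = \dots = f_{x|y=k-1}.
\end{align*}
```

The last step uses f_x = \sum_j \pi_j f_{x|y=j}: if all conditionals equal a common density, that density must be f_x itself.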

Generally, for any two-sample or k-sample test, say we want to test whether two data matrices X and Y have the same distribution, we can let X' = [X, Y] (concatenate old X and Y together), and let Y' = [0, 1] (so samples from X are of class 0, samples from Y are of class 1).

Then the independence test between X' and Y', is the same as the two sample test between X and Y.
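The reduction can be sketched in a few lines: pool the two samples, attach 0/1 labels, and run a permutation independence test between the pooled data and the labels. This is a minimal illustration and not the MGC implementation: it uses a plain sample distance covariance on 1-D data as a stand-in statistic, and the function names, permutation count, and seed are assumptions made for the example.

```python
import random


def pairwise_abs(values):
    """Pairwise absolute-difference (distance) matrix for 1-D data."""
    return [[abs(a - b) for b in values] for a in values]


def double_center(d):
    """Subtract row and column means, then add back the grand mean."""
    n = len(d)
    row = [sum(r) / n for r in d]
    col = [sum(d[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[d[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def dcov_stat(x, y):
    """Squared sample distance covariance (V-statistic form)."""
    a = double_center(pairwise_abs(x))
    b = double_center(pairwise_abs(y))
    n = len(x)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def two_sample_via_independence(sample_x, sample_y, n_perm=200, seed=0):
    """Permutation p-value for H0: sample_x and sample_y share a distribution."""
    rng = random.Random(seed)
    pooled = list(sample_x) + list(sample_y)             # X' = [X, Y]
    labels = [0] * len(sample_x) + [1] * len(sample_y)   # Y' = [0, 1]
    observed = dcov_stat(pooled, labels)
    exceed = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # shuffling labels simulates independence of X' and Y'
        if dcov_stat(pooled, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Calling `two_sample_via_independence(xs, ys)` on two lists of numbers returns a p-value; swapping `dcov_stat` for another dependence statistic leaves the rest of the reduction unchanged.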

So, Gabor did it the other way around (from two sample to independence), while Heller did it this way.

I am not sure we want to make it a big deal in this paper, as more careful investigations are needed for MGC performance for that, and we are concentrating on independence test here, not much on categorical data and two-sample test; plus HHG and dcorr can do two-sample testing directly too.

But yes, our method can do two-sample / K-sample testing directly, and I think it suffices to rephrase the discussion paragraph on extension to two-sample test.

jovo commented 8 years ago

ok, i'm satisfied with the existing edits. so i think it is ready to be arxived. i am speaking again with brett on sunday. do you want to start the arxiv process, and then we can update it on sunday after i make some more text edits?

jovo commented 8 years ago

note: you can directly post to arXiv from overleaf, i think it is merely a 1 click thing. i'm gonna try it. let me know if you want me to untry :)

cshen6 commented 8 years ago

ok, great!

On Fri, Sep 16, 2016 at 11:19 AM, joshua vogelstein <notifications@github.com> wrote:

Closed #120 https://github.com/neurodata/MGC/issues/120.

cshen6 commented 8 years ago

I went on and removed all unnecessary comments /packages/ environment on the overleaf version.

In case you want to revert, I also labeled the old version with comments for easy roll back.

I noticed there are a few \jv ones commented out there, I kept them in case you are still thinking about them.

cshen6 commented 8 years ago

and I will leave it for you to try posting, or I can try it if it doesn't work out :-)

jovo commented 8 years ago

you post!
