I'm going to check off stuff as I get to it.
For 1st point see 145ed50.
I'm going to think about the second point. I don't feel the absolute errors will be especially meaningful since those will change with $N$, $M$, and $B$. I think we really need to point out that, since we're scaling, a flat line is really good for $\hat{P}$. If you have more specific ideas let me know.
I think point 3 will be solved by point 4.
For point 5: we say that we simulated the results and that they perform comparably at N=500 and M=100. We decided earlier not to include the numerics since they basically overlapped with the theory.
these changes are great. here are a few more minor ones:
note that i'm doing it this way so that you can see the thought process. it would be easier for me to actually make the change, but i believe this is more helpful. open to other suggestions.
Often, the sample or cohort size is relatively small, whereas the number of potential edges is much larger.
- [x] "nowawads" is too informal.
- [x] naive --> na\"ive
- [x] "some bias with greatly" --> "some bias BUT greatly"
- [x] add citation for BV trade-off, eg, trunk
- [x] after stein, mention the explicit result, it is amazing and will get them thinking
- [x] "doesn't close the door" is too informal
- [x] "weigted adjacency matrix with weights given by the proportion of times the 26 corresponding edge appears in the population." i wouldn't say that. we define the mean graph as the Expectation of A with respect to the distribution of A. right? it holds for any graph distribution, it is just the first moment of the distribution.
- [x] "Intuitively, an estimator incorporating the mean-graph 39 structure is preferable to the entry-wise MLE." sentence is weird to me. what is "mean graph structure"? i tend to think of "mean" as an estimator, and "expectation" as the population statistic/property of the distribution. but basically, i'd say an estimator incorporating properties of the distribution is preferable assuming it is computationally tractable.
- [x] "Using the estimates of the latent positions based on a truncated eigen-decomposition 52 of the adjacency matrix, in the RDPG setting we consider an estimator for the mean of 53 the collection of graphs which captures the low-rank structure of the RDPG model. 54" run-on sentence?
- [x] "real data analysis that it frequently outperforms the element-wise MLE" try to avoid the word "it" whenever possible, and use the name of our estimator (did we name it, we need to name it)?
- [x] "small sample size" we use that term often. however, without knowing the number of vertices, it is always a relative term. i think we should clarify (small sample size for a given graph size), or something like that.
- [x] "Each vertex represent " missing 's'
- [x] "Each vertex represent a well defined anatomical region present in each subject, and an 68 edge between two regions is defined to exist if correlation in activity between the regions 69 surpasses a certain threshold. Similarly, for structural brain imaging an edge may 70 represent the presence of anatomical connections between the two regions." is weird. let's be super clear and accurate. we don't consider any correlation data, so that comes out of nowhere. if we want to say that, we should put it in context. same with the structural data, mention fMRI or diffusion MRI. i can fix it up after you take a crack at it.
- [x] "We consider three nested models" SBM & RDPG are not nested. they overlap. positive definite SBM is a special case of RDPG. let's be correct, we are setting an example for them.
- [x] "mean graph is the For this case, " somehting screwed up
- [x] "For this case, we aim to estimate the mean matrix P = E[A(m)] 80 base on the observed adjacency matrices A(1),...,A(M)." why is there an "^(m)" in the Expectation? that does not seem right. i don't really understand what this sentence is trying to do though?
- [x] "njoys the 86 many asymptotic properties of the MLE as M → ∞. " for fixed "n", right? should we say that?
- [x] S2.2 says that we don't exploit graph structure. true, but we haven't introduced the possibility of graph structure. i would re-order: IEM, SBM, RDPG, and then maybe LPG. in methods, always go from conceptually most simple to more complicated. having taught this to neuroscientists many times, i can assure you the order is IEM, SBM, RDPG, and then LPG (many don't know dot products, and certainly haven't been introduced to kernels, etc.). then, i would introduce our estimators. note that when we move SBM, the text will change to elaborate. eg, discuss SBM in the context of a mixture of ER graphs, provide some intuition as to why this is the simplest possible generalization of ER graphs, and the SBM as a RDPG goes after the RDPG section.
- [x] "Additionally, there are no useful 91 asymptotic properties for A ̄ as the number of vertices N becomes large." i don't think this is true. i bet as long as N/M --> 0, we still have a bunch of useful asyptotics.
- [x] in general, i provide intuition to this community before equations, eg, line 112.
- [x] alg 1 (see the sketch after this list):
  - what is "kmax"?
  - line 4 is confusing. we don't use Abar+D, we select the dimension of it using something else. specify how?
  - similar issue for line 5: specify how
  - output is typically right beneath input, neither has a line number
- [x] in lemma 3.1, i don't understand why we have 2 claims. it seems like the latter covers the former?
- [x] theorem 3.1 refers to lemma A.3, which has not yet been referenced.
- [ ] "Also, the ARE does not 204 depend on the number of graphs M, so the larger the graphs are, the better Pˆ is 205 relative to A ̄, regardless of M. " this was a surpising result, right? not in retrospect perhaps, but we didn't realize that would be true?! we should highlight that! in general, we should be more clear about our contributions, we defined a new estimator, we proved RE results, in particular, that do not depend on M!
there are more minor notes coming.
Relative efficiency (for squared-error loss) is the ratio of MSEs. MSEs themselves are of course expectations. https://en.wikipedia.org/wiki/Efficiency_(statistics)
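Concretely, one natural way to write it with squared-error loss applied entrywise (standard definitions; which direction the ratio goes is a convention the paper should state explicitly):

$$\mathrm{MSE}(T) \;=\; \mathbb{E}\,\bigl\lVert T - P \bigr\rVert_F^2, \qquad \mathrm{RE}\bigl(\hat{P}, \bar{A}\bigr) \;=\; \frac{\mathrm{MSE}(\bar{A})}{\mathrm{MSE}(\hat{P})},$$

so under that convention $\mathrm{RE} > 1$ means $\hat{P}$ beats $\bar{A}$.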
oh right, mean square error. my bad.
I think the naive vs na\"ive doesn't matter. Both are acceptable, with naive being more common. http://www.merriam-webster.com/dictionary/naive
I didn't change the order of IEM, RDPG, SBM but I did provide more intuition for the RDPG.
I guessed at what this was referring to:
> in general, i provide intuition to this community before equations, eg, line 112.
And tried (poorly ;-)) to implement this in general
Plos wants Fig. X
@TangRunze @jovo
"Rank-based methods and robust likelihood methods could be very useful in that case. " provide a citation.
I think we want to cite Runze's in prep paper?
i would have cited Huber and Lq likelihood papers that exist.
Ahh, ok cool that's easy. I was thinking too specifically.
on the real data, we do a bunch of great analysis, but we skipped that on the theory/simulated data. here are some related specific comments:
second, even though our estimators don't have better MSE, they are better in other ways. in particular, the low-rank approximations yield better interpretability and enable us to more easily identify vertices that are interesting, etc. we probably want a sentence about each of those in the results, and then a few more in the discussion.
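purely as a hypothetical illustration of the "identify interesting vertices" point (not claiming this is what the paper does): once we have the rank-d estimate, every vertex carries a d-dimensional latent position, and flagging atypical vertices is only a few lines:

```python
import numpy as np

def flag_vertices(P_hat, d, n_flag=5):
    """Hypothetical illustration: use the rank-d structure of P_hat to rank
    vertices by how far their estimated latent position sits from the
    average position. Not the paper's procedure, just one way the low-rank
    estimate makes vertex-level summaries easy to read off."""
    evals, evecs = np.linalg.eigh(P_hat)
    order = np.argsort(np.abs(evals))[::-1][:d]
    # Estimated latent positions: rows of X_hat (N x d).
    X_hat = evecs[:, order] * np.sqrt(np.abs(evals[order]))
    scores = np.linalg.norm(X_hat - X_hat.mean(axis=0), axis=1)
    # Indices of the most atypical vertices, largest score first.
    return np.argsort(scores)[::-1][:n_flag]
```

something in that spirit, plus a sentence on interpretability, would make the added value of the low-rank structure concrete in the results.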