jhu-graphstat / LLG


Referee Reports #12

Open dpmcsuss opened 7 years ago

dpmcsuss commented 7 years ago

Reviewer 1

For someone with statistical training such as myself, reading Tang et al.'s manuscript was a pleasure. In essence, the work tells us that a low-rank approximation of the connectomes is a good one, and that using it in a James-Stein-style biased estimator gives a good small-sample estimate of the mean of connectomes. With that said, I fear that as the manuscript is currently formulated, the results will be lost on a sizable fraction of the NeuroImage readership. I give below specific comments that can help make the manuscript more relevant to the neuroimaging community.

To start with, I think that intuitions should be put forward more quickly in the manuscript. I was convinced of the validity of the model for connectomes only when I saw figure 5, which comes very late in the manuscript. It would be useful to show connectomes estimated on a few different atlases, as well as their low-rank approximations, early in the paper, probably before going into the formal models of section 2. Intuitions are important for a multi-disciplinary audience such as that of NeuroImage. It is not a trivial insight that brain graphs, including anatomical connectivity graphs, are well approximated by low-rank models; it gives the full meaning of your work. To be fair, reading the last sentence of the abstract, "low-rank methods should be a key part of the tool box for researchers studying populations of graphs", I thought that it was an overstatement and that it was true only for a small set of applications.

Intuition and figure 5 should be moved/copied to the front. Intuition for the RDPG/SBM model for brains is ... For the SBM, the intuition is that regions are grouped into blocks where vertices in the same block have similar connectivity structure. At the highest level the brain is a two-block model ... see figure 5. The RDPG is slightly more complicated but generalizes the SBM, allowing for mixed membership (in neuroscience words) and degree corrections. Explain figure 5.
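For that front-of-paper intuition, a minimal sketch of the two-block picture (the probabilities and sizes below are made up for illustration, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-block SBM: within-block probability 0.4, between-block 0.1.
B = np.array([[0.4, 0.1],
              [0.1, 0.4]])
tau = np.repeat([0, 1], 100)            # block labels for 200 vertices
P = B[np.ix_(tau, tau)]                 # edge-probability ("mean graph") matrix

# P is exactly rank 2, which is the sense in which the SBM is low rank.
print(np.linalg.matrix_rank(P))         # -> 2

# Sample one symmetric, hollow adjacency matrix from the model.
U = np.triu(rng.random((200, 200)) < P, 1).astype(float)
A = U + U.T
```

The point such a figure would make: the mean graph $P$ is exactly low rank under the SBM, while any single sampled $A$ is a noisy full-rank observation of it.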

Along the same lines, the manuscript is written in very general terms, and connectomics in neuroimaging sometimes appears as a side aspect of the work. Anchoring the vocabulary and the examples in neuroimaging would help make it more relevant for NeuroImage. For instance, the last sentence of the abstract is a statement on graphs in general, and the introduction starts with and mostly discusses graphs and statistics, rather than the brain. Section 2, on models, starts in a very formal way, and then considers connectomics as "an example application"; it is unclear to the reader why, at this point, the models are relevant to the stated goal of defining means in connectomics. When submitted to NeuroImage, the discussion should be framed in the context of the brain.

The above paragraph can only be accomplished by @jovo 👍 , though we're happy to start out with some changes.

A lot of these changes will be in the intro.

The word "inadmissible" is used in its statistical sense without being defined. It is a common-English word, and hence its meaning here should really be made explicit.

@TangRunze will define:

The title is not very related to the work presented. I would strongly argue that it should be changed to something more descriptive about the work.

I propose "Connectome Smoothing and a Law of Large Graphs"

From a big-picture standpoint, the tools presented in this manuscript are useful only if there are very few subjects (between 5 and 10). When there are more, they can be detrimental (as seen from figure 4, where for the JHU and Desikan atlases they do not improve upon the naive estimator). This caveat should really be discussed: what application problem do they solve? 10 subjects is below the typical study. Right now this aspect feels swept under the rug in claims of general usefulness.

I think this is quite fair, and perhaps we can:

  1. Show that at M=50, the "low-rank" estimates are still not that bad.
  2. Emphasize that this can be applied to subgroups, such as all females between the ages of 21 and 25, to better explore differences between groups.
  3. Use other inference tasks. @TangRunze mentioned the fish data. We could also do testing for subgroups for the 454 graphs data. @jovo, what kind of covariates do we have for the 454 graph data?

The asymptotics presented in section 4.1 are for the number of nodes going to infinity with a fixed number of subjects. These are very much low-sample asymptotics. While I understand their interest for proofs, it is not clear to me that they relate well to application settings. I believe that this aspect should be discussed, in particular given that the good performance of the low-rank model is somewhat created by this specific choice of asymptotic regime.

Say something like: we anticipate the collection of larger and larger brain networks, which will also likely initially correspond to smaller sample sizes as the technology to scale these connectome collection techniques is developed.

In section 4.2, if I understand things right, the simulations are done informing the algorithm of the actual rank of the adjacency matrix. This does not reflect the application situation well. I think that simulations that include the automatic choice of the dimensionality should be included. In general, I find that the simulation settings are overly favorable to your model: it is a tautology that the proposed estimator will work well on the simulation, as the matrix is actually low rank. I would suggest that the authors also do simulations on data that slightly break the model, adding non-low-rank aspects to the graph, to probe when the estimator breaks down.

I think we just need to better explain that 4.2 is really a toy model to illustrate the theory and to show how it works even in finite samples, in an idealized setting.

4.4 is our more realistic simulation setting.
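Beyond that, a robustness check along the lines the reviewer suggests could look like this sketch (made-up parameters, not the paper's settings): mix a full-rank perturbation into a rank-2 $P$ and watch when the low-rank estimate stops helping.

```python
import numpy as np

rng = np.random.default_rng(1)
n, M, d = 200, 5, 2

# Rank-2 SBM-style P, later perturbed by a symmetric full-rank component.
B = np.array([[0.4, 0.1], [0.1, 0.4]])
tau = np.repeat([0, 1], n // 2)
P_low = B[np.ix_(tau, tau)]
off = ~np.eye(n, dtype=bool)            # score off-diagonal entries only

def sample_mean_graph(P, M, rng):
    """Entrywise mean of M symmetric hollow adjacency matrices drawn from P."""
    A_bar = np.zeros_like(P)
    for _ in range(M):
        U = np.triu(rng.random(P.shape) < P, 1).astype(float)
        A_bar += U + U.T
    return A_bar / M

def rank_d(A, d):
    """Rank-d approximation from the top-d eigenpairs by magnitude."""
    w, V = np.linalg.eigh(A)
    idx = np.argsort(np.abs(w))[::-1][:d]
    return (V[:, idx] * w[idx]) @ V[:, idx].T

for eps in [0.0, 0.05, 0.1, 0.2]:
    E = rng.uniform(-1, 1, (n, n))
    E = (E + E.T) / 2                   # symmetric, generically full rank
    P = np.clip(P_low + eps * E, 0.01, 0.99)
    A_bar = sample_mean_graph(P, M, rng)
    P_hat = rank_d(A_bar, d)
    mse = lambda Q: np.mean((Q - P)[off] ** 2)
    print(f"eps={eps:.2f}  RE(P_hat vs A_bar) = {mse(P_hat) / mse(A_bar):.2f}")
```

The interesting output is the relative efficiency as eps grows; values above 1 would mark where $\hat{P}$ breaks down.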

Remark 4.4 is an important one: it tells us that if the graph isn't well approximated as a low-rank matrix, the estimator will not perform well. I believe that it should be discussed in the discussion part, and maybe put in perspective with the fact that low-rank approximations seem to work well with brain connectivity graphs.

Yes! The worst case is when all eigenvalues are almost the same. This leads to a certain type of structure, which looks somewhat like the identity matrix; however, we don't see that structure in connectomics. Lack of gaps ---> block-diagonal structure. (That's a little fuzzy, but I think we can argue that.)

Scree plot for the mean graph based on all 454 to show that there is a good low rank approximation.
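A sketch of how that scree plot could be produced (the file name and array layout are placeholders; assumes the 454 graphs are stacked as an (M, n, n) array):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical path: the 454 adjacency matrices stacked as shape (M, n, n).
graphs = np.load("corr_graphs.npy")

A_bar = graphs.mean(axis=0)                  # entrywise mean graph
w = np.linalg.eigvalsh(A_bar)                # symmetric, so eigvalsh
top = np.sort(np.abs(w))[::-1][:50]          # top 50 eigenvalues by magnitude

plt.plot(np.arange(1, len(top) + 1), top, marker="o")
plt.xlabel("Rank")
plt.ylabel("Eigenvalue magnitude")
plt.title("Scree plot of the mean graph")
plt.show()
```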

Some important information is given in section 6, which is the last section (maybe it is meant to be an appendix?). Reading the manuscript was awkward as it wasn't clear to me that some information was given later. For instance, around line 430, the descriptions of the methods to select the rank are named, but no reference is given to the part in section 6 that describes them. Similarly, paragraph 6.3 should be clearly referenced early in the text.

Add in the references. At the beginning of the relevant sections, mention that certain augmentations of the procedure are described in Section 6 in more detail.

The first column on figure 4 is a bit surprising: computing a mean with a single subject. It would be useful to either discuss why it is relevant, or to remove it.

Say something useful?

Dictionary learning, ICA, networks of networks (Abraham, Biobank, Calhoun)

Maybe find these papers and cite them?

Mention these techniques as alternatives ...

Line 18: I think that the word "sample" should be plural.

As a nitpick on wording on line 26, I would rather say "a bias-variance trade-off" rather than "the bias-variance trade-off". To me there are many different trade-offs, as there are many different kinds of biases. The bias that the authors are taking here is a useful one, and grounds the success of the work.

I was surprised on line 177 to read about positive semidefinite matrices while line 102 mentions that the diagonals of the matrices are zero. To me they cannot be positive semidefinite without being zero (a simple proof: the trace is invariant to the basis; considering the basis in which the matrix is diagonal, the trace is the sum of the eigenvalues. If the diagonal is zero, this sum is hence zero. Since the eigenvalues are nonnegative, they must all be zero). I suspect that this problem is related to the comment on line 240 and to section 6.2. However, I must say that I found that it didn't make reading the manuscript easy.
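The reviewer's trace argument, written out (a restatement of the proof above, nothing new):

```latex
\operatorname{tr}(P) \;=\; \sum_{i=1}^{n} P_{ii} \;=\; 0
\qquad\text{and}\qquad
\operatorname{tr}(P) \;=\; \sum_{i=1}^{n} \lambda_i ,
```

so if $P \succeq 0$, i.e. all $\lambda_i \ge 0$, a zero diagonal forces every $\lambda_i = 0$ and hence $P = 0$.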

@TangRunze will explain the eigen vs svd story to CEP.

On figure 4, the legend says literally "A bar" and "P hat". I must confess that reading this quickly, I did not immediately make the connection with the symbols.

Figure 5 should probably also show the difference between A bar and P hat, as the difference is very hard to see.

dpmcsuss commented 7 years ago

Reviewer 2

In this paper, Tang and colleagues proposed a low-rank method to estimate the mean of a collection of undirected and unweighted graphs. Under a positive semidefinite stochastic block model, or more generally a random dot product graph model, the authors derived analytic results showing that the proposed method is guaranteed to have better performance than a simple element-wise average of the observed adjacency matrices when the network is large and the sample size is small. Using simulations, they confirmed that this is also true when the data are generated from an underlying independent edge model.

Overall this paper is clearly written and easy to follow, and the proposed method looks sound by my assessment. However, I have two major concerns about this paper.

(1) I think this paper doesn't fit very well into the scope of NeuroImage. The work is completely methodologically oriented, has a strong statistical flavor, and is motivated by and grounded in general graph theory. The introduction does not refer to any neuroimaging studies. Although the paper includes a real DTI data set, it is only used to synthesize data and validate the proposed method. To publish this paper in NeuroImage, I think the authors need to build a stronger connection between the method and the current neuroimaging literature, demonstrate its potential usefulness in addressing important scientific questions in brain network studies, and provide more interesting real data applications. A statistical journal might be a more appropriate target for the current form of this paper.

(2) It seems from figures 4 and 8 that the proposed method outperforms the element-wise average only when the sample size is very small (N<10). Beyond this sample size, the two methods produce indistinguishable results or the simple average approach can be better. I think this considerably weakens the usefulness of the method in real data analysis. Currently in the neuroimaging world, most population based brain network studies have a sample size much larger than N=10. The proposed method could still be useful when the mean graph of a small subgroup of the subjects needs to be estimated but this paper did not provide an example. In the Discussion section, it was mentioned that the low rank representation can be used to improve interpretability of the data and provide biological insight. Unfortunately, this wasn't demonstrated in a real data set either.

Some minor comments:

  • perhaps use a more informative title?
  • line 223: what do $\tilde{U}$ and $\tilde{S}$ represent?
  • line 224: largest eigenvalues of $A$ —> largest eigenvalues of $\bar{A}$
  • line 300: $\rho_2 = 1-\rho_2$ —> $\rho_2 = 1-\rho_1$
  • line 304: the relative efficiency should be $N/(N-1)$?

dpmcsuss commented 7 years ago

Reviewer 3

Summary: The Stochastic Blockmodel (SBM) and the Random Dot Product Graph (RDPG) represent two prominent classes of models that frequently appear in network clustering and dimension reduction problems. Both models assume that the observed edges are generated according to some unknown random graph distribution whose first moment (the population Mean Graph) is structurally well approximated by the cluster structure found by the SBM and RDPG. Previous work on the RDPG models suggested a new class of estimators of the population Mean Graph that are based on low-rank approximation methods. This paper considers the same class of estimators but in the context of an SBM which is assumed to have a positive semidefinite probability connectivity matrix so that it can be expressed as an RDPG model. This low-rank estimator is contrasted against the classical maximum likelihood estimator (MLE) for the SBM, showing some improvements over the classical MLE, particularly for large networks.

This paper touches on an important topic and certainly offers some interesting new perspectives which can be useful to the neuroimaging community. However, I fear that the paper needs to go through substantial modifications (that are in reality beyond major revision) in order to merit publication in NeuroImage. Nevertheless, I will ask for major revision but I expect a lot of effort and massive improvement in the second round.

1 Writing skills

The presentation and writing need to be radically improved. The authors should carefully proof-read the material before submission and sanity-check that the current manuscript conforms to the journal's standards. Several things come to mind as potential remedies.

Consider your audience. Publications from this journal are read by researchers with diverse backgrounds including machine-learning, neuroscience, psychology, psychiatry, statistics, engineering and physics. Thus, in order to maximise your impact you really need to write things in a simple manner whenever possible. In particular, your introduction should read in such a way that it is understandable to researchers who have never heard of the Stochastic Blockmodel (SBM) or Random Dot Product Graph model. Hence, start from defining the problem in general terms and mention methods known to your audience. See, for example, the paper by Rubinov and Sporns [1] to give you a flavor of what this audience is aware of, and then try to put your work in this context. In a nutshell, your introduction should clearly state:

  • what is the problem (e.g., network clustering) and some examples of data and methods
  • define the models that you will consider (there are so many papers on the SBM that it is common courtesy to cite them) and comment on their use in neuroimaging data [2, 3, 4, 5, 6, 7, 8, 9]. Please also note that "community structure" is a term that these days is identified with "modular organisation", and as the SBM can identify more diverse structures than that, I would suggest that you simply refer to it as "cluster structure" and define it in a more general way.
  • what it is that you are proposing
  • how different/novel this work is relative to the work of previous authors, including [10] and [11]
  • why your work is important to this audience and what they can hope to gain. For example, if you are claiming that your estimator has an advantage in large networks with a small number of realisations, then it would be great if you could give a practical example of such data. Another example: if somebody is interested in a logistic regression model and hypothesis testing, how does your estimator change the game there?

Satisfy the readers' expectation. As a reader I was getting really irritated with some sections because the ground would be prepared for one point and then something completely different would be stated next. For example, just before your Algorithm 1 you are talking about $\bar{A}$ and $A^{(m)}$, so I am expecting something involving these objects, and then I learn that in fact you define some general A, a symmetric real matrix that has never been mentioned before, as the main input of your algorithm. Thus, if you are changing gears you need to reflect that in your text beforehand. Your figure captions need to clearly define all the components of your plot before you proceed to comment on the most striking results. For example, in Figure 2, B11 and B12 are not clearly defined (also, at best your notation was $B_{11}$), and thus if you set specific values beforehand for connection probabilities then you should tell us about it. The same point can be made for Figure 1, where you do not even give colour bars with probability values. Overall, if you have more than one plot in your figure it is expected that you label subplots with A, B, etc., and state what each plot represents (this is much easier than the bottom-left or top-left approach). Your individual plots should be of the same size and combined in such a way that the figure is of reasonable size and viewing quality. Axis titles should start with a capital letter, and the same applies to plot subtitles (e.g., Fig. 1 "rank-5 approximation" —> "Rank-5 approximation").

Typos/Grammar. You need to make sure that your text is free from typos. For example, "stochastic block model" appears in the title of section 2.3 while everywhere else you use "stochastic blockmodel". When you define the dot product I really do not see the reason for the use of the $\langle \cdot \rangle$ notation, but if you are really bent on this notation at least define it in a correct way: $\langle x, y \rangle$ should not be a sum over i. In the spirit of the readers' expectation, if you are already using $\langle x_i, x_j \rangle$ then it would be easier to define the dot product in terms of these variables than to introduce y, which is redundant. When you state $B_{\tau_i,\tau_j}=\nu_{\tau_i}^{\top}\nu_{\tau_j}^{\phantom{\top}}$ you are giving the impression that $\nu_{\tau_i}$ is a row vector? (Also, you may want to investigate the \phantom command as a quick way to align indices when you are using \top for transpose, so \phantom{\top} can help you there.) In Section 2.2, "... the entries $A_{ij}$ are distributed independently as Bern($\langle X_i, X_j \rangle$) for ..." — you really need to say Bernoulli trials rather than use the abbreviation if you want to be consistent with Section 4.2, "...$\tau_i$ are drawn iid from a multinomial distribution ...". Note that this list of examples is not exhaustive; there are plenty of other instances where things are not defined but are used, or it is assumed that the reader knows your notation very well.

Citation. Please pay attention to how you are citing. "Hoff et. al. (2002)" is incorrect. Also, when you are citing two or more papers in parentheses you should separate them by semicolons rather than commas. Finally, you need to be very careful when you are describing your own original work versus when you are using results of other people (i.e., proposing to use is very different from we propose).

Notation. Please do not abuse statistical notation! Capital Roman letters indicate random quantities and small Roman letters indicate their realisations. Saying something like $P(A_{ij} = a_{ij})$ makes sense. However, $P(A_{ij})$ means nothing. It is important early on to define basic statistical notation. Note that you can use bold face for non-scalar quantities. I cannot stress enough how important it is to distinguish between these quantities, especially when you are taking expectations. I should be comfortable when I read your text, not wracking my brain trying to see whether an object is random or non-random. Finally, it is important to establish a unifying notation that is your own. So far, my overall impression is that your notation is way too eclectic.

Objectiveness. Please do not misuse adjectives when you are comparing things. For example, you say "... in this instance visual inspection demonstrate that $\hat{P}$ performs significantly better than $\hat{A}$". This suggests that you are using a statistical hypothesis test of some sort and that you have, for example, p-values that can back this up. This is not a correct expression for what you are trying to say. Please try to be more realistic/careful with this.

Structure. Please also consider re-structuring your work. When I am reading about the SBM I should get all the information I need to follow the later developments. So, I believe that by the asymptotic results I should already be familiar with the distribution of the latent variables and with the fact that the distribution of edges is conditional on the cluster labels. This should be defined in the section on the SBM. Again, there are other places where you could have structured things in a more reader-friendly way. For example, look at some NeuroImage papers and see how they structure the simulation methods and results. You can do something like that too. Also, NeuroImage allows only sections and subsections, so I really think that you should use these wisely.

2 Technical

My general impression of this work is that it is very sloppy. There is some interesting material, but it will take serious effort and work to get this into good shape. I personally expect every NeuroImage paper to be self-contained. If you are using some method then it needs to be defined. Although I do not expect this to appear in the main text, all background material should be easily accessible and contained in either the appendix or the supplementary material of this manuscript. This should obviously be summarised in your own words and with your own notation.

2.1 Asymptotic Theory

  • Define MSE before you define Relative Efficiency (RE); see the sketch after this list.
  • "Somewhat surprisingly, the asymptotic relative efficiency will not depend on this fixed sample size M" — you are jumping ahead of yourself. I am expecting you to interpret RE here, not state your main result. Please explain what values RE can take and what we learn from that.
  • A multinomial distribution takes the total number of trials as a parameter, and this needs to be stated. I think it is better for you to state a "categorical distribution" here.
  • Block membership probabilities are defined as a vector of length K, so I am confused: what is $\rho_{\tau_i}$ then? Is this a vector of length N? Note that in the appendix you state $\sum_{i}\rho_{i}=1$.
  • Please explain that $|\{i : \tau_i = k\}|$ means the total number of nodes in block k. Explain why this estimate converges to $\rho_k$ as N increases. Are you using maximum likelihood theory here?
  • Your comment that the distribution of edges is conditional on the latent variables should already have been discussed when you first talk about the SBM.
  • For Lemma 4.1 in Section 6.5, you need to elaborate the proof and its derivation and clearly state every single result you used from Athreya et al. [10].
  • When you say "From this we can see that the MSE of $\hat{P}_{ij}$ is of order ..." — OK, and this means what?
  • Just before Theorem 4.2 you say "This yields the following result." Try to give more explanation of what you mean by this.
  • Theorem 4.2: maybe it would be easier to first state your RE and then its asymptotic value. Its proof should be clearer. It is OK to repeat proofs and add more details to aid overall clarity.
  • The first paragraph after Theorem 4.2 should be massaged into the text where you mention the RE for the first time.
  • The second paragraph after Theorem 4.2: is this what you want to say, $\rho_2 = 1-\rho_1$?
  • Finally, when you talk about the effective minimum you need to be aware that the SBM can give you empty blocks, and that these degenerate solutions can happen in the optimisation procedure, where the model simply hones in on a solution with a smaller number of blocks than was initially considered.
  • Fig. 2: what is B set to? How are you ranging your block sizes? These are crucial points in your simulation; they should be among the first things you state, and then you can comment on your results.
  • Please write a separate section that details your simulation setup and then a section that discusses your results.
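On the first bullet, the definitions we should state up front would read something like this (assuming the paper's convention that RE below 1 favors $\hat{P}$, consistent with "significantly less than 1" being the good case):

```latex
\mathrm{MSE}(\hat{Q}) \;=\; \mathbb{E}\,\bigl\| \hat{Q} - P \bigr\|_F^2 ,
\qquad
\mathrm{RE}\bigl(\hat{P}, \bar{A}\bigr) \;=\; \frac{\mathrm{MSE}(\hat{P})}{\mathrm{MSE}(\bar{A})} .
```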

3 Finite Sample Simulations

  • The simulation setup seems way too basic. It would be interesting to see different values of B, especially values on the boundary of the parameter space (close to 0 or 1). Also, it would be interesting to see different values of K; it seems way too simple to have only K = 2. What happens for K ∈ {5, 10, 15, 20}, and when the block sizes are different relative to each other?
  • Please state the range of numbers of vertices that you consider.
  • REst — this you need to re-define. It looks like a strong deviation from your previous notation.
  • Again, your simulations need to be structured so that the methodology is a separate, clearly defined section. I should be able to find all of your parameters straight away, not discover that M = 100 in the caption of Fig. 3.
  • Also, it would be clearer to see which indices you are summing over if you use three sums instead of one.
  • "In Fig. 3, We..." — typo, capital letter for "we".
  • "...significantly less than 1..." — please do not use this adjective without p-values (or something alike) and an explanation of how you obtained them.

4 CoRR Brain Graphs: Cross-Validation

  • In your cross-validation strategy it seems that some Monte Carlo samples will be repeated, as there are only 454 scans, and when M = 1 this can happen. Is this a problem? Could you please comment more on this?
  • I am not sure how you fit the SBM here. What estimation strategies are employed to find the cluster labels? How do you resolve the influence of starting points in your data analysis?
  • Please explain the dimension estimation procedure in more detail in an appropriate part of the manuscript.
  • Your comments (pg. 15, paragraphs 1-3) do not reference any of the figures, and I am not sure whether you are commenting on specific results or giving us some idea of your expectations for the simulations. Please be clear.
  • While it might be trivial, I am not sure how the CI of RE is computed. Could the authors clarify this? And what is m now? (One possible construction is sketched after this list.)
  • Fig. 5 is too small. How do you conclude that $\hat{P}$ is a better estimate of the true probability matrix than $\bar{A}$? Also, what is K in this data; it looks like it is 2?
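On the CI question above: one standard Monte Carlo construction we could spell out in the paper (a sketch; whether this matches what the code actually does needs checking — the function below is hypothetical):

```python
import numpy as np

def re_confidence_interval(re_samples):
    """95% normal-approximation CI for the mean relative efficiency
    across Monte Carlo cross-validation replicates."""
    x = np.asarray(re_samples, dtype=float)
    m = x.mean()
    se = x.std(ddof=1) / np.sqrt(len(x))
    return m - 1.96 * se, m + 1.96 * se

# Illustration only; these replicate values are fake.
print(re_confidence_interval([0.80, 0.90, 0.85, 0.95, 0.88]))
```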

5 Synthetic Data Analysis for Full Rank IEM

I am sorry, but I really do not understand your simulation setup. Could you please explain it?

References

[1] Mikail Rubinov and Olaf Sporns. Complex network measures of brain connectivity: uses and interpretations. NeuroImage, 52(3):1059-1069, 2010.
[2] Christophe Ambroise and Catherine Matias. New consistent and asymptotically normal parameter estimates for random-graph mixture models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(1):3-35, 2012.
[3] Patrick J. Wolfe and Sofia C. Olhede. Nonparametric graphon estimation. arXiv preprint arXiv:1309.5936, 2013.
[4] David S. Choi, Patrick J. Wolfe, and Edoardo M. Airoldi. Stochastic blockmodels with a growing number of classes. Biometrika, page asr053, 2012.
[5] Franck Picard, Vincent Miele, Jean-Jacques Daudin, Ludovic Cottret, and Stéphane Robin. Deciphering the connectivity structure of biological networks using MixNet. BMC Bioinformatics, 10(6):1, 2009.
[6] Hugo Zanghi, Christophe Ambroise, and Vincent Miele. Fast online graph clustering via Erdős–Rényi mixture. Pattern Recognition, 41(12):3592-3599, 2008.
[7] Hugo Zanghi, Stevenn Volant, and Christophe Ambroise. Clustering based on random graph model embedding vertex features. Pattern Recognition Letters, 31(9):830-836, 2010.
[8] Dragana M. Pavlovic, Petra E. Vértes, Edward T. Bullmore, William R. Schafer, and Thomas E. Nichols. Stochastic blockmodeling of the modules and core of the Caenorhabditis elegans connectome. PLoS ONE, 9(7):e97584, 2014.
[9] J.-J. Daudin, Franck Picard, and Stéphane Robin. A mixture model for random graphs. Statistics and Computing, 18(2):173-183, 2008.
[10] Avanti Athreya, Carey E. Priebe, Minh Tang, Vince Lyzinski, David J. Marchette, and Daniel L. Sussman. A limit theorem for scaled eigenvectors of random dot product graphs. Sankhya A, 78(1):1-18, 2016.
[11] Sourav Chatterjee. Matrix estimation by universal singular value thresholding. The Annals of Statistics, 43(1):177-214, 2015.

TangRunze commented 7 years ago

I plotted the results for M=50. However, the RE in Table 1 for M=50 is: JHU 3.44, Desikan 3.76, CPAC200 2.44. @dpmcsuss Do we still want to put these results in the paper?


TangRunze commented 7 years ago

[image attachment]

TangRunze commented 7 years ago

Worst case is when all eigenvalues are almost the same. This leads to a certain type of structure, which looks somewhat like the identity matrix; however, we don't see that structure in connectomics. Lack of gaps ---> block-diagonal structure. (That's a little fuzzy, but I think we can argue that.)

@dpmcsuss Is the structure (like the identity matrix) you mentioned after diagonalization?

TangRunze commented 7 years ago

For instance, around line 430, the descriptions of the methods to select the rank are named, but no reference is given to the part in section 6 that describes them.

I think we have the reference in Section 6?

TangRunze commented 7 years ago

The difference between Abar and Phat is as follows:

Anything interesting we could comment on this?
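If we want to turn that difference into the extra Figure 5 panel Reviewer 1 asked for, a sketch (placeholder file names; assumes the two estimates are saved as $n \times n$ arrays):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical paths to the two saved estimates.
A_bar = np.load("abar.npy")
P_hat = np.load("phat.npy")

diff = A_bar - P_hat
lim = np.abs(diff).max()
plt.imshow(diff, cmap="RdBu_r", vmin=-lim, vmax=lim)  # diverging map centered at 0
plt.colorbar(label=r"$\bar{A} - \hat{P}$")
plt.title("Difference between the two estimates")
plt.show()
```

A diverging colormap centered at zero would make the sign and location of the discrepancies easier to comment on.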

On Dec 14, 2016, at 08:41, Daniel Sussman notifications@github.com wrote:

Reviewer 1

For someone with statistical training such as me, reading Tang et al's manuscript was a pleasure. In essence, the work tell us that a low-rank approximation of the connectomes is a good one and that using it in a Jame-Stein-style biased estimator gives a good small-sample estimate of the mean of connectomes. With that said, I fear that in the way the manuscript is currently formulated, the results will be lost to a sizable fraction of the NeuroImage readership. I give below specific comments that can help make the manuscript more relevant to the neuroimaging community.

To start with, I think that intuitions should be put forward more quickly in the manuscript. I was convinced of the validity of the model for connectomes only when I saw figure 5, which comes very late in the manuscript. It would be useful to show connectomes estimated on a few different atlases as well as their low-rank approximation early in the paper; probably before going into the formal models of section 2. Intuitions are important in a multi-disciplinary audience such as that of neuroimage. It is not a trivial insight that brain graphs, including anatomical connectivity graphs, are well approximated by low-rank models. It gives the full meaning of your work. To be fair, reading the last sentence of the abstract "low-rank methods should be a key part of the tool box for researchers studying populations of graphs", I thought that it was an overstatement and that it was true only for a small set of applications.

Along a same line, the manuscript is written in very general terms, and connectomics in neuroimaging sometimes appears as a side aspect of the work. Anchoring the vocabulary and the examples in neuroimaging would help making it more relevant for NeuroImage. For instance, the last sentence of the abstract is a statement on graphs in general; and the introduction starts with and mostly discusses graphs and statistics, rather than the brain. Section 2, on models, starts in a very formal way, and then considers connectomics as "an example application"; it is unclear to the reader why, at this point, the models are relevant to stated goal of defining means in connectomics. When submitted to NeuroImage, the discussion should be framed in the context of brain.

The word "inadmissible" is used in its statistical sense without being defined. It is a common-English word, and hence its meaning here should really be made explicit.

The title is not very related to the work presented. I would strongly argue that it should be changed to something more descriptive about the work.

One a big picture standpoint, the tools presented in this manuscript are useful only if there are very few subjects (between 5 and 10). When there are more, they can be detrimental (as seen from figure 4, where for JHU and Desikan atlases they do not improve upon the naive estimator. This caveat should really be discussed: what application problem do they solve? 10 subjects is below the typical study. Right now this aspect feels hidden under the rug in claims of general usefulness.

The asymptotics presented in section 4.1 are for number of nodes going to infinity with a fixed number of subjects. This are very much low-sample asymptotics. While I understand their interest for proofs, it is not clear to me that they relate well to application settings. I believe that this aspect should be discussed. In particular given that the good performance on the low-rank model is somewhat created by this specific choice of asymptotic regime.

In section 4.2, if I understand things right, the simulations are done informing the algorithm on the actual rank of the adjacency matrix. This does not reflect well the application situation. I think that simulations that include the automatic choice of the dimensionality should be included. In general, I find that the simulation settings are overly favorable to your model: it is a tautology that the estimator proposed will work well on the simulation, as the matrix is actually low rank. I would suggest that the authors should also do simulations on data that slightly break the model, adding non low-rank aspect to the graph, to probe when the estimator breaks down.

Remark 4.4 is an important one: it tells us that if the graph isn't well approximated as a low-rank matrix, the estimator will not perform well. I believe that it should be discussed in the discussion part, and maybe put in perspective with the fact that low-rank approximations seem to work well with brain connectivity graphs.

Some important information is given in section 6, which is the last section (maybe it is meant to be an appendix?). Reading the manuscript was awkward as it wasn't clear to me that some information was given later. For instance, around line 430, the descriptions of the methods to select the rank are named, but no reference is given to the part in section 6 that describes them. Similarly, paragraph 6.3 should be clearly referenced early in the text.

The first column on figure 4 is a bit surprising: computing a mean with a single subject. It would be useful to either discuss why it is relevant, or to remove it.

Dict-learning, ICA, network of network (Abraham, biobank, calhoun)

Line 18: I think that the word "sample" should be plural.

As a nitpick on wording on line 26, I would rather say "a bias-variance trade-off", rather than "the bias-variance trade-off". To me their are many different tradeoffs, as there are many different kind of biases. The bias that the authors are taking here is a useful one, and grounds the success of the work.

I was surprised on line 177 to read about positive semidefinite matrices while line 102 mentions that the diagonal of the matrices are zero. To me they cannot be positive semidefinite without be zero (a simple proof would be that the trace is invariant to the basis; considering the basis in which the matrix is diagonal, the trace is the sum of the eigenvalue. If the diagonal is zero, this sum is hence zero. As or the eigenvalues are positive, they should all be zeros). I suspect that this problem is related to the comment on line 240 and to section 6.2. However, I must say that I found that it didn't make reading the manuscript easy.

On figure 4, the legend says literally "A bar" and "P hat". I must confess that reading this quickly, I did not immediately make the connection with the symbols.

Figure 5 should probably also show the subtraction of A bar and P hat, as the difference is very hard to see.

Reviewer 2

In this paper, Tang and colleagues proposed a low-rank method to estimate the mean of a collection of undirected and unweighted graphs. Under a semi-positive definite stochastic block model, or more generally a random dot product graph model, the authors derived analytic results showing that the proposed method is guaranteed to have better performance than a simple element-wise average of the observed adjacency matrices when the network is large and sample size is small. Using simulations, they confirmed that this is also true when the data is generated from an underlying independent edge model.

Overall this paper is clearly written, easy to follow and the proposed method looks sound to my assessment. However, I have two major concerns about this paper.

(1) I think this paper doesn't fit very well into the scope of NeuroImage. The work is completely methodologically oriented, has a strong statistical taste and was motivated by and grounded on the general graph theory. The introduction did not refer to any neuroimaging studies. Although the paper included a real DTI data set, and it was only used to synthesize data and validate the proposed method. To publish this paper in NeuroImage, I think the authors need to build a stronger connection between the method and the current neuroimaging literature, demonstrate its potential usefulness in addressing important scientific questions in brain network studies and provide more interesting real data applications. A statistical journal might be a more appropriate target for the current form of this paper.

(2) It seems from figures 4 and 8 that the proposed method outperforms the element-wise average only when the sample size is very small (N<10). Beyond this sample size, the two methods produce indistinguishable results or the simple average approach can be better. I think this considerably weakens the usefulness of the method in real data analysis. Currently in the neuroimaging world, most population based brain network studies have a sample size much larger than N=10. The proposed method could still be useful when the mean graph of a small subgroup of the subjects needs to be estimated but this paper did not provide an example. In the Discussion section, it was mentioned that the low rank representation can be used to improve interpretability of the data and provide biological insight. Unfortunately, this wasn't demonstrated in a real data set either.

Some minor comments:

— perhaps use a more informative title? — line 223: what do \tilde{U} and \tilde{S} and represent? — line 224: largest eigenvalues of A —> largest eigenvalues of \bar{A} — line 300: \rho_2 = 1-\rho_2 —> \rho_2 = 1-\rho_1 — line 304: the relative efficiency should be N/(N-1)?

Reviewer 3

Summary: The Stochastic Blockmodels (SBM) and the Random Dot Product Graph (RDPG) represent two prominent classes of models that frequently appear in network clustering and dimension reduction problems. Both models assume that the observed edges are generated according to some unknown random graph distribution whose first moment (the population Mean Graph) is structurally well approximated by the cluster structure found by the SBM and RDPG. Previous work on the RDPG models suggested a new class of estimators of population Mean Graph that are based on low-rank approximation methods. This paper considers the same class of estimators but in the context of an SBM which is assumed to have a positive semidefinite probability connectivity matrix so that it could be expressed as the RDPG model. This low-rank estimator is contrasted against the classical maximum likelihood estimator (MLE) for the SBM showing some improvements over the classical MLE particularly for large networks.

This paper touches on an important topic and certainly offers some interesting new perspectives which can be useful to the neuroimaging community. However, I fear that the paper needs to go through substantial modifications (that are in reality beyond major revision) in order to merit publication in NeuroImage. Nevertheless, I will ask for major revision but I expect a lot of effort and massive improvement in the second round.

1 Writing skills

The presentation and writing skills need to be radically improved. The authors should carefully proof-read the material before submission and have a sanity check if the current manuscript conforms to the journals' standard. Several things come to mind as potential remedies.

* Consider your audience. Publications in this journal are read by researchers with diverse backgrounds, including machine learning, neuroscience, psychology, psychiatry, statistics, engineering and physics. Thus, in order to maximise your impact you really need to write things in a simple manner whenever possible. In particular, your introduction should read in such a way that it is understandable to researchers who have never heard of the Stochastic Blockmodel (SBM) or the Random Dot Product Graph model. Hence, start by defining the problem in general terms and mention methods known to your audience. See, for example, the paper by Rubinov and Sporns [1] to get a flavour of what this audience is aware of, and then try to put your work in this context. In a nutshell, your introduction should clearly state:
  1. what the problem is (e.g., network clustering), with some examples of data and methods;
  2. the models that you will consider (there are so many papers on the SBM that it is common courtesy to cite them), with comments on their use in neuroimaging data [2, 3, 4, 5, 6, 7, 8, 9]. Please also note that "community structure" is a term that these days is identified with "modular organisation", and as the SBM can identify more diverse structures than that, I would suggest that you simply refer to it as "cluster structure" and thus define it in a more general way;
  3. what it is that you are proposing and how different/novel this work is relative to previous authors, including [10] and [11];
  4. why your work is important to this audience and what they can hope to gain. For example, if you are claiming that your estimator has an advantage in large networks with a small number of realisations, then it would be great if you could give a practical example of such data. As another example, if somebody is interested in a logistic regression model and hypothesis testing, how does your estimator change the game there?
* Satisfy the readers' expectations. As a reader I was getting really irritated with some sections because the ground would be prepared for one point and then something completely different would be stated next. For example, just before your Algorithm 1 you are talking about $\bar{A}$ and $A^{(m)}$, so I am expecting something involving these objects, and then I learn that the main input of your algorithm is in fact some general $A$, a symmetric real matrix that has never been mentioned before. Thus, if you are changing gears you need to reflect that in your text beforehand.
* Figure captions. Your figure captions need to clearly define all the components of your plot before you proceed to comment on the most striking results. For example, in Figure 2, $B_{11}$ and $B_{12}$ are not clearly defined (and at best your notation was $B_{11}$), so if you set specific values for the connection probabilities beforehand then you should tell us about it. The same point can be made for Figure 1, where you do not even give colour bars with probability values. Overall, if you have more than one plot in a figure it is expected that you label the subplots A, B, etc., and state what each plot represents (this is much easier than the bottom-left or top-left approach). Your individual plots should be of the same size and combined in such a way that the figure is of reasonable size and viewing quality. Axis titles should start with a capital letter, and the same applies to plot subtitles (e.g., Fig. 1 "rank-5 approximation" -> "Rank-5 approximation").
* Typos/Grammar. You need to make sure that your text is free from typos. For example, "stochastic block model" appears in the title of Section 2.3 while everywhere else you use "stochastic blockmodel". When you define the dot product I really do not see the reason for the $\langle \cdot \rangle$ notation, but if you are really bent on this notation then at least define it in a correct way: $\langle x, y \rangle$ should not be a sum over $i$. In the spirit of the readers' expectations, if you are already using $\langle x_i, x_j \rangle$ then it would be easier to define the dot product in terms of these variables than to introduce $y$, which is redundant. When you state $B_{\tau_i,\tau_j}=\nu_{\tau_i}^{\top}\nu_{\tau_j}^{\phantom{\top}}$, are you giving the impression that $\nu_{\tau_i}$ is a row vector? (Also, you may want to investigate the \phantom command as a quick way to align indices when you are using \top for transpose; \phantom{\top} can help you there.) In Section 2.2, "... the entries $A_{ij}$ are distributed independently as Bern($\langle X_i, X_j \rangle$) for ...": you really need to write out Bernoulli rather than use the abbreviation if you want to be consistent with Section 4.2, "... $\tau_i$ are drawn iid from a multinomial distribution ...". Note that this list of examples is not exhaustive; there are plenty of other instances where things are used without being defined, or where it is assumed that the reader knows your notation very well.
* Citation. Please pay attention to how you cite. "Hoff et. al. (2002)" is incorrect. Also, when you cite two or more papers in parentheses you should separate them by semicolons rather than commas. Finally, you need to be very careful to distinguish your own original work from the results of other people (i.e., "proposing to use" is very different from "we propose").
* Notation. Please do not abuse statistical notation! Capital Roman letters indicate random quantities and small Roman letters indicate their realisations. Saying something like $P(A_{ij} = a_{ij})$ makes sense; $P(A_{ij})$ means nothing. It is important early on to define basic statistical notation. Note that you can use bold face for non-scalar quantities. I cannot stress enough how important it is to distinguish between these quantities, especially when you are taking expectations. I should be comfortable when I read your text, not wracking my brains trying to see whether an object is random or non-random. Finally, it is important to establish a unifying notation that is your own; so far, my overall impression is that your notation is way too eclectic. (One consistent scheme is sketched just after this list.)
* Objectiveness. Please do not misuse adjectives when you are comparing things. For example, you say "... in this instance visual inspection demonstrate that $\hat{P}$ performs significantly better than $\hat{A}$". This suggests that you are using a statistical hypothesis test of some sort and that you have, for example, p-values that can back this up. This is not a correct expression for what you are trying to say; please try to be more realistic/careful with this.
* Structure. Please also consider restructuring your work. When I am reading about the SBM I should get all the information I need to follow the later developments, so I believe that by the asymptotic results I should already be familiar with the distribution of the latent variables and with the fact that the distribution of edges is conditional on the cluster labels; this should be defined in the section on the SBM. Again, there are other places where you could have structured things in a more reader-friendly way. For example, look at some NeuroImage papers, see how they structure the simulation methods and results, and do something like that too. Also, NeuroImage allows only sections and subsections, so I really think that you should use these wisely.
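For concreteness, here is a minimal sketch of one internally consistent scheme for the notation flagged above (illustrative only; the symbols should of course be adapted to the manuscript's own conventions):

```latex
% Dot product defined in terms of the variables actually used:
\langle x_i, x_j \rangle = \sum_{k=1}^{d} x_{ik} x_{jk}

% Block probabilities as inner products of (column) latent position vectors:
B_{\tau_i, \tau_j} = \nu_{\tau_i}^{\top} \nu_{\tau_j}^{\phantom{\top}}

% Capital letters for random quantities, lower case for realisations
% (conditional on the latent positions X_i, X_j):
P(A_{ij} = a_{ij} \mid X_i, X_j)
  = \langle X_i, X_j \rangle^{a_{ij}}
    \bigl(1 - \langle X_i, X_j \rangle\bigr)^{1 - a_{ij}}
```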

2 Technical

My general impression of this work is that it is very sloppy. There is some interesting material, but it will take serious effort to get it into good shape. I personally expect every NeuroImage paper to be self-contained: if you are using some method then it needs to be defined. Although I do not expect this to appear in the main text, all background material should be easily accessible and contained in either the appendix or the supplementary material of this manuscript. This should obviously be summarised in your own words and with your own notation.

2.1 Asymptotic Theory

* Define MSE before you define relative efficiency (RE). (For concreteness, the standard definitions are sketched just after this list.)
* "Somewhat surprisingly, the asymptotic relative efficiency will not depend on this fixed sample size M": you are jumping ahead of yourself here. I expect you to interpret RE at this point, not to state your main result. Please explain what values RE can take and what we learn from them.
* A multinomial distribution takes the total number of trials as a parameter, and this needs to be stated. I think it is better for you to speak of a "categorical distribution" here.
* Block membership probabilities are defined as a vector of length K, so I am confused about what $\rho_{\tau_i}$ is. Is it then a vector of length N? Note that in the appendix you state $\sum_{i}\rho_{i}=1$.
* Please explain that $|\{i : \tau_i = k\}|$ means the total number of nodes in block k, and explain why this estimate converges to $\rho_k$ as N increases. Are you using maximum likelihood theory here?
* Your comment that the distribution of edges is conditional on the latent variables should already have been discussed when you first talk about the SBM.
* For Lemma 4.1 in Section 6.5, you need to elaborate this proof and its derivation and clearly state every single result you use from Athreya et al. [10].
* When you say "From this we can see that the MSE of $\hat{P}_{ij}$ is of order ....": OK, and this means what?
* Just before Theorem 4.2 you say "This yields the following result." Try to give more explanation of what you mean by this.
* Theorem 4.2: maybe it would be easier to state your RE first and then its asymptotic limit. Its proof should be clearer; it is OK to repeat proofs and add more details to aid overall clarity.
* The first paragraph after Theorem 4.2 should be massaged into the text where you mention the RE for the first time.
* The second paragraph after Theorem 4.2: is this what you want to say, 2 = 1−2?
* Finally, when you talk about the effective minimum you need to be aware that the SBM can give you empty blocks, and that these degenerate solutions can occur in the optimisation procedure when the model simply hones in on a solution with a smaller number of blocks than was initially considered.
* Fig. 2: what is B set to? How are you varying your block sizes? These are crucial points in your simulation and should be among the first things you state; then you can comment on your results. Please write a separate section that details your simulation setup and then a section that discusses your results.
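For reference, the standard definitions this section has in mind (a sketch, assuming the manuscript's convention of comparing the low-rank estimator $\hat{P}$ against the element-wise mean $\bar{A}$ of the observed graphs):

```latex
\mathrm{MSE}(\hat{P})
  = \mathbb{E}\bigl[\lVert \hat{P} - P \rVert_F^2\bigr]
  = \sum_{i,j} \mathbb{E}\bigl[(\hat{P}_{ij} - P_{ij})^2\bigr],
\qquad
\mathrm{RE}(\hat{P}, \bar{A})
  = \frac{\mathrm{MSE}(\hat{P})}{\mathrm{MSE}(\bar{A})}.
```

With this convention, RE < 1 means $\hat{P}$ is the more efficient estimator; which estimator sits in the numerator is exactly the kind of thing the manuscript should state explicitly.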

3 Finite Sample Simulations

* The simulation setup seems way too basic. It would be interesting to see different values of B, especially values on the boundary of the parameter space (close to 0 or 1). It would also be interesting to see different values of K: it seems way too simple to have only K = 2. What happens for K ∈ {5, 10, 15, 20}, and when the block sizes differ relative to each other? (A sketch of this kind of setup follows this list.) Please also state the range of numbers of vertices that you consider.
* REst: you need to re-define this; it looks like a strong deviation from your previous notation.
* Again, your simulations need to be structured in such a way that the simulation methodology is a separate, clearly defined section. I should be able to find all of your parameters straight away, not discover that M = 100 in the caption of Fig. 3. Also, it would be clearer which indices you are summing over if you used three sums instead of one.
* "In Fig. 3, We...": typo, capital letter for "we".
* "... significantly less than 1 ...": please do not use this adjective without p-values (or something similar) and an explanation of how you obtained them.
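To make the requested extension concrete, here is a minimal sketch of such a simulation in Python. This is hypothetical code, not the authors' (the helpers `sample_sbm` and `rank_d_truncation` are made-up names): it varies K and compares the element-wise mean $\bar{A}$ with a rank-K truncation $\hat{P}$.

```python
import numpy as np

def sample_sbm(tau, B, rng):
    """Sample a symmetric, hollow adjacency matrix from an SBM."""
    P = B[np.ix_(tau, tau)]                      # per-edge probabilities
    upper = np.triu(rng.random(P.shape) < P, 1)  # independent upper-triangle edges
    return (upper + upper.T).astype(float)

def rank_d_truncation(mat, d):
    """Best rank-d approximation via the d largest-magnitude eigenvalues."""
    vals, vecs = np.linalg.eigh(mat)
    keep = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, keep] @ np.diag(vals[keep]) @ vecs[:, keep].T

rng = np.random.default_rng(0)
for K in (2, 5, 10):                             # the range the report asks about
    n, M = 200, 5
    B = rng.uniform(0.1, 0.9, (K, K)); B = (B + B.T) / 2
    tau = rng.integers(0, K, n)                  # block sizes differ naturally
    P_true = B[np.ix_(tau, tau)]; np.fill_diagonal(P_true, 0)

    A_bar = sum(sample_sbm(tau, B, rng) for _ in range(M)) / M
    P_hat = rank_d_truncation(A_bar, K)
    print(K, np.mean((A_bar - P_true) ** 2), np.mean((P_hat - P_true) ** 2))
```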

4 CoRR Brain Graphs: Cross-Validation

* In your cross-validation strategy it seems that some Monte Carlo samples will be repeated, as there are only 454 scans, and when M = 1 this can happen. Is this a problem? Could you please comment further on this?
* I am not sure how you fit the SBM here. What estimation strategies are employed to find the cluster labels? How do you resolve the influence of starting points in your data analysis?
* Please explain the dimension estimation procedure in more detail in an appropriate part of the manuscript.
* Your comments (p. 15, paragraphs 1-3) do not reference any of the figures, and I am not sure whether you are commenting on specific results or giving us an idea of what you expect from the simulations. Please be clear.
* While it might be trivial, I am not sure how the CI of the RE is computed. Could the authors clarify this? (One plausible construction is sketched after this list.) And what is m now?
* Fig. 5 is too small. How do you conclude that $\hat{P}$ is a better estimate of the true probability matrix than $\bar{A}$? Also, what is K in these data? It looks like it is 2.
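On the CI question, here is one plausible construction, sketched only to make the question concrete (the manuscript may well compute it differently): a normal-approximation interval over the Monte Carlo cross-validation repeats.

```python
import numpy as np

def re_confidence_interval(re_samples, z=1.96):
    """Normal-approximation CI for relative efficiency.

    re_samples: one estimated RE value per Monte Carlo repeat.
    Returns (lower, upper); z = 1.96 gives an approximate 95% interval.
    """
    re_samples = np.asarray(re_samples, dtype=float)
    mean = re_samples.mean()
    se = re_samples.std(ddof=1) / np.sqrt(len(re_samples))
    return mean - z * se, mean + z * se
```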

5 Synthetic Data Analysis for Full Rank IEM

I am sorry, but I really do not understand your simulation setup. Could you please explain it?

References

[1] Mikail Rubinov and Olaf Sporns. Complex network measures of brain connectivity: uses and interpretations. NeuroImage, 52(3):1059-1069, 2010.
[2] Christophe Ambroise and Catherine Matias. New consistent and asymptotically normal parameter estimates for random-graph mixture models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(1):3-35, 2012.
[3] Patrick J. Wolfe and Sofia C. Olhede. Nonparametric graphon estimation. arXiv preprint arXiv:1309.5936, 2013.
[4] David S. Choi, Patrick J. Wolfe, and Edoardo M. Airoldi. Stochastic blockmodels with a growing number of classes. Biometrika, page asr053, 2012.
[5] Franck Picard, Vincent Miele, Jean-Jacques Daudin, Ludovic Cottret, and Stéphane Robin. Deciphering the connectivity structure of biological networks using MixNet. BMC Bioinformatics, 10(6):1, 2009.
[6] Hugo Zanghi, Christophe Ambroise, and Vincent Miele. Fast online graph clustering via Erdős-Rényi mixture. Pattern Recognition, 41(12):3592-3599, 2008.
[7] Hugo Zanghi, Stevenn Volant, and Christophe Ambroise. Clustering based on random graph model embedding vertex features. Pattern Recognition Letters, 31(9):830-836, 2010.
[8] Dragana M. Pavlovic, Petra E. Vértes, Edward T. Bullmore, William R. Schafer, and Thomas E. Nichols. Stochastic blockmodeling of the modules and core of the Caenorhabditis elegans connectome. PLoS ONE, 9(7):e97584, 2014.
[9] J.-J. Daudin, Franck Picard, and Stéphane Robin. A mixture model for random graphs. Statistics and Computing, 18(2):173-183, 2008.
[10] Avanti Athreya, Carey E. Priebe, Minh Tang, Vince Lyzinski, David J. Marchette, and Daniel L. Sussman. A limit theorem for scaled eigenvectors of random dot product graphs. Sankhya A, 78(1):1-18, 2016.
[11] Sourav Chatterjee. Matrix estimation by universal singular value thresholding. The Annals of Statistics, 43(1):177-214, 2015.


TangRunze commented 7 years ago

For example, just before your Algorithm 1 you are talking about $\bar{A}$ and $A^{(m)}$, so I am expecting something involving these objects, and then I learn that the main input of your algorithm is in fact some general $A$, a symmetric real matrix that has never been mentioned before. Thus, if you are changing gears you need to reflect that in your text beforehand.

Not sure about how to revise this.

dpmcsuss commented 7 years ago

Re the first figure. For m=50 it looks like the error gets worse before it gets better so that is kind of intriguing.

  1. Does the result change much if we use SVD rather than EIG?
  2. Do either of the dimension selection criteria take into account sample size?
dpmcsuss commented 7 years ago

Worst case is when all eigenvalues are almost the same. This leads to a certain type of structure, which looks somewhat like the identity matrix; however, we don't see that structure in connectomics. Lack of gaps ---> block diagonal structure. (That's a little fuzzy, but I think we can argue that.)

@dpmcsuss Is the structure (like the identity matrix) you mentioned after diagonalization?

Well, if all the eigenvalues are the same then the matrix is a scalar multiple of the identity. If we take a matrix whose eigenvalues are almost all very close then it should also be close to a scalar multiple of the identity. (I can give you more details about why this is true and perhaps we can think about some variants of this too when we chat).
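A one-line way to make "close" precise (a sketch): if $A = U \Lambda U^{\top}$ with $U$ orthogonal and every eigenvalue satisfies $|\lambda_i - c| \le \varepsilon$, then

```latex
\lVert A - cI \rVert_2
  = \lVert U(\Lambda - cI)U^{\top} \rVert_2
  = \max_i |\lambda_i - c|
  \le \varepsilon,
```

so $A$ is within $\varepsilon$ of a scalar multiple of the identity in spectral norm.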

TangRunze commented 7 years ago


Re the first figure. For m=50 it looks like the error gets worse before it gets better so that is kind of intriguing.

Do you mean the middle part of the dimensions? (e.g. 10-30 in JHU M=50)

Does the result change much if we use SVD rather than EIG?

Results will be out this afternoon.

Do either of the dimension selection criteria take into account sample size?

Zhu & Ghodsi doesn’t, but USVT does because of the threshold c*sqrt(n/m)


dpmcsuss commented 7 years ago

Do you mean the middle part of the dimensions? (e.g. 10-30 in JHU M=50)

Yes

TangRunze commented 7 years ago


Right! Let's say the eigenvalues are all c. Then the matrix A = UcIU^T = cUU^T. But I don't have a good understanding of what UU^T will look like. Can you explain a little more what "close" means here?


dpmcsuss commented 7 years ago

But I don’t have a good understanding of what U U^T will look like.

U is an orthogonal matrix so UU^T = I

TangRunze commented 7 years ago

Oh right! I was still thinking about the truncated case…


dpmcsuss commented 7 years ago

Yes, in the truncated case there is less we can say, but if we are thinking about a truly worst case for spectral methods, it would be a full-rank matrix with all eigenvalues (nearly) equal.
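A sketch of why less can be said in the truncated case: keeping only $d < n$ eigenvectors $U_d$,

```latex
U_d^{\top} U_d = I_d
\quad\text{but}\quad
U_d U_d^{\top} = \Pi_d \neq I,
```

where $\Pi_d$ is an orthogonal projection of rank $d$. If all eigenvalues equal $c$, the rank-$d$ truncation is $c\,\Pi_d$, and which $d$-dimensional subspace gets selected is arbitrary under (near-)ties, which is exactly why (near-)equal eigenvalues are the worst case for spectral methods.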

TangRunze commented 7 years ago

Right! Thanks!


TangRunze commented 7 years ago

Results based on SVD: [plot]

Results based on eigendecomposition: [plot]

It's interesting that when M is large, SVD performs better than eigendecomposition. I also plotted the eigenvalues of one sample for different M and different atlases, as follows; not sure if this is helpful. [plot]
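One hypothesis for why the two can differ (a sketch, not the project's code; both helper names below are made up): for a symmetric matrix, rank-d SVD truncation keeps the d largest-magnitude eigenvalues, while an eigendecomposition routine that keeps the d algebraically largest eigenvalues treats large negative eigenvalues differently.

```python
import numpy as np

def rank_d_eig(a, d):
    """Keep the d algebraically largest eigenvalues of symmetric a."""
    vals, vecs = np.linalg.eigh(a)
    keep = np.argsort(vals)[::-1][:d]
    return vecs[:, keep] @ np.diag(vals[keep]) @ vecs[:, keep].T

def rank_d_svd(a, d):
    """Keep the d largest singular values; for symmetric a this equals
    keeping the d largest-|eigenvalue| terms."""
    u, s, vt = np.linalg.svd(a)
    return u[:, :d] @ np.diag(s[:d]) @ vt[:d, :]
```

The two coincide whenever the d retained eigenvalues are all nonnegative, so a persistent gap between the curves would point to large-magnitude negative eigenvalues in the mean graph, which is consistent with looking at the eigenvalue plots directly.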


dpmcsuss commented 7 years ago

I don't see any pics.

TangRunze commented 7 years ago

Sorry, the pics were visible in the emails but not here. I've updated them now.