jhu-graphstat / LLG

Unresolved Comments #13

Closed TangRunze closed 7 years ago

TangRunze commented 7 years ago

Reviewer 1

Along the same lines, the manuscript is written in very general terms, and connectomics in neuroimaging sometimes appears as a side aspect of the work. Anchoring the vocabulary and the examples in neuroimaging would help make it more relevant for NeuroImage. For instance, the last sentence of the abstract is a statement on graphs in general, and the introduction starts with and mostly discusses graphs and statistics rather than the brain. Section 2, on models, starts in a very formal way and then considers connectomics as "an example application"; it is unclear to the reader why, at this point, the models are relevant to the stated goal of defining means in connectomics. When submitted to NeuroImage, the discussion should be framed in the context of the brain.

Some important information is given in Section 6, which is the last section (maybe it is meant to be an appendix?). Reading the manuscript was awkward because it was not clear to me that some information would be given later. For instance, around line 430, the methods used to select the rank are named, but no reference is given to the part of Section 6 that describes them. Similarly, Section 6.3 should be clearly referenced early in the text.

Dictionary learning, ICA, network of networks (Abraham, Biobank, Calhoun)

Figure 5 should probably also show the difference $\bar{A} - \hat{P}$, as the difference between the two estimates is very hard to see.
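
For concreteness, a minimal sketch of the kind of difference panel being suggested, assuming the element-wise mean and the low-rank estimate are available as NumPy arrays (the names `A_bar` and `P_hat` and the placeholder data are illustrative, not taken from the manuscript):

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder inputs; in practice these would be the matrices shown in Figure 5.
rng = np.random.default_rng(0)
A_bar = rng.random((70, 70))
A_bar = (A_bar + A_bar.T) / 2          # symmetric mean adjacency matrix (illustrative)
P_hat = A_bar + rng.normal(scale=0.02, size=A_bar.shape)
P_hat = (P_hat + P_hat.T) / 2          # low-rank estimate (illustrative)

diff = A_bar - P_hat
lim = np.max(np.abs(diff))
plt.imshow(diff, cmap="RdBu_r", vmin=-lim, vmax=lim)   # diverging colormap centred at zero
plt.colorbar(label=r"$\bar{A} - \hat{P}$")
plt.title("Suggested extra panel: difference between the two estimates")
plt.show()
```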

TangRunze commented 7 years ago

Reviewer 2

(1) I do not think this paper fits very well into the scope of NeuroImage. The work is completely methodologically oriented, has a strong statistical flavour, and is motivated by and grounded in general graph theory. The introduction does not refer to any neuroimaging studies. Although the paper includes a real DTI data set, it is only used to synthesize data and validate the proposed method. To publish this paper in NeuroImage, I think the authors need to build a stronger connection between the method and the current neuroimaging literature, demonstrate its potential usefulness in addressing important scientific questions in brain network studies, and provide more interesting real-data applications. A statistical journal might be a more appropriate target for the current form of this paper.

(2) It seems from Figures 4 and 8 that the proposed method outperforms the element-wise average only when the sample size is very small (N < 10). Beyond this sample size, the two methods produce indistinguishable results, or the simple average approach can be better. I think this considerably weakens the usefulness of the method in real data analysis. Currently, most population-based brain network studies in neuroimaging have a sample size much larger than N = 10. The proposed method could still be useful when the mean graph of a small subgroup of the subjects needs to be estimated, but this paper does not provide such an example. In the Discussion section, it is mentioned that the low-rank representation can be used to improve interpretability of the data and provide biological insight; unfortunately, this is not demonstrated on a real data set either.

TangRunze commented 7 years ago

Reviewer 3

1 Writing skills

The presentation and writing need to be radically improved. The authors should carefully proofread the material before submission and check whether the current manuscript conforms to the journal's standards. Several things come to mind as potential remedies.

Consider your audience. Publications from this journal are read by researchers with diverse backgrounds, including machine learning, neuroscience, psychology, psychiatry, statistics, engineering and physics. Thus, in order to maximise your impact you really need to write things in a simple manner whenever possible. In particular, your introduction should read in such a way that it is understandable to researchers who have never heard of the Stochastic Blockmodel (SBM) or the Random Dot Product Graph model. Hence, start by defining the problem in general terms and mention methods known to your audience. See, for example, the paper by Rubinov and Sporns [1] to get a flavour of what this audience is aware of, and then try to put your work in this context. In a nutshell, your introduction should clearly state:

  • what the problem is (e.g., network clustering), with some examples of data and methods
  • define the models that you will consider (there are so many papers on the SBM that it is a common courtesy to cite them) and comment on their use in neuroimaging data [2, 3, 4, 5, 6, 7, 8, 9]. Please also note that "community structure" is a term that these days is identified with "modular organisation"; since the SBM can identify more diverse structures than that, I would suggest that you simply refer to it as "cluster structure" and define it in this more general way.
  • what it is that you are proposing
  • how different/novel this work is relative to the work of previous authors, including [10] and [11]
  • why your work is important to this audience and what they can hope to gain. For example, if you are claiming that your estimator has an advantage in large networks with a small number of realisations, then it would be great if you could give a practical example of such data. As another example, if somebody is interested in a logistic regression model and hypothesis testing, how does your estimator change the game there?

Satisfy the readers' expectations. As a reader I was getting really irritated with some sections because the ground would be prepared for one point and then something completely different would be stated next. For example, just before your Algorithm 1 you are talking about $\bar{A}$ and $A^{(m)}$, so I am expecting something involving these objects, and then I learn that the main input of your algorithm is in fact some general $A$, a symmetric real matrix that has never been mentioned before. Thus, if you are changing gears you need to reflect that in your text beforehand.
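
To make this point concrete, here is a minimal sketch of how the pipeline could be stated so that the input of the algorithm is explicitly the mean of the observed graphs rather than an unnamed symmetric matrix; the rank-d truncated eigendecomposition and all names here are illustrative assumptions, not the authors' Algorithm 1:

```python
import numpy as np

def low_rank_mean(graphs, d):
    """Average a list of symmetric adjacency matrices A^(1), ..., A^(M) and
    return both the element-wise mean and a rank-d approximation of it."""
    A_bar = np.mean(graphs, axis=0)               # the \bar{A} discussed just before Algorithm 1
    eigvals, eigvecs = np.linalg.eigh(A_bar)      # eigendecomposition of the symmetric real input
    top = np.argsort(np.abs(eigvals))[::-1][:d]   # keep the d largest-magnitude eigenvalues
    P_hat = eigvecs[:, top] @ np.diag(eigvals[top]) @ eigvecs[:, top].T
    return A_bar, P_hat
```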

Overall, if you have more than one plot in a figure, it is expected that you label the subplots with A, B, etc., and state what each plot represents (this is much easier than the bottom-left or top-left approach).

Typos/Grammar. You need to make sure that your text is free from typos. When you define the dot product I really do not see the reason for the $\langle \cdot \rangle$ notation, but if you are really bent on this notation, at least define it in a correct way: $\langle x, y \rangle$ should not be a sum over $i$.

In the spirit of the readers' expectations, if you are already using $\langle x_i, x_j \rangle$ then it would be easier to define the dot product in terms of these variables than to introduce $y$, which is redundant. When you state $B_{\tau_i,\tau_j}=\nu_{\tau_i}^{\top}\nu_{\tau_j}^{\phantom{\top}}$, are you giving the impression that $\nu_{\tau_i}$ is a row vector? (Also, you may want to investigate the \phantom command as a quick way to align indices when you are using \top for transpose; \phantom{\top} can help you there.) In Section 2.2, "... the entries $A_{ij}$ are distributed independently as Bern($\langle X_i, X_j \rangle$) for ...", you really need to say Bernoulli trials rather than use the abbreviation if you want to be consistent with Section 4.2, "... $\tau_i$ are drawn iid from a multinomial distribution ...". Note that this list of examples is not exhaustive; there are plenty of other instances where things are used but not defined, or where it is assumed that the reader knows your notation very well.
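
One way the corrected notation could read, as a sketch only (here $d$ denotes the latent dimension; the exact symbols are the authors' choice):

```latex
\[
  \langle x, y \rangle \;=\; \sum_{k=1}^{d} x_k y_k , \qquad
  B_{\tau_i,\tau_j} \;=\; \nu_{\tau_i}^{\top} \nu_{\tau_j}^{\phantom{\top}} , \qquad
  A_{ij} \mid X_i, X_j \;\sim\; \mathrm{Bernoulli}\!\left(\langle X_i, X_j \rangle\right)
  \ \text{independently for } i < j .
\]
```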

Citation. Please pay attention to how you are citing. "Hoff et. al. (2002)" is incorrect. Also, when you are citing two or more papers in parentheses you should separate them by semicolons rather than commas. Finally, you need to be very careful to distinguish your own original work from the results of other people (i.e., "proposing to use" is very different from "we propose").

Notation. Please do not abuse statistical notation! Capital Roman letters indicate random quantities and small Roman letters indicate their realisations. Saying something like $P(A_{ij} = a_{ij})$ makes sense; however, $P(A_{ij})$ means nothing. It is important early on to define basic statistical notation. Note that you can use bold face for non-scalar quantities. I cannot stress enough how important it is to distinguish between these quantities, especially when you are taking expectations. I should be comfortable when I read your text, not racking my brains trying to work out whether an object is random or non-random. Finally, it is important to establish a unifying notation that is your own. So far, my overall impression is that your notation is way too eclectic.
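
A short sketch of the convention being asked for, writing $a_{ij}$ for the realisation of the random edge indicator $A_{ij}$ and $P_{ij} = \mathbb{E}[A_{ij}]$ (the symbols here are illustrative):

```latex
\[
  \Pr\!\left(A_{ij} = a_{ij}\right) \;=\; P_{ij}^{\,a_{ij}} \,(1 - P_{ij})^{\,1 - a_{ij}},
  \qquad a_{ij} \in \{0, 1\},
\]
```

so that every probability statement has realisations, not random variables alone, as its arguments.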

Structure. Please also consider restructuring your work. When I am reading about the SBM I should get all the information I need to follow the later developments. So, by the time I reach the asymptotic results I should already be familiar with the distribution of the latent variables and with the fact that the distribution of the edges is conditional on the cluster labels; this should be defined in the section on the SBM. Again, there are other places where you could have structured things in a more reader-friendly way. For example, look at some NeuroImage papers and see how they structure the simulation methods and results; you can do something like that too. Also, NeuroImage allows only sections and subsections, so I really think that you should use these wisely.

2 Technical

My general impression of this work is that it is very sloppy. There is some interesting material, but it will take serious effort to get it into good shape. I personally expect every NeuroImage paper to be self-contained: if you are using a method, then it needs to be defined. Although I do not expect this to appear in the main text, all background material should be easily accessible and contained in either the appendix or the supplementary material of this manuscript. It should obviously be summarised in your own words and with your own notation.

2.1 Asymptotic Theory

Block membership probabilities are defined as a vector of length K, so I am confused about what $\rho_{\tau_i}$ is then. Is it a vector of length N? Note that in the appendix you state $\sum_{i}\rho_{i}=1$. Please explain that $|\{i : \tau_i = k\}|$ means the total number of nodes in block k, and explain why the corresponding estimate converges to $\rho_k$ as N increases. Are you using maximum likelihood theory here? Your comment that the distribution of the edges is conditional on the latent variables should already have been discussed the first time you talk about the SBM. For Lemma 4.1 in Section 6.5, you need to elaborate on the proof and its derivation and clearly state every single result you use from Athreya et al. [10]. When you say "From this we can see that the MSE of $\hat{P}_{ij}$ is of order ....", OK, and this means what? Just before Theorem 4.2 you say "This yields the following result." Try to explain more clearly what you mean by this. For Theorem 4.2, maybe it would be easier to state first your RE and then its asymptotic RE, and its proof should be clearer; it is OK to repeat proofs and add more details to aid overall clarity. The first paragraph after Theorem 4.2 should be massaged into the text where you mention the RE for the first time. In the second paragraph after Theorem 4.2, is this what you want to say: $\rho_2 = 1 - \rho_1$?

Finally, when you talk about the effective minimum you need to be aware that the SBM can give you empty blocks, and that these degenerate solutions can occur in the optimisation procedure when the model simply hones in on a solution with a smaller number of blocks than was initially considered. In Fig. 2, what is B set to? How are you ranging your block sizes? These are crucial points in your simulation; they should be among the first things you state, and then you can comment on your results. Please write a separate section that details your simulation setup and then a section that discusses your results.
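
A sketch of the estimate presumably being referred to, under the assumption that $\hat{\rho}_k$ is the empirical block proportion and that the labels $\tau_i$ are drawn iid with probabilities $\rho_1,\dots,\rho_K$:

```latex
\[
  \hat{\rho}_k \;=\; \frac{\lvert \{\, i : \tau_i = k \,\} \rvert}{N}
  \;\xrightarrow[\;N \to \infty\;]{\text{a.s.}}\; \rho_k ,
  \qquad k = 1, \dots, K,
\]
```

by the strong law of large numbers; spelling this out in the manuscript would answer the question above.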

3 Finite Sample Simulations

The simulation setup seems way too basic. It would be interesting to see different values of B, especially values on the boundary of the parameter space (close to 0 or 1). It would also be interesting to see different values of K; it seems way too simple to have only K = 2. What happens for K ∈ {5, 10, 15, 20}, and when the block sizes differ relative to each other (see the sketch after this paragraph for one such configuration)? Please state the range of numbers of vertices that you consider. REst needs to be re-defined; it looks like a strong deviation from your previous notation. Again, your simulations need to be structured so that the simulation methodology is a separate, clearly defined section. I should also be able to find all of your parameters straight away, not discover that M = 100 in the caption of Fig. 3. Also, it would be clearer which indices you are summing over if you used three sums instead of one. "In Fig. 3, We..." — typo, capital letter for "we". "...significantly less than 1..." — please do not use this adjective without p-values (or something similar) and an explanation of how you obtained such p-values.
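
As a concrete illustration of the richer setup being asked for, here is a minimal simulation sketch under assumed values (K = 5 unequal blocks, a B matrix near the boundary of [0, 1], M = 10 graphs, and a rank-K truncated eigendecomposition as the low-rank estimate); none of the specific numbers or names come from the manuscript:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sbm(block_sizes, B):
    """Sample one undirected, hollow SBM adjacency matrix together with its probability matrix."""
    tau = np.repeat(np.arange(len(block_sizes)), block_sizes)
    P = B[np.ix_(tau, tau)]
    upper = np.triu(rng.random(P.shape) < P, k=1).astype(float)
    return upper + upper.T, P

# Assumed simulation grid: K = 5, unequal block sizes, connection probabilities near 0 and 1.
K, M = 5, 10
block_sizes = [10, 20, 30, 40, 50]
B = np.full((K, K), 0.05) + 0.90 * np.eye(K)          # off-diagonal 0.05, diagonal 0.95

graphs = []
for _ in range(M):
    A, P = sample_sbm(block_sizes, B)
    graphs.append(A)

# Relative efficiency of the rank-K estimate P_hat versus the element-wise mean A_bar
# for this single Monte Carlo replicate, comparing off-diagonal entries only.
A_bar = np.mean(graphs, axis=0)
w, V = np.linalg.eigh(A_bar)
top = np.argsort(np.abs(w))[::-1][:K]
P_hat = V[:, top] @ np.diag(w[top]) @ V[:, top].T
off = ~np.eye(P.shape[0], dtype=bool)
re = np.sum((P_hat - P)[off] ** 2) / np.sum((A_bar - P)[off] ** 2)
print(f"relative efficiency (P_hat vs A_bar): {re:.3f}")
```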

4 CoRR Brain Graphs: Cross-Validation

In your cross-validation strategy it seems that some Monte Carlo samples will be repeated, as there are only 454 scans, and when M = 1 this can happen. Is this a problem? Could you please comment more on this? I am not sure how you fit the SBM here. What estimation strategies are employed to find the cluster labels? How do you resolve the influence of starting points in your data analysis? Please explain the dimension estimation procedure in more detail in an appropriate part of the manuscript. Your comments (p. 15, paragraphs 1-3) do not reference any of the figures, and I am not sure whether you are commenting on specific results or giving us an idea of what you expect from the simulations; please be clear. While it might be trivial, I am not sure how the CI of the RE is computed; could the authors clarify this? What is m now? Fig. 5 is too small. How do you conclude that $\hat{P}$ is a better estimate of the true probability matrix than $\bar{A}$? Also, what is K in this data set; it looks like it is 2?
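
A small illustration of the repetition concern (the 454 scans and M = 1 come from the text above; the 1000 Monte Carlo iterations and the sampling scheme are assumptions for the sake of the example):

```python
import numpy as np

rng = np.random.default_rng(0)

n_scans, M, n_mc = 454, 1, 1000    # 454 scans and M = 1 as above; 1000 iterations is an assumed value
subsets = [tuple(sorted(rng.choice(n_scans, size=M, replace=False))) for _ in range(n_mc)]
n_repeats = n_mc - len(set(subsets))   # Monte Carlo subsets that coincide with an earlier draw
print(f"{n_repeats} of {n_mc} Monte Carlo subsets are exact repeats of an earlier one")
```

With M = 1 and more Monte Carlo iterations than scans, repeats are unavoidable, which is exactly why a comment on their effect is needed.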

5 Synthetic Data Analysis for Full Rank IEM

I am sorry, but I really do not understand your simulation setup. Could you please explain it?

References

[1] Mikail Rubinov and Olaf Sporns. Complex network measures of brain connectivity: uses and interpretations. NeuroImage, 52(3):1059-1069, 2010.
[2] Christophe Ambroise and Catherine Matias. New consistent and asymptotically normal parameter estimates for random-graph mixture models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(1):3-35, 2012.
[3] Patrick J. Wolfe and Sofia C. Olhede. Nonparametric graphon estimation. arXiv preprint arXiv:1309.5936, 2013.
[4] David S. Choi, Patrick J. Wolfe, and Edoardo M. Airoldi. Stochastic blockmodels with a growing number of classes. Biometrika, page asr053, 2012.
[5] Franck Picard, Vincent Miele, Jean-Jacques Daudin, Ludovic Cottret, and Stéphane Robin. Deciphering the connectivity structure of biological networks using MixNet. BMC Bioinformatics, 10(6):1, 2009.
[6] Hugo Zanghi, Christophe Ambroise, and Vincent Miele. Fast online graph clustering via Erdős-Rényi mixture. Pattern Recognition, 41(12):3592-3599, 2008.
[7] Hugo Zanghi, Stevenn Volant, and Christophe Ambroise. Clustering based on random graph model embedding vertex features. Pattern Recognition Letters, 31(9):830-836, 2010.
[8] Dragana M. Pavlovic, Petra E. Vértes, Edward T. Bullmore, William R. Schafer, and Thomas E. Nichols. Stochastic blockmodeling of the modules and core of the Caenorhabditis elegans connectome. PLoS ONE, 9(7):e97584, 2014.
[9] J.-J. Daudin, Franck Picard, and Stéphane Robin. A mixture model for random graphs. Statistics and Computing, 18(2):173-183, 2008.
[10] Avanti Athreya, Carey E. Priebe, Minh Tang, Vince Lyzinski, David J. Marchette, and Daniel L. Sussman. A limit theorem for scaled eigenvectors of random dot product graphs. Sankhya A, 78(1):1-18, 2016.
[11] Sourav Chatterjee. Matrix estimation by universal singular value thresholding. The Annals of Statistics, 43(1):177-214, 2015.