TangRunze opened this issue 7 years ago
For someone with statistical training such as myself, reading Tang et al.'s manuscript was a pleasure. In essence, the work tells us that a low-rank approximation of the connectomes is a good one and that using it in a James-Stein-style biased estimator gives a good small-sample estimate of the mean of connectomes. With that said, I fear that, as the manuscript is currently formulated, the results will be lost to a sizable fraction of the NeuroImage readership. I give below specific comments that can help make the manuscript more relevant to the neuroimaging community.
To start with, I think that intuitions should be put forward more quickly in the manuscript. I was convinced of the validity of the model for connectomes only when I saw Figure 5, which comes very late in the manuscript. It would be useful to show connectomes estimated on a few different atlases as well as their low-rank approximations early in the paper, probably before going into the formal models of Section 2. Intuitions are important for a multi-disciplinary audience such as that of NeuroImage. It is not a trivial insight that brain graphs, including anatomical connectivity graphs, are well approximated by low-rank models, and it gives your work its full meaning. To be fair, reading the last sentence of the abstract, "low-rank methods should be a key part of the tool box for researchers studying populations of graphs", I thought that it was an overstatement and that it was true only for a small set of applications.
As suggested, we moved Figure 5 of the original draft to the introduction, so that readers see from the very beginning the intuition for why a low-rank approximation is appropriate in brain graph analysis. We also explain the intuition for each model (e.g. SBM, RDPG) when we first introduce it in Section 1.
Along the same line, the manuscript is written in very general terms, and connectomics in neuroimaging sometimes appears as a side aspect of the work. Anchoring the vocabulary and the examples in neuroimaging would help make it more relevant for NeuroImage. For instance, the last sentence of the abstract is a statement on graphs in general, and the introduction starts with and mostly discusses graphs and statistics rather than the brain. Section 2, on models, starts in a very formal way and then considers connectomics as "an example application"; it is unclear to the reader why, at this point, the models are relevant to the stated goal of defining means in connectomics. When submitted to NeuroImage, the discussion should be framed in the context of the brain.
We have reframed the manuscript in a neuroimaging context; most of these changes are in the introduction.
The word "inadmissible" is used in its statistical sense without being defined. It is a common-English word, and hence its meaning here should really be made explicit.
We define "inadmissible" before introducing Stein's example.
The title is not very related to the work presented. I would strongly argue that it should be changed to something more descriptive about the work.
We change the title to "Connectome Smoothing and a Law of Large Graphs" to better describe our work.
From a big-picture standpoint, the tools presented in this manuscript are useful only if there are very few subjects (between 5 and 10). When there are more, they can be detrimental (as seen from Figure 4, where for the JHU and Desikan atlases they do not improve upon the naive estimator). This caveat should really be discussed: what application problem do they solve? 10 subjects is below the typical study. Right now this aspect feels swept under the rug in claims of general usefulness.
We now show results for sample size M = 50, where the low-rank estimates remain competitive. Also, this method can be applied to subgroups, such as all females between the ages of 21 and 25, to better explore differences between groups.
The asymptotics presented in Section 4.1 are for the number of nodes going to infinity with a fixed number of subjects. These are very much low-sample asymptotics. While I understand their interest for proofs, it is not clear to me that they relate well to application settings. I believe that this aspect should be discussed, in particular given that the good performance of the low-rank model is somewhat created by this specific choice of asymptotic regime.
We anticipate the collection of larger and larger brain networks, which will likely initially come with smaller sample sizes as the technology to scale these connectome collection techniques is developed.
In section 4.2, if I understand things right, the simulations are done informing the algorithm on the actual rank of the adjacency matrix. This does not reflect well the application situation. I think that simulations that include the automatic choice of the dimensionality should be included. In general, I find that the simulation settings are overly favorable to your model: it is a tautology that the estimator proposed will work well on the simulation, as the matrix is actually low rank. I would suggest that the authors should also do simulations on data that slightly break the model, adding non low-rank aspect to the graph, to probe when the estimator breaks down.
The simulations in Section 4.2 are based on a toy model to better illustrate the theory and how it works in the finite-sample, idealized setting. We have a more realistic simulation in Section 4.4 based on a full-rank independent edge model, which breaks the low-rank assumption of our theory. Our estimator still shows an advantage even in this situation.
Remark 4.4 is an important one: it tells us that if the graph isn't well approximated as a low-rank matrix, the estimator will not perform well. I believe that it should be discussed in the discussion part, and maybe put in perspective with the fact that low-rank approximations seem to work well with brain connectivity graphs.
As the referee pointed out, the estimator will not perform well if the graph doesn't have the low-rank property. We add a discussion of this in Remark 4.4. The worst case, i.e. all eigenvalues nearly equal, corresponds to a certain type of structure. We also add a scree plot (Figure 5) for the mean graph based on all 454 graphs, and a histogram of those eigenvalues (Figure 6), to show that there is in fact a quasi-low-rank structure.
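As a concrete illustration of what the scree plot conveys, the quasi-low-rank structure can be quantified by the fraction of the squared Frobenius norm captured by the d largest-magnitude eigenvalues. This is a minimal sketch of the idea, not code from the paper; the function name `energy_captured` is ours:

```python
import numpy as np

def energy_captured(A_bar, d):
    """Fraction of the squared Frobenius norm of a symmetric matrix
    captured by its d largest-magnitude eigenvalues. A value near 1
    indicates the quasi-low-rank structure a scree plot would show."""
    vals = np.linalg.eigvalsh(A_bar)          # eigenvalues of a symmetric matrix
    vals2 = np.sort(vals ** 2)[::-1]          # squared, sorted descending
    return vals2[:d].sum() / vals2.sum()
```

For an exactly rank-d matrix this returns 1; for the mean connectome it would return a number close to 1 at a small d if the quasi-low-rank claim holds.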
Some important information is given in section 6, which is the last section (maybe it is meant to be an appendix?). Reading the manuscript was awkward as it wasn't clear to me that some information was given later. For instance, around line 430, the descriptions of the methods to select the rank are named, but no reference is given to the part in section 6 that describes them. Similarly, paragraph 6.3 should be clearly referenced early in the text.
Thanks for the suggestions. We move some parts of Section 6 to the earlier part of the paper, e.g. the definition of MSE and RE, to help better illustrate the idea. Also in relevant paragraphs, we mention specifically that certain information could be found in Section 6 in more details.
The first column on figure 4 is a bit surprising: computing a mean with a single subject. It would be useful to either discuss why it is relevant, or to remove it.
The low-rank estimator reduces variance by taking advantage of the inherent low-rank structure of the mean graph. This smoothing effect is especially obvious when we have only 1 observation. When M = 1, all weights of the graph are either 0 or 1, leading to a very bumpy estimate. In this case, $\hat{P}$ smooths the connectome estimate and improves performance (discussed at Line 525).
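The smoothing step referred to here (averaging the observed graphs, then truncating to a rank-d eigendecomposition) can be sketched as follows. This is an illustrative sketch, not the authors' exact implementation; the function name and the final clipping to [0, 1] are our own choices:

```python
import numpy as np

def low_rank_mean(graphs, d):
    """Rank-d smoothed estimate of the mean graph.

    graphs : array of shape (M, N, N), symmetric adjacency matrices.
    d      : assumed rank of the underlying probability matrix P.
    """
    A_bar = np.mean(graphs, axis=0)             # element-wise mean, \bar{A}
    vals, vecs = np.linalg.eigh(A_bar)          # eigendecomposition (symmetric input)
    top = np.argsort(np.abs(vals))[-d:]         # keep the d largest-magnitude eigenvalues
    P_hat = vecs[:, top] @ np.diag(vals[top]) @ vecs[:, top].T
    return np.clip(P_hat, 0.0, 1.0)             # probabilities live in [0, 1]
```

With M = 1 the input is a single 0/1 matrix, and the rank-d truncation is exactly the smoothing described above.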
Dict-learning, ICA, network of network (Abraham, biobank, calhoun)
Maybe find these papers and cite them?
Line 18: I think that the word "sample" should be plural.
Revised.
As a nitpick on wording on line 26, I would rather say "a bias-variance trade-off" than "the bias-variance trade-off". To me there are many different trade-offs, as there are many different kinds of biases. The bias that the authors are taking here is a useful one, and grounds the success of the work.
Revised.
I was surprised on line 177 to read about positive semidefinite matrices, while line 102 mentions that the diagonals of the matrices are zero. To me they cannot be positive semidefinite without being zero (a simple proof: the trace is invariant to the basis; considering the basis in which the matrix is diagonal, the trace is the sum of the eigenvalues. If the diagonal is zero, this sum is hence zero. As the eigenvalues are nonnegative, they must all be zero). I suspect that this problem is related to the comment on line 240 and to Section 6.2. However, I must say that I found that it didn't make reading the manuscript easy.
We mentioned in Line 283 that the zero diagonal leads to a missing data problem. To remedy this issue, we use the diagonal augmentation procedure discussed in Section 6.2. After the diagonal augmentation, the diagonals are no longer all zeros.
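The general idea of diagonal augmentation (treat the zero diagonal as missing data, initialise it from the average degree, then alternate with a low-rank approximation) can be sketched as below. This is our own sketch of the idea, not the exact procedure of Section 6.2; the function name and iteration count are illustrative:

```python
import numpy as np

def diag_augment(A_bar, d, n_iter=20):
    """Iteratively impute the missing diagonal of a mean adjacency matrix.

    Initialise each diagonal entry with the row mean (degree / (N-1)),
    then alternate: take a rank-d eigendecomposition, and copy its
    diagonal back in. Off-diagonal entries are never modified.
    """
    N = A_bar.shape[0]
    A = A_bar.copy()
    np.fill_diagonal(A, A.sum(axis=1) / (N - 1))    # initial guess: row mean
    for _ in range(n_iter):
        vals, vecs = np.linalg.eigh(A)
        top = np.argsort(np.abs(vals))[-d:]          # d largest-magnitude eigenvalues
        low = vecs[:, top] @ np.diag(vals[top]) @ vecs[:, top].T
        np.fill_diagonal(A, np.diag(low))            # re-impute the diagonal only
    return A
```

When the off-diagonal part is exactly low rank, this iteration converges to the diagonal of the underlying low-rank matrix.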
On figure 4, the legend says literally "A bar" and "P hat". I must confess that reading this quickly, I did not immediately make the connection with the symbols.
Revised.
Figure 5 should probably also show the subtraction of A bar and P hat, as the difference is very hard to see.
We added the figure showing the difference between A bar and P hat in Figure 8 (originally Figure 5).
In this paper, Tang and colleagues propose a low-rank method to estimate the mean of a collection of undirected and unweighted graphs. Under a positive semidefinite stochastic block model, or more generally a random dot product graph model, the authors derive analytic results showing that the proposed method is guaranteed to have better performance than a simple element-wise average of the observed adjacency matrices when the network is large and the sample size is small. Using simulations, they confirm that this is also true when the data are generated from an underlying independent edge model.
Overall this paper is clearly written, easy to follow and the proposed method looks sound to my assessment. However, I have two major concerns about this paper.
(1) I think this paper doesn't fit very well into the scope of NeuroImage. The work is completely methodologically oriented, has a strong statistical taste and is motivated by and grounded in general graph theory. The introduction does not refer to any neuroimaging studies. Although the paper includes a real DTI data set, it is only used to synthesize data and validate the proposed method. To publish this paper in NeuroImage, I think the authors need to build a stronger connection between the method and the current neuroimaging literature, demonstrate its potential usefulness in addressing important scientific questions in brain network studies and provide more interesting real data applications. A statistical journal might be a more appropriate target for the current form of this paper.
We have reframed the manuscript in a neuroimaging context, primarily in the introduction, and built stronger connections to the neuroimaging literature.
(2) It seems from figures 4 and 8 that the proposed method outperforms the element-wise average only when the sample size is very small (N<10). Beyond this sample size, the two methods produce indistinguishable results or the simple average approach can be better. I think this considerably weakens the usefulness of the method in real data analysis. Currently in the neuroimaging world, most population based brain network studies have a sample size much larger than N=10. The proposed method could still be useful when the mean graph of a small subgroup of the subjects needs to be estimated but this paper did not provide an example. In the Discussion section, it was mentioned that the low rank representation can be used to improve interpretability of the data and provide biological insight. Unfortunately, this wasn't demonstrated in a real data set either.
We now show results for sample size M = 50, where the low-rank estimates remain competitive. Also, this method can be applied to subgroups, such as all females between the ages of 21 and 25, to better explore differences between groups.
Also, we add Figure 11 to show the low-rank representation: we can see a clear distinction between the left and right hemispheres, as conveyed in the second dimension (Line 659). Additionally, such a representation allows the use of techniques from multivariate analysis to further study the estimated population mean.
Some minor comments:
- perhaps use a more informative title?
- line 223: what do \tilde{U} and \tilde{S} represent?
- line 224: largest eigenvalues of A --> largest eigenvalues of \bar{A}
- line 300: \rho_2 = 1-\rho_2 --> \rho_2 = 1-\rho_1
- line 304: the relative efficiency should be N/(N-1)?
Revised.
Summary: The Stochastic Blockmodel (SBM) and the Random Dot Product Graph (RDPG) represent two prominent classes of models that frequently appear in network clustering and dimension reduction problems. Both models assume that the observed edges are generated according to some unknown random graph distribution whose first moment (the population Mean Graph) is structurally well approximated by the cluster structure found by the SBM and RDPG. Previous work on RDPG models suggested a new class of estimators of the population Mean Graph that are based on low-rank approximation methods. This paper considers the same class of estimators but in the context of an SBM which is assumed to have a positive semidefinite probability connectivity matrix so that it can be expressed as an RDPG model. This low-rank estimator is contrasted against the classical maximum likelihood estimator (MLE) for the SBM, showing some improvements over the classical MLE, particularly for large networks.
This paper touches on an important topic and certainly offers some interesting new perspectives which can be useful to the neuroimaging community. However, I fear that the paper needs to go through substantial modifications (that are in reality beyond major revision) in order to merit publication in NeuroImage. Nevertheless, I will ask for major revision but I expect a lot of effort and massive improvement in the second round.
The presentation and writing skills need to be radically improved. The authors should carefully proof-read the material before submission and have a sanity check if the current manuscript conforms to the journals' standard. Several things come to mind as potential remedies.
Consider your audience. Publications from this journal are read by researchers with diverse backgrounds including machine learning, neuroscience, psychology, psychiatry, statistics, engineering and physics. Thus, in order to maximise your impact you really need to write things in a simple manner whenever possible. In particular, your introduction should read in such a way that it is understandable to researchers who have never heard of the Stochastic Blockmodel (SBM) or the Random Dot Product Graph model. Hence, start from defining the problem in general terms and mention methods known to your audience. See, for example, the paper by Rubinov and Sporns [1] to give you a flavour of what this audience is aware of, and then try to put your work in this context. In a nutshell, your introduction should clearly state:
- what is the problem (e.g., network clustering) and some examples of data and methods
- define the models that you will consider (there are so many papers on the SBM that it is a common courtesy to cite them) and comment on their use in neuroimaging data [2, 3, 4, 5, 6, 7, 8, 9]. Please also note that "community structure" is a term that these days is identified with "modular organisation", and as the SBM can identify more diverse structures than that, I would suggest that you simply refer to it as "cluster structure" and thus define this in a more general way.
- what is that you are proposing
- how different/novel this work is from the work of previous authors, including [10] and [11]
- why your work is important to this audience and what they can hope to gain. For example, if you are claiming that your estimator has an advantage in large networks with a small number of realisations, then it would be great if you could give a practical example of such data. As another example, if somebody is interested in a logistic regression model and hypothesis testing, how does your estimator change the game there?
Satisfy the readers' expectation. As a reader I was getting really irritated with some sections because the ground would be prepared for one point and then something completely different would be stated next. For example, just before your Algorithm 1 you are talking about $\bar{A}$ and $A^{(m)}$, so I am expecting something involving these objects, and then I learn that in fact you define some general A, a symmetric real matrix that has never been mentioned before, as the main input of your algorithm. Thus, if you are changing gears you need to reflect that in your text beforehand. Your figure captions need to clearly define all the components of your plot before you proceed to comment on the most striking results. For example, in Figure 2, B11 and B12 are not clearly defined (also, at best your notation was $B_{11}$), and thus if you set specific values beforehand for connection probabilities then you should tell us about it. The same point can be made for Figure 1, where you do not even give colour bars with probability values. Overall, if you have more than one plot in your figure it is expected that you label subplots with A, B, etc., and state what each plot represents (this is much easier than the bottom-left or top-left approach). Your individual plots should be of the same size and also combined in such a way that a figure is of reasonable size and viewing quality. Axis titles should start with a capital letter and the same applies to plot subtitles (e.g., Fig. 1 "rank-5 approximation" -> "Rank-5 approximation").
We change the way of introducing the algorithms to make it more convenient for the readers. We also define all components of the plots (e.g. the parameter setting B) and refine the figures in many aspects, such as size, axes, and color bars. We didn't add the parameter matrix for Figure 2 since the 5x5 block matrix B seems too messy and it doesn't help illustrate our point. Currently we haven't added labels to the subplots, but we are willing to revise this if necessary.
Typos/Grammar. You need to make sure that your text is free from typos. For example, "stochastic block model" appears in the title of Section 2.3 while everywhere else you use "stochastic blockmodel". When you define the dot product I really do not see the reason for the use of < · > notation, but if you are really bent on this notation at least define it in a correct way: < x, y > should not be a sum over i. In the spirit of the readers' expectation, if you are already using < x_i, x_j > then it would be easier to define the dot product in terms of these variables than to introduce y, which is redundant. When you state $B_{\tau_i,\tau_j}=\nu_{\tau_i}^{\top}\nu_{\tau_j}^{\phantom{\top}}$ you are giving the impression that $\nu_{\tau_i}$ is a row vector? (Also, you may want to investigate the \phantom command as a quick way to align indices when you are using \top for transpose, so \phantom{\top} can help you there.) In Section 2.2, "... the entries $A_{ij}$ are distributed independently as Bern(< X_i, X_j >) for ..." you really need to say Bernoulli trials rather than use the abbreviation if you want to be consistent with Section 4.2 "... $\tau_i$ are drawn iid from a multinomial distribution ...". Note that this list of examples is not exhaustive; there are plenty of other instances where things are used without being defined or it is assumed that the reader knows your notation very well.
As suggested, we fixed the typos and revised much of the notation, for example unifying the dot product notation.
Citation. Please pay attention to how you are citing. "Hoff et. al. (2002)" is incorrect. Also, when you are citing two or more papers in parentheses you should really separate them by semicolons rather than commas. Finally, you need to be very careful when you are describing your own original work and when you are using results of other people (i.e. "proposing to use" is very different from "we propose").
Revised.
Notation. Please do not abuse statistical notation! Capital Roman letters indicate random quantities and small Roman letters indicate their realisations. Saying something like $P(A_{ij} = a_{ij})$ makes sense. However, $P(A_{ij})$ means nothing. It is important early on to define basic statistical notation. Note that you can use bold face for non-scalar quantities. I cannot stress enough how important it is to distinguish between these quantities, especially when you are taking expectations. I should be comfortable when I read your text and not wrack my brains trying to see whether an object is random or non-random. Finally, it is important to establish a unifying notation that is your own. So far, my overall impression is that your notation is way too eclectic.
We revised some of them.
Objectiveness. Please do not misuse adjectives when you are comparing things. For example, you say "... in this instance visual inspection demonstrate that $\hat{P}$ performs significantly better than $\hat{A}$". This suggests that you are using a statistical hypothesis test of some sort and that you have, for example, p-values that can back this up. This is not a correct expression for what you are trying to say. Please try to be more realistic/careful with this.
Revised.
Structure. Please also consider re-structuring your work. When I am reading about the SBM I should get all the information I need to follow the later developments. So, I believe that in asymptotic results I should already be familiar with the distribution of latent variables and that distribution of edges is conditional on the cluster labels. This should be defined in the section on the SBM. Again, there are other places where you could have structured things in a more reader-friendly way. For example, look at some NeuroImage papers and see how they structure the simulation methods and results. You can do something like that too. Also, NeuroImage allows only sections and subsections, so I really think that you should use these wisely.
My general impression of this work is that it is very sloppy. There is some interesting material, but it will take serious efforts and work to get this into a good shape. I personally expect every NeuroImage paper to be self-contained. If you are using some method then this needs to be defined. Although I do not expect this to appear in the main text, all background material should be easily accessible and contained in either the appendix or in the supplementary material of this manuscript. This should be obviously summarised in your own words and with your own notation.
Define MSE before you define Relative Efficiency (RE). Regarding "Somewhat surprisingly, the asymptotic relative efficiency will not depend on this fixed sample size M": you are jumping ahead of yourself. I expect you to interpret RE here and not state your main result. Please explain what values RE can take and what we learn from that.
We move the definition of MSE and RE from Section 6 to the beginning of Section 4.1, and interpret these concepts, to help better illustrate the idea.
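The two quantities in question are simple enough to write down in a few lines. This is a sketch assuming MSE is the element-wise mean squared error and RE is the ratio MSE($\hat{P}$)/MSE($\bar{A}$); the function names are ours:

```python
import numpy as np

def mse(estimate, P):
    """Element-wise mean squared error of an estimate of P."""
    return np.mean((estimate - P) ** 2)

def relative_efficiency(P_hat, A_bar, P):
    """RE = MSE(P_hat) / MSE(A_bar). Values below 1 favour the
    low-rank estimator; values above 1 favour the element-wise mean."""
    return mse(P_hat, P) / mse(A_bar, P)
```

Interpreting RE this way directly answers the referee's question about what values RE can take: any nonnegative number, with 1 as the break-even point between the two estimators.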
A multinomial distribution takes the total number of trials as a parameter and this needs to be stated. I think it is better for you to state here a "Categorical distribution".
Revised.
Block membership probabilities are defined as a vector of length K, so I am confused: what then is $\rho_{\tau_i}$? Is this a vector of length N? Note that in the appendix you state $\sum_{i}\rho_{i}=1$.
$\rho_{\tau_i}$ is a scalar, representing the probability that a vertex is assigned to block $\tau_i$.
Please explain that $|\{i : \tau_i = k\}|$ means the total number of nodes in block k. Explain why this estimate converges to $\rho_k$ as N increases. Are you using Maximum Likelihood theory here?
$\tau_i$ is the block membership of vertex i. The convergence follows from the law of large numbers.
Your comment that the distribution of edges is conditional on the latent variables should have been discussed the first time you talk about the SBM. For Lemma 4.1 in Section 6.5, you need to elaborate on this proof and its derivation and clearly state every single result you used from Athreya et al. [10].
The conditional distribution of edges is discussed in Section 2.2, where we write down the likelihood function for the RDPG. We reference Athreya et al.'s results in the Appendix.
When you say "From this we can see that the MSE of $\hat{P}_{ij}$ is of order ....", OK, and this means what?
We add a discussion of the order. First, it means the estimate improves as the number of observations M increases. Furthermore, it also benefits from a larger graph because of the use of the low-rank structure; that is, $\hat{P}$ performs better as the number of vertices N increases.
Just before Theorem 4.2 you say "This yields the following result." Try to give more explanation of what you mean by this. For Theorem 4.2, maybe it would be easier to state first your RE and then its asymptotic RE. Its proof should be clearer. It is OK to repeat proofs and add more details to aid overall clarity.
We add the explanations before Theorem 4.2 and change the order of both Lemma 4.1 and Theorem 4.2.
The first paragraph after Theorem 4.2 should be massaged into the text where you mention the RE for the first time. In the second paragraph after Theorem 4.2, is this what you want to say: $\rho_2 = 1 - \rho_2$?
Revised.
Finally, when you talk about the effective minimum you need to be aware that the SBM can give you empty blocks and that these degenerate solutions can happen in the optimisation procedure, where the model simply hones in on a solution with a smaller number of blocks than what was initially considered.
Fig. 2: what is B set to? How are you varying your block sizes? These are crucial points of your simulation, and they should be among the first things you state; then you can comment on your results. Please write a separate section that details your simulation setup and then a section that discusses your results.
Fig. 2 is now Fig. 3 in the current draft. We added the parameter setting as in Eqn. 2. Do we need a separate section for the simulation setup?
The simulation setup seems way too basic. It would be interesting to see different values of B, especially those whose values are on the boundary of the parameter space (close to 0 or 1). Also, it would be interesting to see different values of K. It seems way too simple to have only K = 2. What happens for K ∈ {5, 10, 15, 20} and when the block sizes are different relative to each other?
The simulations in Section 4.2 are based on a toy model setting to better illustrate the theory and how it works in the finite sample, for an idealized setting.
Please state the range of vertices that you consider?
Added. N ∈ {30, 50, 100, 250, 500, 1000}.
REst, this you need to re-define; it looks like a strong deviation from your previous notation. Again, your simulations need to be structured in such a way that the methodology of your simulation is a separate section and clearly defined. I should also be able to find all of your parameters straight away and not discover M = 100 only in the caption of Fig. 3. Also, it would be clearer to see which indices you are summing over if you use three sums instead of one.
Do we need a separate section for the simulation setup? We like the current notation; any suggestion on how to revise it?
"In Fig. 3, We..." typo capital letter for we.
Revised.
"...significantly less than 1..." please do not use this adjective without p-values (or something alike) and an explanation of how you obtain such p-values.
Revised.
In your cross-validation strategy it seems that some Monte Carlo samples will be repeated, as there are only 454 scans, and when M = 1 this can happen. Is this a problem? Could you please comment more on this?
For M = 1, we go over all 454 possible samples (each sample is one of the 454 graphs) instead of running 1000 simulations.
I am not sure how you fit the SBM here. What are the estimation strategies employed to find the cluster labels? How do you resolve the influence of starting points in your data analysis?
Although fitting the SBM and finding the cluster labels could be an interesting future direction based on our current work, we don't focus on these tasks in this paper. Instead, we let our estimator discover the low-rank structure itself and estimate the probability matrix P directly.
Please explain the dimension estimation procedure in more detail in an appropriate part of the manuscript. Your comments (pg. 15, paragraphs 1-3) are not referencing any of the figures, and I am not sure whether you are commenting on specific results or giving us some idea of your expectations for the simulations. Please be clear.
We now mention that we are discussing Fig. 4 (now Fig. 7).
While it might be trivial, I am not sure how the CI of RE is computed. Could the authors clarify this? What is m here?
We calculate it by assuming a normal distribution.
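Assuming this refers to the usual normal-approximation interval over the m Monte Carlo replicates (mean plus or minus z times the standard error), the computation might look like the sketch below. The function name and the hard-coded z-values are our own, not the paper's:

```python
import numpy as np

def normal_ci(samples, level=0.95):
    """Normal-approximation confidence interval for the mean of
    Monte Carlo replicates (e.g. per-replicate relative efficiencies)."""
    z = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}[level]  # standard normal quantiles
    m = len(samples)
    mean = np.mean(samples)
    se = np.std(samples, ddof=1) / np.sqrt(m)          # standard error of the mean
    return mean - z * se, mean + z * se
```

Here m is the number of Monte Carlo replicates, which is the quantity the referee asks about.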
Fig. 5 is too small. How do you conclude that $\hat{P}$ is a better estimate of the true probability matrix than $\bar{A}$? Also, what is K in this data? It looks like it is 2.
Do we want to make Fig. 5 (now Fig. 1) larger? Previously we used the "image" command and the "levelplot" command, but it seems that even with the same color scale the two commands render differently. Now we use "levelplot" for all of them, and the three figures look quite similar.
I am sorry but I really do not understand your simulation setup. Could you please explain this.
[1] Mikail Rubinov and Olaf Sporns. Complex network measures of brain connectivity: uses and interpretations. NeuroImage, 52(3):1059-1069, 2010.
[2] Christophe Ambroise and Catherine Matias. New consistent and asymptotically normal parameter estimates for random-graph mixture models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(1):3-35, 2012.
[3] Patrick J. Wolfe and Sofia C. Olhede. Nonparametric graphon estimation. arXiv preprint arXiv:1309.5936, 2013.
[4] David S. Choi, Patrick J. Wolfe, and Edoardo M. Airoldi. Stochastic blockmodels with a growing number of classes. Biometrika, page asr053, 2012.
[5] Franck Picard, Vincent Miele, Jean-Jacques Daudin, Ludovic Cottret, and Stéphane Robin. Deciphering the connectivity structure of biological networks using MixNet. BMC Bioinformatics, 10(6):1, 2009.
[6] Hugo Zanghi, Christophe Ambroise, and Vincent Miele. Fast online graph clustering via Erdős-Rényi mixture. Pattern Recognition, 41(12):3592-3599, 2008.
[7] Hugo Zanghi, Stevenn Volant, and Christophe Ambroise. Clustering based on random graph model embedding vertex features. Pattern Recognition Letters, 31(9):830-836, 2010.
[8] Dragana M. Pavlovic, Petra E. Vértes, Edward T. Bullmore, William R. Schafer, and Thomas E. Nichols. Stochastic blockmodeling of the modules and core of the Caenorhabditis elegans connectome. PLoS ONE, 9(7):e97584, 2014.
[9] J.-J. Daudin, Franck Picard, and Stéphane Robin. A mixture model for random graphs. Statistics and Computing, 18(2):173-183, 2008.
[10] Avanti Athreya, Carey E. Priebe, Minh Tang, Vince Lyzinski, David J. Marchette, and Daniel L. Sussman. A limit theorem for scaled eigenvectors of random dot product graphs. Sankhya A, 78(1):1-18, 2016.
[11] Sourav Chatterjee. Matrix estimation by universal singular value thresholding. The Annals of Statistics, 43(1):177-214, 2015.
We thank the NeuroImage for the opportunity to revise our paper, and the Handling Editor, the editorial team, and three referees for their thoughtful comments and suggestions. We have revised our manuscript in accordance with the review, and believe that this version successfully addresses all issues and is a better paper thanks to this revision process.