greenelab / xswap-manuscript

Manuscript on XSwap network permutation and hetnet node degrees
https://greenelab.github.io/xswap-manuscript/

Address #54 substantial issues #57

Closed zietzm closed 1 year ago

zietzm commented 4 years ago

I am opening this PR so that I can comment on the manuscript with specific issues brought up in #54 and attempt to address them.

AppVeyorBot commented 4 years ago

AppVeyor build 1.0.15 for commit 1e401584d0b2bee10b31d495336e322c38c71f17 by @zietzm is now complete. The rendered manuscript from this build is temporarily available for download at manuscript-1.0.15-1e40158.pdf.

zietzm commented 4 years ago

Abstract

1.

Manuscript reads:

We introduce a network permutation framework to quantify the effects of node degree on edge prediction.

Commentary:

What method?

Response: The goal is for the framework to be applicable to many edge prediction methods. For one, any method that assigns a value to node pairs (for example, a feature itself or the output of an algorithm that takes many features as inputs and outputs scores for each node pair) can be analyzed in this way.

For example, Figure 8 in the current manuscript (pasted below) does this quantification for five features. The same could just as easily be done with the predicted probability of an edge from a trained logistic regression (or any other ML method) that takes these features as inputs.

image

2.

Manuscript reads:

Researchers seeking to predict new or missing edges in biological networks should use the edge prior as a baseline to identify the fraction of performance that is nonspecific because of degree.

Commentary:

this feels like it might be specific to one type of edge prediction algorithm. I'm wary of oversell.

Response: I don't believe it is specific to one type of edge prediction algorithm. Each of these five features could be considered a different edge prediction algorithm. A logistic regression trained on some combination of these (or other) features is another edge prediction method, and pairing the same (or other) features with a different model for making predictions yields still more. Ultimately, any method from which an AUROC can be computed can be analyzed in this way.
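To make the "any method from which an AUROC can be computed" point concrete, here is a minimal sketch that scores the same node pairs twice: once with a feature and once with a degree-only baseline. The toy data, the `auroc` helper, and the degree-product baseline are all assumptions made for illustration. In particular, the degree product is only a stand-in for a degree-based score and is not the manuscript's edge prior.

```python
def auroc(scores, labels):
    """AUROC via the Mann-Whitney identity: the fraction of
    (positive, negative) pairs ranked correctly, ties counted as half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: one row per candidate node pair.
# Columns: source degree, target degree, feature score, edge exists (1) or not (0).
pairs = [
    (3, 2, 0.90, 1),
    (3, 1, 0.40, 1),
    (1, 2, 0.80, 0),
    (1, 1, 0.10, 0),
    (3, 1, 0.30, 0),
    (1, 2, 0.85, 1),
]
labels = [is_edge for _, _, _, is_edge in pairs]
feature_scores = [score for _, _, score, _ in pairs]
# Assumed degree-only baseline (a stand-in, NOT the manuscript's edge prior).
degree_scores = [su * tv for su, tv, _, _ in pairs]

feature_auroc = auroc(feature_scores, labels)
degree_auroc = auroc(degree_scores, labels)
print(f"feature AUROC:     {feature_auroc:.3f}")
print(f"degree-only AUROC: {degree_auroc:.3f}")
```

Any scoring method, whether a single feature or the output of a trained model, can be substituted for `feature_scores` without changing the comparison.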

zietzm commented 4 years ago

Introduction

1.

Manuscript shows the following image: image

Commentary:

Why are we using violin plots here? Degree distributions typically shown as image Problem: each of these really has a different # of nodes!

Response: My goal for this figure is to illustrate the point that real (biomedical) networks can have a variety of different degree distributions. The figure was created simply to reinforce this qualitative point, not to provide any quantitative insight into the specific degree distributions shown.

As to the style of the figure, I had considered either of the following ways of showing the distributions:

When deciding how to make this figure, I thought that both of the above (handwritten) figure styles were more cluttered and less helpful for showing that the distributions are diverse. Moreover, I think the top figure (labelled "a") is confusing because it overlays the distributions, which makes it harder to identify which network corresponds to which distribution, while the bottom figure (labelled "b") makes it difficult to show many distributions without making them individually very small or the overall figure very large.

I am open to suggestions for how best to illustrate the point I am trying to make. If this figure is more confusing, misleading, or unnecessary than it is helpful, I could also just remove it entirely.

2.

Manuscript reads:

Firstly, bias in networks can distort node degree so that degree differences between two nodes may not be meaningful. Secondly, reliance on degree can lead edge prediction methods to make nonspecific or trivial predictions and fail to identify novel or insightful relationships.

Commentary:

Can you support these claims?

Response: In the introduction, I'm trying to motivate the problems with degree using 1) degree bias in real networks, and 2) the effect of degree imbalance on the ability of degree-associated methods to make specific predictions.

I think the first point is a logical consequence of biases in degree distributions, which have been demonstrated in previous works that this paper cites. I believe that the following figure (found in Results) also illustrates this point:

The second point is harder to cite. In some sense, it is just a logical statement, "Methods correlated with degree will give high value to high-degree nodes, largely irrespective of their true connectivity." I'm using the example of multifunctionality because it was one of the only previously published works that I found that addresses this point.

3. Concluding commentary

Overall, this intro feels like it's trying too hard to explain/throw shade on existing techniques. This would be much stronger as more of a lit review w/ pointers to specific settings where bias leads to problems.

Response: I certainly don't intend to insult any other methods and would be eager to change any specific wordings that might give this impression.

My intention in the introduction is to introduce the concepts of degree bias and prediction non-specificity to motivate why we want to account for degree. I try to explain degree bias using the example degree distributions and the discussion of inspection bias. Non-specificity is discussed in reference to misleading causality, "[M]any predictions appear to rely primarily on multifunctionality and could be 'potentially misleading with respect to causality.'" With respect to node degree, this point has not been (to the best of my knowledge) greatly explored in previous works.

I would welcome any suggestions for changing the introduction so that it is clearer or more concise, or so that the problem is motivated using a better approach.

AppVeyorBot commented 4 years ago

AppVeyor build 1.0.16 for commit 3dd3580ec703ea6ed5df991d81a9cc910c10d4c5 by @zietzm is now complete. The rendered manuscript from this build is temporarily available for download at manuscript-1.0.16-3dd3580.pdf.

zietzm commented 4 years ago

Methods

1.

Manuscript reads:

We provide documentation for parameter choices depending on the type of network being permuted in the GitHub repository (https://github.com/hetio/xswap).

Commentary:

why not a citation?

Response: I am indifferent as to whether we use a citation or in-text URL here. I assume that a journal will have its own preferences for URL vs citation.

2.

Commentary: image

Response: I used a style for the modified algorithm that I thought was most clear. I replicated the original XSwap algorithm faithfully from its original publication.

As it appeared originally: image

I am happy to modify the original algorithm's pseudocode to match the style of my own if that is preferred.

3.

Manuscript reads:

Applications of the modified XSwap algorithm to various network types with appropriate parameter choices.

Commentary:

this needs a name

Response: Should we refer to the original XSwap algorithm as the "XSwap algorithm" and refer to the "modified XSwap algorithm" by a new name we have yet to create? I am not opposed to coming up with a name and would be glad to have suggestions from anyone on this.

4.

Manuscript reads:

The edge prior can be estimated using the fraction of permuted networks in which a given edge exists—the maximum likelihood estimate for the binomial distribution success probability.

Commentary:

of?

Response: I tried to replicate the terminology used on Wikipedia. I am happy to rephrase this if the parameter in question is more commonly referred to by a different name or if a different phrasing would be clearer.
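For readers following the terminology question, the estimate in the quoted sentence is simply k/n, where k is the number of permuted networks that contain the edge and n is the number of permutations; this is the maximum likelihood estimate of the success probability of a binomial distribution with n trials. Below is a minimal sketch of that estimate. It assumes a small bipartite network with no duplicate edges and uses a plain degree-preserving edge swap written for illustration. The function names, parameters, and toy edges are hypothetical, and this is neither the `xswap` package's API nor the manuscript's modified algorithm.

```python
import random
from collections import Counter


def permute_edges(edges, n_swaps, rng):
    """Attempt n_swaps degree-preserving swaps: (a, b), (c, d) -> (a, d), (c, b)."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        # Reject swaps that pick the same edge twice or would duplicate an edge.
        if i == j or new1 in present or new2 in present:
            continue
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges


def estimate_edge_prior(edges, n_permutations, swaps_per_permutation, seed=0):
    """Fraction of permuted networks containing each (source, target) pair.

    Pairs that never appear in any permuted network are absent from the
    result and have an estimated prior of 0.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_permutations):
        counts.update(permute_edges(edges, swaps_per_permutation, rng))
    return {pair: k / n_permutations for pair, k in counts.items()}


# Hypothetical bipartite network: sources a-c, targets x-z.
edges = [("a", "x"), ("a", "y"), ("b", "x"), ("c", "z")]
prior = estimate_edge_prior(edges, n_permutations=200, swaps_per_permutation=40)
for pair in sorted(prior):
    print(pair, round(prior[pair], 3))
```

Because every swap preserves degree, the estimated priors for a given source node sum to that node's degree, which is a convenient sanity check on any implementation.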

5.

Manuscript reads:

Nonetheless, we discovered a good analytical approximation to the edge prior for networks with many nodes and relatively low edge density (Figure 3).

Updated to read:

Nonetheless, we discovered a good analytical approximation to the edge prior that is particularly good for networks with many nodes and fewer edges [...]

Commentary:

how many? how big is too big?

Response: This is hard to quantify, because the approximation is pretty good even for networks with quite few nodes (Disease-localized-Anatomy in the figure). There is no generic hard cutoff. The more nodes and the sparser the network, the better the approximation.

AppVeyorBot commented 4 years ago

AppVeyor build 1.0.17 for commit 27831eedd4cef1a6ae6dce573eecd02db36dadd1 by @zietzm is now complete. The rendered manuscript from this build is temporarily available for download at manuscript-1.0.17-27831ee.pdf.

zietzm commented 4 years ago

Results

1.

I'd like consistent short names for each "feature" used throughout, perhaps even in a different font

Response: Fully agree. I'm rerunning the calibration computation now to redo the calibration plot.

AppVeyorBot commented 4 years ago

AppVeyor build 1.0.27 for commit 5e7c1d344eb5b9be0f1825c6edae345fd1fd6317 by @zietzm is now complete. The rendered manuscript from this build is temporarily available for download at manuscript-1.0.27-5e7c1d3.pdf.

AppVeyorBot commented 4 years ago

AppVeyor build 1.0.28 for commit 203cf3b8efe8a8680a771d79b29896bf7fc5b5e7 by @zietzm is now complete. The rendered manuscript from this build is temporarily available for download at manuscript-1.0.28-203cf3b.pdf.

AppVeyorBot commented 4 years ago

AppVeyor build 1.0.31 for commit 8e2eb02dcfa1c8b3696197c42d0aca7d063b7f02 by @zietzm is now complete. The rendered manuscript from this build is temporarily available for download at:

dhimmel commented 1 year ago

I will merge this since @cgreene approved and we can address anything remaining in subsequent PRs. Want to merge to avoid potential conflicts.