Abstract revisions - Githubissues

dhimmel commented 5 years ago

This PR contains fairly major revisions to the abstract, which I think was already quite good, but I wanted to stress certain aspects that I think will have the most impact. Please do not accept suggestions you do not agree with... these are just suggestions, so happy to revert any sentences you think were better before or are discordant with the manuscript content.

zietzm commented 5 years ago

I have three points in mind that I think are contributions of the paper:

1. Edge prior gives baseline performance for a task (train/test networks)
2. Permutations let you quantify how well a feature captures degree and non-degree information
   - Performance on permuted networks lets you see how well it captures degree, then performance on unpermuted network lets you see how well it captures degree + other stuff.
   - For example, if feature performance on permuted training networks is lower than edge prior, the feature only somewhat captures degree. If its performance on the unpermuted network is then the same as the edge prior, you'd expect it to be more specific than a feature which has performance on both permuted and unpermuted networks equal to the edge prior. (Because the difference in performances shows it captures some amount of "other stuff")
3. By using the permutation framework we find that degree is not suitable for prediction tasks where training and testing networks have very different degree distributions.
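To make the first two points concrete, here is a minimal sketch of degree-preserving permutation and an empirical edge prior. It is plain Python and does not use the xswap package API. The function names `permute_edges` and `edge_prior`, the swap counts, and the toy edge list are all assumptions made for illustration. Edges are treated as (source, target) pairs, so each swap keeps every source's out-degree and every target's in-degree unchanged.

```python
import random
from collections import Counter


def permute_edges(edges, n_swaps, rng):
    """Degree-preserving permutation by repeated edge swaps (hypothetical helper)."""
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new_1, new_2 = (a, d), (c, b)
        # Reject swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or new_1 in edge_set or new_2 in edge_set:
            continue
        edge_set -= {edges[i], edges[j]}
        edge_set |= {new_1, new_2}
        edges[i], edges[j] = new_1, new_2
    return edges


def edge_prior(edges, n_permutations, n_swaps, seed=0):
    """Fraction of permuted networks in which each (source, target) pair is an edge.

    Pairs that never appear are omitted, which corresponds to an estimate of 0.
    """
    rng = random.Random(seed)
    counts = Counter()
    current = list(edges)
    for _ in range(n_permutations):
        current = permute_edges(current, n_swaps, rng)
        counts.update(current)
    return {edge: count / n_permutations for edge, count in counts.items()}


# Toy bipartite-style edge list (made up for illustration).
toy_edges = [("a", "x"), ("a", "y"), ("b", "x"), ("c", "z"), ("c", "y")]
toy_prior = edge_prior(toy_edges, n_permutations=50, n_swaps=20)
```

Because every permuted network has the same number of edges as the original, the estimated priors sum to the edge count. Pairs between high-degree nodes tend to receive higher values, which is the degree-only signal a baseline would capture.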

These are all somewhat complex topics, so I'm not sure if they can/should all be tackled (or even introduced) in the abstract. Without this PR, we only get the first point. The abstract in this PR mentions points 1 and 3. What should we do about point 2?

dhimmel commented 5 years ago

Point 2 could be summarized:

Our framework decomposes performance into the proportion attributable to degree and the proportion attributable to edge-dependent inference.

Furthermore, we could discuss how comparing a method's permuted AUROC with the edge prior's performance shows how much of the available degree information the method captures. But I think this point is too technical and minor for the abstract.
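One possible way to read the decomposition above is sketched below. It assumes that performance above chance (AUROC of 0.5) on permuted networks is the degree-attributable part, and that any further gain on the unpermuted network is the edge-dependent part. This specific formula, the function name `decompose_auroc`, and the example AUROC values are assumptions for illustration, not the manuscript's definition.

```python
def decompose_auroc(auroc_unpermuted, auroc_permuted, chance=0.5):
    """Split above-chance AUROC into degree and edge-dependent proportions.

    Assumption: permuted-network performance reflects degree alone.
    """
    total_gain = auroc_unpermuted - chance
    if total_gain <= 0:
        raise ValueError("feature does not perform above chance")
    degree_share = (auroc_permuted - chance) / total_gain
    return {"degree": degree_share, "edge_dependent": 1 - degree_share}


# Hypothetical AUROCs, not results from the manuscript.
shares = decompose_auroc(auroc_unpermuted=0.9, auroc_permuted=0.8)
```

With these made-up values, about three quarters of the above-chance performance would be attributed to degree and the rest to edge-dependent inference.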

zietzm commented 5 years ago

I tried to merge the two versions of the abstract to pick out the strongest parts of each, while incorporating all three points.

Important tasks in biomedical discovery such as predicting gene functions, gene-disease associations, and drug repurposing opportunities are often framed as network edge prediction. The number of edges connecting to a node, termed degree, can vary greatly across nodes in real biomedical networks. If degree strongly influences edge prediction, then imbalance or bias in the distribution of degrees could lead to nonspecific or misleading predictions. We introduce a network permutation framework to quantify the effects of node degree on edge prediction. Permuted networks can be used to quantify the relative performances of the degree and non-degree information that an edge prediction feature captures. Additionally, our permutation-derived edge prior quantifies the probability of a connection based only on node degree. This feature shows excellent discrimination and calibration for 20 biomedical networks (16 bipartite, 3 undirected, 1 directed), with AUROCs frequently exceeding 0.85. Feature performance exceeding the edge prior is attributable to factors other than degree and is often only a small portion of overall performance. In a comparison involving permuted networks and the permutation-derived edge prior, we demonstrate that degree's predictive performance diminishes when the networks used for training and testing were generated using distinct methods and hence have large differences in degree distribution. Accordingly, we suggest using the edge prior as a baseline for edge prediction and degree-preserving permutation to investigate a feature's ability to capture degree and non-degree information. We released our methods as an open-source Python package on GitHub (https://github.com/hetio/xswap/) and the Python Package Index (https://pypi.org/project/xswap/).

dhimmel commented 5 years ago

> I tried to merge the two versions of the abstract to pick out the strongest parts of each, while incorporating all three points.

Can you commit that to this PR? I think that will make it easiest to give feedback.

greenelab / xswap-manuscript

Abstract revisions #42