Closed: dhimmel closed this issue 4 years ago
To get the ball rolling, here is a list of papers that describe how two nodes are related via a similarity score (cosine or some variant):
Paper Citation | Type of Network | Methodology | How two nodes are related |
---|---|---|---|
10.1093/bioinformatics/bty559 | Phenotype-Disease-Gene network | Node2vec Like approach | Finds Disease-Gene Associations (Similarity Score) |
10.1109/BIBM47256.2019.8983134 | Disease-Gene-Chemical Network | Ensemble approach (Node2vec, Matrix Factorization, AutoEncoder) | Finds Disease Gene Associations (Similarity Score) |
10.1186/s12920-019-0627-z | Gene - Phenotype Network | Multi-path Embeddings (Extends metapaths into using multiple paths) | Gene - Phenotype (Similarity Score) |
10.1016/j.websem.2017.06.002 | Conglomerate of Node Types (Drug, Chemical, Protein) | TransH plus HolE to generate features for logistic regression | Predicts Drug-Drug Interactions |
10.1093/bioinformatics/btx160 | Drug-Target-Disease Network | DeepWalk | Drug-Target Associations (Similarity Score) |
10.1093/jamia/ocy117 | Disease-Symptom-Protein Network | Node2Vec | Symptom - Gene associations (Similarity Score) |
1710.05980 | Patient-Disease-Drug Network | TransH like approach | Patient - Drug recommendations with low side effects |
1909.00672 | EHR derived network | PrTransH | Link Prediction from disease to Various EHR related Nodes |
10.1145/3184558.3186978 | Movie Rating network, Phenotype-Gene network | Metapath recommendations | Most relevant metapaths given two nodes |
The last one is about metapaths rather than nodes, but this covers many studies.
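For readers unfamiliar with the "Similarity Score" column, here is a minimal sketch of how a cosine score between two node embeddings could be used to rank candidate associations. The embedding values and node names (`disease_A`, `gene_X`, `gene_Y`) are made up for illustration and are not taken from any of the papers above; in practice the vectors would come from a method such as node2vec or DeepWalk.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_candidates(query, candidates, embeddings):
    """Sort candidate nodes by cosine similarity to the query node, highest first."""
    scores = [
        (candidate, cosine_similarity(embeddings[query], embeddings[candidate]))
        for candidate in candidates
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical 3-dimensional embeddings (assumption, for illustration only).
embeddings = {
    "disease_A": [1.0, 0.0, 1.0],
    "gene_X": [1.0, 0.0, 1.0],
    "gene_Y": [0.0, 1.0, 0.0],
}

ranking = rank_candidates("disease_A", ["gene_Y", "gene_X"], embeddings)
```

Note that a score like this only says whether (or how strongly) two nodes appear related; it does not say which paths or intermediate nodes account for the relationship.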
One question to think about is how many different node and edge types are needed for a network to be considered heterogeneous. If it is greater than two, then some of these sources will have to be removed, as they only use two node types with three edge categories.
> To get the ball rolling, here is a list of papers that describe how two nodes are related via a similarity score
Do these studies output how two nodes are related or whether two nodes are related? "Whether" could be something like ranked node pairs or a probability of being related. "How" is harder and requires an explanation for each prediction.
> One question to think about is how many different node and edge types are needed for a network to be considered heterogeneous. If it is greater than two, then some of these sources will have to be removed, as they only use two node types with three edge categories.
Anything more than one node type and one edge type is heterogeneous. Perhaps more important than the network the study applied the approach to are the theoretical limits of the approach. If the approach cannot scale to many node and edge types, then that is a major limitation. Many approaches are hardcoded to a specific metagraph. I imagine it is sometimes challenging to know whether a method can scale, and the best evidence is the metagraphs of the networks it has been applied to.
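As a rough illustration of that rule of thumb, the sketch below summarizes a network's metagraph from its edge list and flags it as heterogeneous. It assumes the rule means "more than one node type or more than one edge type", and it assumes each edge is given as a `(source_type, edge_type, target_type)` triple; both the function names and the input format are hypothetical.

```python
def metagraph_summary(edges):
    """Collect the distinct node types and edge types in an edge list.

    edges: iterable of (source_type, edge_type, target_type) triples.
    An edge type is identified by the full triple, so the same relation
    name between different node types counts as a different edge type.
    """
    node_types = set()
    edge_types = set()
    for source_type, edge_type, target_type in edges:
        node_types.update([source_type, target_type])
        edge_types.add((source_type, edge_type, target_type))
    return node_types, edge_types


def is_heterogeneous(edges):
    """True if there is more than one node type or more than one edge type."""
    node_types, edge_types = metagraph_summary(edges)
    return len(node_types) > 1 or len(edge_types) > 1


# A gene-gene interaction network: one node type, one edge type.
homogeneous = [("Gene", "interacts", "Gene")]

# Two node types with three edge categories, as in some of the studies listed above.
two_node_types = [
    ("Gene", "interacts", "Gene"),
    ("Gene", "associates", "Phenotype"),
    ("Phenotype", "resembles", "Phenotype"),
]
```

Under this reading, `is_heterogeneous(homogeneous)` is false and `is_heterogeneous(two_node_types)` is true. The size of the two sets returned by `metagraph_summary` gives a quick sense of the metagraphs a method has actually been applied to.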
> Do these studies output how two nodes are related or whether two nodes are related? "Whether" could be something like ranked node pairs or a probability of being related. "How" is harder and requires an explanation for each prediction.
Oh shoot. I was under the impression you were looking for papers that stated whether two nodes are related. Will be back on the hunt. Still might be worth mentioning some of the above papers though.
> I was under the impression you were looking for papers that stated whether two nodes are related.
These are still interesting as examples of how connectivity search achieves something the prior works could not. So good to compile them and keep an eye out for whether they can explain how two nodes are related. Also good to keep an eye out for whether they require a training set (or gold standard) of known relationships.
Okay, round two. Here are papers that should be more along the lines of explaining how two nodes are related. The majority of these papers are focused on social media and general knowledge rather than biology.
Paper Citation | Brief Summary |
---|---|
@doi:10.1145/3132847.3133161 | Goal is to explain why two entities within a tweet co-occur. Finds intermediate nodes between the two query entities and then uses an SVM to rank the intermediate entities that best explain why the two entities co-occur. |
@arxiv:1809.07685 | A survey paper on explaining entity relatedness. Good reference for better understanding how two entities may be related. |
@doi:10.1145/3289600.3290990 | This tool uses network embeddings to explain users' actions and items on their social feed. The authors frame the problem as explaining why two nodes in a graph are connected. Uses a learning-to-rank algorithm to rank paths. |
@doi:10.1145/2983323.2983778 | This approach finds subgraphs that are most relevant to query entity sets. By finding subgraphs, the authors are able to explain some connections between the sets in question. |
@doi:10.1145/2872518.2890528 | This approach is strikingly similar to the above approach; they might be the same paper. In any case, this paper is designed to find subgraphs that best explain how entity sets are related. |
@raw:RECAP | The goal of this is to explain how two entities are related. It aims to find the most relevant edges between two nodes via the following steps: find paths, rank the paths, select the top X paths. |
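The find/rank/select steps described in the last row can be sketched generically as follows. This is not RECAP's actual algorithm: the scoring function (penalizing paths that pass through high-degree intermediate nodes) is a placeholder assumption, as are the function names, the undirected toy graph, and the `max_length` and `top_x` parameters.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def find_paths(adjacency, source, target, max_length):
    """Step 1: all simple paths from source to target with at most max_length edges."""
    paths = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            paths.append(path)
            continue
        if len(path) - 1 >= max_length:
            continue
        for neighbor in sorted(adjacency[node]):
            if neighbor not in path:
                stack.append((neighbor, path + [neighbor]))
    return paths


def path_score(adjacency, path):
    """Placeholder score: product of 1/degree over intermediate nodes.

    Shorter paths through less-connected nodes score higher.
    """
    score = 1.0
    for node in path[1:-1]:
        score /= len(adjacency[node])
    return score


def top_paths(edges, source, target, max_length=3, top_x=2):
    """Steps 2 and 3: rank the paths by score, then keep the top X."""
    adjacency = build_adjacency(edges)
    paths = find_paths(adjacency, source, target, max_length)
    ranked = sorted(paths, key=lambda p: (-path_score(adjacency, p), p))
    return ranked[:top_x]


# Toy graph: C is a hub, so the path through B ranks above the path through C.
toy_edges = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "D"), ("C", "E"), ("C", "F")]
best = top_paths(toy_edges, "A", "D", max_length=3, top_x=2)
```

The returned paths themselves serve as the explanation of how the two nodes are related, which is what separates this family of methods from the similarity-score approaches in the first table.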
@danich1, fantastic research! It is challenging to discover these studies since they're from a different field of study. A rare study cites across domains... and it is a sign of rigor.
Do all of these studies apply to hetnets? Or are some restricted to simple or bipartite graphs? It can be hard to tell sometimes.
For @raw:RECAP, it has a DOI we could use for citation: https://doi.org/10.1007/978-3-319-25007-6_36
I think a paragraph that cites these studies would be extremely helpful for our introduction. It is okay to group multiple studies in a single sentence of explanation if their approaches and designs are similar.
We also might want a paragraph on the "whether" two nodes are related, but with the clear caveat that these studies don't address the how.
@danich1 do you think you could take the lead on these paragraphs?
> I think a paragraph that cites these studies would be extremely helpful for our introduction. It is okay to group multiple studies in a single sentence of explanation if their approaches and designs are similar.
>
> We also might want a paragraph on the "whether" two nodes are related, but with the clear caveat that these studies don't address the how.
>
> do you think you could take the lead on these paragraphs?
Sounds good to me!
> Do all of these studies apply to hetnets? Or are some restricted to simple or bipartite graphs?
Good question. Will need to dive deeper into these papers to confirm; however, I'll make sure the manuscript text reflects the answer to these questions.
Currently we have a few mentions of related works in the introduction other works section.
The introduction currently describes Hetionet and Rephetio, but doesn't do a great job discussing other studies. It's hard to first figure out what studies are sufficiently related and worthy of mention, especially since terminology varies wildly between disciplines.
@danich1 offered to help, since he's more up to date on the latest literature here.
The top priority is identifying any pre-existing studies on detecting how two nodes are related in a hetnet. If those studies communicate themselves sufficiently and seem to be methodologically sound, we should consider them for citation.
Ideally, we can group preexisting studies into sensible categories of how they relate to this work.
@danich1 do you want to take the lead here?