Improve introduction sections on related works

dhimmel commented 4 years ago

Currently we have only a few mentions of related works in the part of the introduction that covers other works.

The introduction currently describes Hetionet and Rephetio, but doesn't do a great job discussing other studies. It's hard to first figure out what studies are sufficiently related and worthy of mention, especially since terminology varies wildly between disciplines.

@danich1 offered to help, since he's more up to date on the latest literature here.

The top priority is identifying any pre-existing studies on detecting how two nodes are related in a hetnet. If those studies communicate themselves sufficiently and seem to be methodologically sound, we should consider them for citation.

Ideally, we can group preexisting studies into sensible categories of how they relate to this work.

@danich1 do you want to take the lead here?

danich1 commented 4 years ago

To get the ball rolling here is a list of papers that describe how two nodes are related via similarity score (cosine or some variant)

Paper Citation	Type of Network	Methodology	How two nodes are related
10.1093/bioinformatics/bty559	Phenotype-Disease-Gene network	Node2vec-like approach	Finds Disease-Gene Associations (Similarity Score)
10.1109/BIBM47256.2019.8983134	Disease-Gene-Chemical Network	Ensemble approach (Node2vec, Matrix Factorization, AutoEncoder)	Finds Disease Gene Associations (Similarity Score)
10.1186/s12920-019-0627-z	Gene - Phenotype Network	Multi-path Embeddings (Extends metapaths into using multiple paths)	Gene - Phenotype (Similarity Score)
10.1016/j.websem.2017.06.002	Conglomerate of Node Types (Drug, Chemical, Protein)	TransH plus HolE to generate features for logistic regression	Predicts Drug-Drug Interactions
10.1093/bioinformatics/btx160	Drug-Target-Disease Network	DeepWalk	Drug-Target Associations (Similarity Score)
10.1093/jamia/ocy117	Disease-Symptom-Protein Network	Node2Vec	Symptom - Gene associations (Similarity Score)
1710.05980	Patient-Disease-Drug Network	TransH-like approach	Patient - Drug recommendations with low side effects
1909.00672	EHR derived network	PrTransH	Link Prediction from disease to Various EHR related Nodes
10.1145/3184558.3186978	Movie Rating network, Phenotype-Gene network	Metapath recommendations	Most relevant metapaths given two nodes

The last one ranks metapaths rather than nodes, but this list covers many studies.
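For readers less familiar with the embedding papers above, here is a minimal sketch of what a "similarity score" between two nodes looks like. The embedding vectors and node names are hypothetical and made up for illustration; none of the listed papers is guaranteed to compute its score exactly this way. Note that the output is only a number per node pair and says nothing about which paths connect them.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical node embeddings, e.g. as produced by a node2vec-style method.
embeddings = {
    "gene_A": [0.9, 0.1, 0.3],
    "disease_X": [0.8, 0.2, 0.4],
    "disease_Y": [-0.5, 0.9, 0.0],
}


def rank_candidates(query, candidates, embeddings):
    """Rank candidate nodes by similarity to the query node, best first."""
    scored = [
        (cosine_similarity(embeddings[query], embeddings[c]), c)
        for c in candidates
    ]
    return [c for _, c in sorted(scored, reverse=True)]


ranking = rank_candidates("gene_A", ["disease_X", "disease_Y"], embeddings)
```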

One question to think about is how many different node and edge types are needed for a network to be considered heterogeneous? If it is greater than two then some of these sources will have to be removed as they only use two node types with three edge categories.

dhimmel commented 4 years ago

To get the ball rolling here is a list of papers that describe how two nodes are related via similarity score

Do these studies output how two nodes are related or whether two nodes are related? Whether could be something like ranked node pairs or a probability of being related. How is harder and requires an explanation for each prediction.

One question to think about is how many different node and edge types are needed for a network to be considered heterogeneous? If it is greater than two then some of these sources will have to be removed as they only use two node types with three edge categories.

Anything more than one node type and one edge type is heterogeneous. Perhaps more important than the network a study applied its approach to are the theoretical limits of the approach. If the approach cannot scale to many node and edge types, then that is a major limitation. Many approaches are hardcoded to a specific metagraph. I imagine it is sometimes challenging to know whether a method can scale, and the best evidence is the metagraphs of the networks it has been applied to.
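As a rough sketch of that criterion, the check below treats a network as heterogeneous when it has more than one node type or more than one edge type, and extracts the metagraph as the set of (source type, edge type, target type) triples. The edge tuple layout, the function names, and the example edges are assumptions made for illustration, not something taken from any of the studies.

```python
def is_heterogeneous(node_types, edge_types):
    """True if there is more than one node type or more than one edge type."""
    return len(set(node_types)) > 1 or len(set(edge_types)) > 1


def metagraph(edges):
    """Collect the metaedges present in a typed edge list.

    Each edge is assumed to be a tuple:
    (source, source_type, edge_type, target, target_type).
    """
    return {(s_type, e_type, t_type) for _, s_type, e_type, _, t_type in edges}


# Hypothetical typed edges.
edges = [
    ("gene_A", "Gene", "associates", "disease_X", "Disease"),
    ("gene_B", "Gene", "associates", "disease_X", "Disease"),
    ("gene_A", "Gene", "interacts", "gene_B", "Gene"),
]

metaedges = metagraph(edges)
node_types = {t for s_type, _, t_type in metaedges for t in (s_type, t_type)}
edge_types = {e_type for _, e_type, _ in metaedges}
heterogeneous = is_heterogeneous(node_types, edge_types)
```

Comparing the metaedges a method has actually been run on is one way to gauge whether it is tied to a specific metagraph.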

danich1 commented 4 years ago

Do these studies output how two nodes are related or whether two nodes are related? Whether could be something like ranked node pairs or a probability of being related. How is harder and requires an explanation for each prediction.

Oh shoot. I was under the impression you were looking for papers that stated whether two nodes are related. Will be back on the hunt. Still might be worth mentioning some of the above papers though.

dhimmel commented 4 years ago

I was under the impression you were looking for papers that stated whether two nodes are related.

These are still interesting as examples of how connectivity search achieves something the prior works could not. So good to compile them and keep an eye out for whether they can explain how two nodes are related. Also good to keep an eye out for whether they require a training set (or gold standard) of known relationships.

danich1 commented 4 years ago

Okay, round two. Here are papers that should be more along the lines of explaining how two nodes are related. The majority of these papers focus on social media and general knowledge rather than biology.

Paper Citation	Brief Summary
@doi:10.1145/3132847.3133161	The goal is to explain why two entities within a tweet co-occur. The method finds intermediate nodes between the two query entities and then uses an SVM to rank the intermediate entities that best explain why the two entities co-occur.
@arxiv:1809.07685	A survey paper on explaining entity relatedness. A good reference for better understanding how two entities may be related.
@doi:10.1145/3289600.3290990	This tool uses network embeddings to explain users' actions and items on their social feed. The authors frame the problem as explaining why two nodes on a graph are connected, and use a learning-to-rank algorithm to rank paths.
@doi:10.1145/2983323.2983778	This approach finds subgraphs that are most relevant to query entity sets. By finding subgraphs, the authors are able to explain some connections between the sets in question.
@doi:10.1145/2872518.2890528	This approach is strikingly similar to the one above and might be the same paper. In any case, this paper is designed to find subgraphs that best explain how entity sets are related.
@raw:RECAP	The goal is to explain how two entities are related. It aims to find the most relevant edges between two nodes via the following steps: find paths, rank the paths, select the top X paths.
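The three steps in the last row (find paths, rank them, keep the top X) can be sketched as follows. This is only an illustration of the general pattern: the scoring function is a toy heuristic that I made up (it penalizes long paths and high-degree intermediate nodes), not the ranking used by RECAP or any other paper above, and the graph and node names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (node, node) pairs."""
    adj = defaultdict(set)
    for s, t in edges:
        adj[s].add(t)
        adj[t].add(s)
    return adj


def find_paths(adj, source, target, max_length):
    """Step 1: enumerate simple paths with at most max_length edges."""
    paths = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            paths.append(path)
            continue
        if len(path) - 1 >= max_length:
            continue
        for neighbor in adj[node]:
            if neighbor not in path:  # keep paths simple (no repeated nodes)
                stack.append((neighbor, path + [neighbor]))
    return paths


def score_path(adj, path):
    """Step 2 (toy heuristic, an assumption): favor short paths through low-degree nodes."""
    n_edges = len(path) - 1
    if n_edges == 0:
        return 0.0
    score = 1.0
    for node in path[1:-1]:
        score /= len(adj[node])
    return score / n_edges


def top_paths(adj, source, target, max_length=3, k=2):
    """Step 3: return the k highest-scoring paths, ties broken alphabetically."""
    paths = find_paths(adj, source, target, max_length)
    return sorted(paths, key=lambda p: (-score_path(adj, p), p))[:k]


# Hypothetical toy graph.
adj = build_adjacency([
    ("drug", "gene1"), ("gene1", "disease"),
    ("drug", "gene2"), ("gene2", "disease"),
    ("gene2", "pathway1"), ("gene2", "pathway2"),
    ("pathway1", "disease"),
])
best = top_paths(adj, "drug", "disease", max_length=3, k=2)
```

Unlike a similarity score, the output here is a list of concrete paths, which is what lets this family of methods address how two nodes are related.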

dhimmel commented 4 years ago

@danich1, fantastic research! It is challenging to discover these studies since they're from a different field of study. Studies rarely cite across domains, and doing so is a sign of rigor.

Do all of these studies apply to hetnets? Or are some restricted to simple or bipartite graphs? It can be hard to tell sometimes.

For @raw:RECAP it has a DOI we could use for citation: https://doi.org/10.1007/978-3-319-25007-6_36

I think a paragraph that cites these studies would be extremely helpful for our introduction. It is okay to group multiple studies in a single sentence of explanation if their approaches and designs are similar.

We also might want a paragraph on the "whether" two nodes are related, but with the clear caveat that these studies don't address the how.

@danich1 do you think you could take the lead on these paragraphs?

danich1 commented 4 years ago

I think a paragraph that cites these studies would be extremely helpful for our introduction. It is okay to group multiple studies in a single sentence of explanation if their approaches and designs are similar.

We also might want a paragraph on the "whether" two nodes are related, but with the clear caveat that these studies don't address the how.

do you think you could take the lead on these paragraphs?

Sounds good to me!

Do all of these studies apply to hetnets? Or are some restricted to simple or bipartite graphs?

Good question. Will need to dive deeper into these papers to confirm; however, I'll make sure the manuscript text reflects the answer to these questions.

greenelab / connectivity-search-manuscript

Improve introduction sections on related works #15