hetio / hetionet

Hetionet: an integrative network of disease
https://neo4j.het.io

Question About Hetionet's Dictionary #41

Closed chhyunch closed 3 years ago

chhyunch commented 3 years ago

We are graduate student researchers from Yonsei University who are fascinated by your team’s work, Hetionet.

We are currently working on knowledge graphs in the biomedical domain, and we found your work of constructing an integrated network from different data sources very inspiring. Thanks for the great work. While studying Hetionet’s GitHub repository (https://github.com/hetio/hetionet), we came up with several questions about the provided data set. We would be very grateful if you could help us work through them.

These are the questions we came up with:

  1. What is the difference between Hetmat-v1.0-permutations and hetionet-v1.0-perm-1~5? The GitHub README explains that 001 to 005 were constructed by Project Rephetio and that the networks constructed afterwards came from greenelab/hetmech. Is this the only difference between the two?

  2. The following image shows the loaded form of the CcSE.sparse.npz file for Hetnet 006. In the data, what is the difference between row and column? Is it safe to understand that the rows contain the first entity type (in this case, Compound) and the columns contain the second entity type (in this case, Side Effect)? Also, do True and False indicate whether a pair’s information (for example, [Compound 0 & Side Effect 2]) is contained in that particular hetnet (in this case, 006)? Compound 0 & Side Effect 2 is True in Hetnet 006 but False in Hetnet 41. Is this because the pair’s information is only contained in Hetnet 006?

[screenshots of the loaded npz data]

  3. What is the difference between Hetnet 001-005 and Hetnet 006-200? Are there any differences in terms of data sources, construction process, etc.?

Thank you for reading all of these questions. They came up while we were studying Hetionet (HetMat) with fascination and deep interest.

Thanks.

dhimmel commented 3 years ago

Thanks for your interest! Let me know if I miss any of your questions.

What is the difference between Hetmat-v1.0-permutations and hetionet-v1.0-perm-1~5

Each permuted hetnet is different. These are networks whose edges have been shuffled using the XSwap algorithm. Pasting the README that you mention:

hetionet-v1.0-permutations.zip contains 200 permuted hetnets derived from Hetionet v1.0 using the XSwap algorithm. Permutations 001 through 005 were generated by Project Rephetio. Permutations 006 through 200 were generated as part of the greenelab/hetmech project.

So it's just that permutations 6-200 were generated by a different study. The method of generation (XSwap) is the same and the implementation of that method should also have been the same (via hetnetpy.permute.permute_graph). For more information, see the permuted hetnet section of the connectivity-search-manuscript.

what is the difference between row and column? Is it safe to understand that row contains data of first entity type(this case, Compound) and column contains the second entity type(this case, Side effect)?

Yes, that is correct: row indices correspond to the source node (here, Compound) and column indices correspond to the target node (here, Side Effect).

The HetMat class from the hetmatpy package can read the unarchived hetionet-v1.0.hetmat directory.

Also, does the true & false indicates that pair(for example, [Compound 0 & Side Effect 2] information is contained in that particular hetnet set(this case, 006)? Compound 0 & Side Effect 2 is True in Hetnet 006 but false in hetnet 41. Is this because the pair’s information is only contained in Hetnet 006?

These are binary adjacency matrices. True indicates that an edge exists (in that permuted hetnet), whereas False indicates that an edge does not exist. So for "Compound 0 & Side Effect 2", yes that edge exists in permuted hetnet 6, but not 41.
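To make the row/column and True/False conventions concrete, here is a minimal sketch in plain Python. The two small matrices are made-up stand-ins for the CcSE adjacency matrices of two different permuted hetnets, not real Hetionet data, and the helper names (`edges`, `row_degrees`, `column_degrees`) are hypothetical. The real files are sparse matrices and are much larger.

```python
# Toy stand-ins for two permuted CcSE adjacency matrices (values are made up).
# Rows index source nodes (Compounds); columns index target nodes (Side Effects).
perm_a = [
    [False, False, True],
    [True, False, False],
]
perm_b = [
    [True, False, False],
    [False, False, True],
]


def edges(matrix):
    """Return the set of (row, column) pairs whose entry is True."""
    return {
        (i, j)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value
    }


def row_degrees(matrix):
    """Number of edges per source node (one count per row)."""
    return [sum(row) for row in matrix]


def column_degrees(matrix):
    """Number of edges per target node (one count per column)."""
    return [sum(column) for column in zip(*matrix)]


edges_a = edges(perm_a)
edges_b = edges(perm_b)

# The edge "Compound 0 & Side Effect 2" exists in one permutation but not the other.
print((0, 2) in edges_a, (0, 2) in edges_b)

# Both permutations still give every node the same degree.
print(row_degrees(perm_a) == row_degrees(perm_b))
print(column_degrees(perm_a) == column_degrees(perm_b))
```

The row and column indices mean the same thing in every permuted hetnet; only the placement of the True entries differs between them.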

What is the difference between Hetnet 001 – 005 and Hetnet 006-200

I think this was covered above. Note that these are permuted hetnets. Most users do not need the permuted hetnets, and instead just use the actual network. The permuted networks are for computing null metrics that take into account node degree but not edge specificity. More information in https://greenelab.github.io/xswap-manuscript/.

chhyunch commented 3 years ago

Thank you for your kind response.

We wanted to make sure we understood your answer correctly so we are sending some follow up questions. We are still kind of puzzled about the Hetnet data construction. Here are our questions:

  1. Is Hetnet 1-5 constructed from real data?
  2. Is Hetnet 6-200 constructed based on real data or is it a result of link prediction based on Hetnet 1-5?
  3. Hetnets 1-200 all seem to have identical row and column indices. Why does a cell with the same row and column indices have different values in different hetnets? For example, why is Compound 0 & Side Effect 2 True in Hetnet 006 but not in Hetnet 41?

We would be really grateful if you can respond to us at your convenience.

Thank you.

dhimmel commented 3 years ago

There is only one real Hetionet v1.0. The permuted hetnets --- of which there were originally 5, but now 200 --- derive from the real Hetionet v1.0, but have been randomized. See the description at https://git.dhimmel.com/rephetio-manuscript/#permuted-hetnets

From Hetionet, we derived five permuted hetnets [201]. The permutations preserve node degree but eliminate edge specificity by employing an algorithm called XSwap to randomly swap edges [202]. To extend XSwap to hetnets [22], we permuted each metaedge separately, so that edges were only swapped with other edges of the same type. We adopted a Markov chain approach, whereby the first permuted hetnet was generated from Hetionet v1.0, the second permuted hetnet was generated from the first, and so on.
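The procedure quoted above can be sketched in plain Python. This is an illustrative simplification, not the hetnetpy implementation: the function names, the toy edge lists, and the `multiplier` value are assumptions made for the example, and the sketch only handles metaedges whose source and target node types differ (so self-loops cannot arise).

```python
import random


def xswap(edges, n_attempts, rng):
    """Degree-preserving shuffle of a list of (source, target) edges."""
    edges = list(edges)
    if len(edges) < 2:
        return edges
    edge_set = set(edges)
    for _ in range(n_attempts):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new_1, new_2 = (a, d), (c, b)
        # Skip swaps that would duplicate an existing edge.
        if new_1 in edge_set or new_2 in edge_set:
            continue
        edge_set.difference_update([edges[i], edges[j]])
        edge_set.update([new_1, new_2])
        edges[i], edges[j] = new_1, new_2
    return edges


def permute_hetnet(hetnet, rng, multiplier=10):
    """Permute each metaedge separately so edges only swap within their type."""
    return {
        metaedge: xswap(edge_list, multiplier * len(edge_list), rng)
        for metaedge, edge_list in hetnet.items()
    }


def degrees(edge_list, position):
    """Count edges per node at the given position (0 = source, 1 = target)."""
    counts = {}
    for edge in edge_list:
        counts[edge[position]] = counts.get(edge[position], 0) + 1
    return counts


# Toy hetnet with two metaedges (made-up edges, not Hetionet data).
hetnet = {
    "CcSE": [("C0", "SE0"), ("C0", "SE1"), ("C1", "SE1"),
             ("C2", "SE2"), ("C3", "SE0"), ("C3", "SE2")],
    "CtD": [("C0", "D0"), ("C1", "D1"), ("C2", "D0"), ("C3", "D2")],
}

# Markov chain: each permuted hetnet is generated from the previous one.
rng = random.Random(0)
permuted = []
current = hetnet
for _ in range(3):
    current = permute_hetnet(current, rng)
    permuted.append(current)
```

Each swap replaces edges (a, b) and (c, d) with (a, d) and (c, b), so every node keeps its degree while the specific pairings are randomized. This is why an edge such as Compound 0 & Side Effect 2 can be present in one permuted hetnet and absent from another.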

All users should start with Hetionet v1.0 and not the permuted networks. Basically ignore everything in hetnet/permuted unless you understand what the permuted networks are and have a specific application for them.

But if you want real data with real edges, use the non-permuted network.