neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/

Please provide community similarity algorithms #244

Open johnlinp opened 1 year ago

johnlinp commented 1 year ago

Is your feature request related to a problem? Please describe.

I have a graph of social media data. I used community detection algorithms (e.g. Louvain) to detect different sets of communities based on different properties, such as location and timestamp. As a result, I have one set of communities detected from the location of the data and another set detected from the timestamp of the data.

My next step is to compare the similarity between these sets of communities. I have seen that measures such as the Rand Index can do this. Can GDS provide such algorithms? Thank you.

Describe the solution you would like

I would like GDS to provide community similarity algorithms, e.g. the Rand Index.

Describe alternatives you have considered

If GDS doesn't provide it, I'll have to implement it on my own.

johnlinp commented 1 year ago

If anyone needs a simple Rand Index implementation, here it is.

Assume that we are analyzing a set of social media posts (:Post). We have run Louvain community detection twice, based on 2 different attributes, and stored the results as community_1_id and community_2_id on the nodes. The Rand Index between these 2 community sets can then be calculated as follows:

// a: pairs of posts that share a community in both partitions
CALL {
  MATCH (n:Post)
  MATCH (m:Post)
  WHERE id(n) < id(m)
  AND n.community_1_id = m.community_1_id
  AND n.community_2_id = m.community_2_id
  RETURN count(*) AS a
}
// b: pairs of posts that are in different communities in both partitions
CALL {
  MATCH (n:Post)
  MATCH (m:Post)
  WHERE id(n) < id(m)
  AND n.community_1_id <> m.community_1_id
  AND n.community_2_id <> m.community_2_id
  RETURN count(*) AS b
}
// c: pairs that share a community in partition 1 but not in partition 2
CALL {
  MATCH (n:Post)
  MATCH (m:Post)
  WHERE id(n) < id(m)
  AND n.community_1_id = m.community_1_id
  AND n.community_2_id <> m.community_2_id
  RETURN count(*) AS c
}
// d: pairs that share a community in partition 2 but not in partition 1
CALL {
  MATCH (n:Post)
  MATCH (m:Post)
  WHERE id(n) < id(m)
  AND n.community_1_id <> m.community_1_id
  AND n.community_2_id = m.community_2_id
  RETURN count(*) AS d
}
RETURN 1.0 * (a + b) / (a + b + c + d) AS rand_index;
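
For cross-checking the query on a small export, the same quantity can be computed outside the database. The following is a minimal Python sketch, not part of GDS: the function name rand_index and the two parallel label lists are assumptions made for illustration, and it assumes every node has both community ids set. It counts agreeing pairs from community sizes instead of enumerating all pairs, so it does not need the pairwise join that the query performs.

```python
from collections import Counter
from math import comb


def rand_index(labels_1, labels_2):
    """Rand Index of two partitions given as parallel per-node label lists.

    Hypothetical helper for illustration; labels_1[i] and labels_2[i] are
    the two community ids of the same node.
    """
    if len(labels_1) != len(labels_2):
        raise ValueError("both labelings must cover the same nodes")
    total_pairs = comb(len(labels_1), 2)
    if total_pairs == 0:
        raise ValueError("need at least two nodes")

    # Pairs sharing a community in both partitions (`a` in the query).
    both = sum(comb(k, 2) for k in Counter(zip(labels_1, labels_2)).values())
    # Pairs sharing a community in partition 1 (a + c) and in partition 2 (a + d).
    same_1 = sum(comb(k, 2) for k in Counter(labels_1).values())
    same_2 = sum(comb(k, 2) for k in Counter(labels_2).values())

    # b = total - same_1 - same_2 + a, hence a + b = total - same_1 - same_2 + 2a.
    agreements = total_pairs - same_1 - same_2 + 2 * both
    return agreements / total_pairs


if __name__ == "__main__":
    # Four posts: the partitions disagree only on the pair (2, 3).
    print(rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

The two lists could be filled from a query that returns community_1_id and community_2_id for each :Post node, with rows kept in the same order for both lists.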

gminneci commented 1 year ago

Hi @johnlinp! I am a product manager at Neo4j. Thank you for this feature request. We are looking at this type of feature as 'subgraph similarity', but don't have an implementation plan just yet. Great to see that you have an implementation already - how is it working for you? Are there any specific limitations in what you are trying to achieve that you'd like to mention?