neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/
Other
621 stars 160 forks source link

support k-core-decomposition algorithm #217

Closed scarletfrank closed 2 years ago

scarletfrank commented 2 years ago

Is your feature request related to a problem? Please describe.

I find a archived issue about k-core-decomposition. The k -core decomposition is to find the largest subgraph of a network, in which each node has at least k neighbors in the subgraph. I think it measures the importance of each node within the k-core subgraph., so I make a feature request.

Describe the solution you would like

CALL gds.kcore.stream(
  graphName: String,
  configuration: Map
)
YIELD
  nodeId: Integer,
  value: Integer

Configuration

name type description
k INTEGER Filter the named graph using the given k

Describe alternatives you have considered

I will try to setup a pregel-api project and implement it myself. Maybe I could come up with some alternatives.

Additional context

scarletfrank commented 2 years ago

A common k-core implementation is removing the nodes which has a degree less than k from the original graph and reevaluate the degree again. And I read a post of the k-core variant which do not require node removing operation. It is similar to Pregel. A node which should be removed will send message to its neighbors.

But as I try to implement it in pregel-bootstrap. I find the docs saying A node can also send messages to other nodes, typically its neighbors, which are received in the next superstep

The message is processed in the next superstep, so the update on the node value is delayed. I could not evaluate the exact number of nodes (degree=k) in the masterCompute function, because of the delayed message. ( Like the vertex B). What should I do?

Here are some figures from the post.

start

end

Addtitional context

s1ck commented 2 years ago

Thanks @scarletfrank for the feature request. k-core decomposition is an interesting algorithm. I'm not sure if using the Pregel approach is the most efficient, though. I studied it once as part of a subgraph isomorphism algorithm and implemented an O(m) version where m is the number of edges in the graph. We can implement it in a very similar way using our native graph API in GDS.

We will discuss adding the algorithm to GDS in our planning for the 2.3 release, which will be roughly around end of this year. But feel free to try it out yourself using the Graph API.

However, if you want to continue trying this with Pregel, you could share the high level idea of how you want to implement it and I can help with the implementation details. I can imagine something like (without having actually implemented this, so maybe I'm missing something):

s1ck commented 2 years ago

@scarletfrank I will close the issue for now, since we won't add the algorithm immediately and transferred the request to our internal planning system. If you still have questions about Pregel, you can either reopen the issue or ask in the Neo4j community forum / developer discord.