twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0
3.5k stars 706 forks source link

Graph Library for Scalding? #1577

Open richwhitjr opened 8 years ago

richwhitjr commented 8 years ago

Anyone interested in a Graph Library for Scalding? I have an internal project that for the most part is based on the api of GraphX in Spark(https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/Graph.scala). I was thinking of pulling the generic bits into scalding itself as a new project folder.

The general idea is to be able to work on vertices and edges and allow for really quick intersection of neighbors. I have been using it to do triangle counting and conductance measurement on the Twitter follow graph.

johnynek commented 8 years ago

Sounds interesting.

I wonder if a distributed version of this: https://oncue.github.io/quiver/

could be interesting.

It is based on work on functional APIs for graphs in haskell: http://web.engr.oregonstate.edu/~erwig/papers/InductiveGraphs_JFP01.pdf

Way back when I used a simple label propagation using HLL to count second followers and see the people who had the number of top 2nd follower (really unique 1st + 2nd). At the time, it was Obama, even though he was something like #5 in the top most followed.

On Wed, Jun 29, 2016 at 1:00 PM, Richard Whitcomb notifications@github.com wrote:

Anyone interested in a Graph Library for Scalding? I have an internal project that for the most part is based on the api of GraphX in Spark( https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/Graph.scala). I was thinking of pulling the generic bits into scalding itself as a new project folder.

The general idea is to be able to work on vertices and edges and allow for really quick intersection of neighbors. I have been using it to do triangle counting and conductance measurement on the Twitter follow graph.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/twitter/scalding/issues/1577, or mute the thread https://github.com/notifications/unsubscribe/AAEJdpgVG-2wVwvp-TyQFc_SdY3oCRS8ks5qQvkCgaJpZM4JBpqX .

P. Oscar Boykin, Ph.D. | http://twitter.com/posco | http://pobox.com/~boykin

richwhitjr commented 8 years ago

Yep my current thinking is to add an API at first to make these types of Graph walking jobs easier to reason about. Quiver looks interesting also but a very different API path then my current thinking.

Here is my current working example which can be generally though of as an abstracted version of GraphX.

https://gist.github.com/richwhitjr/ec27b4ee0405a48a5276016622371ce1

johnynek commented 8 years ago

yeah, actually, I think making a distributed version of quiver is probably a whole separate project.

Would be cool to see yours based on spark.

richwhitjr commented 8 years ago

Created a PR, still needs work(Undirected graphs are completely unsupported) but I wanted to open the idea up for discussion.

https://github.com/twitter/scalding/pull/1583