twitter / cassovary

Cassovary is a simple big graph processing library for the JVM
http://twitter.com/cassovary
Apache License 2.0
1.05k stars 150 forks source link

Edge Weights #168

Open bmckown opened 9 years ago

bmckown commented 9 years ago

Everyone,

I wanted to find out the communal opinion on adding edge weights to cassovary. After having built out quite a few algorithms, it has become clear that these could be improved through the addition of edge weights to the graph structure. This may be as simple as adding a new Node type--maybe something like

/** A map based node that stores edge weights for all connections to the given node
 * @param id The id for the given node
 * @param inboundNodes A map of node id to edge weight for inbound connections between 
 * `this` and source node id
 * @param outboundNodes A map of node id to edge weight for outbound connections 
 * between `this` and target node id 
 */
class EdgeWeightedNode(val id: Int, inboundNodes: Map[Int, Double], outboundNodes: Map[Int, Double]) 
  extends Node {
  ...
}

and a new graph class EdgeWeightedDirectedGraph extends DirectedGraph[EdgeWeightedNode]

Then I could update our algorithms to work with EdgeWeightedDirectedGraphs as well.

I would love to hear some thoughts on this idea and see how we could best implement this. Thanks!

pankajgupta commented 9 years ago

IMHO, edge weights are important for many applications and it is a key missing feature of Cassovary right now. However, Cassovary is all about efficiency of storage, so it will be nice to give thought to that. There are two separate things here: (1) Abstraction -- such as EdgeWeightedNode in your snippet (2) Implementation -- take a look at com.twitter.cassovary.graph.labels and see what you think of managing edge labels as a collection separate from the edge itself.

On Thu, Feb 26, 2015 at 8:49 AM, Ben McKown notifications@github.com wrote:

Everyone,

I wanted to find out the communal opinion on adding edge weights to cassovary. After having built out quite a few algorithms, it has become clear that these could be improved through the addition of edge weights to the graph structure. This may be as simple as adding a new Node type--maybe something like

/* A map based node that stores edge weights for all connections to the given node * @param id The id for the given node * @param inboundNodes A map of node id to edge weight for inbound connections between * this and source node id * @param outboundNodes A map of node id to edge weight for outbound connections * between this and target node id /class EdgeWeightedNode(val id: Int, inboundNodes: Map[Int, Double], outboundNodes: Map[Int, Double]) extends Node { ... }

and a new graph class EdgeWeightedDirectedGraph extends DirectedGraph[EdgeWeightedNode]

Then I could update our algorithms to work with EdgeWeightedDirectedGraphs as well.

I would love to hear some thoughts on this idea and see how we could best implement this. Thanks!

— Reply to this email directly or view it on GitHub https://github.com/twitter/cassovary/issues/168.

pankajgupta commented 9 years ago

Also see #36 for some past discussion on this. CC @szymonm

pocman commented 8 years ago

"IMHO, edge weights are important for many applications and it is a key missing feature of Cassovary right now" @pankajgupta +1 I would really like something similar to Labels.

trait WeightedDirectedGraph[+V <: Node]  extends DirectedGraph[V] {
  /**
    * Labels on edges of this graph.
    */
  var edgeLabels: Labels[(Int, Int)] = _

  /**
    * The label of an edge accessed by name. Label can be anything.
    */
  def labelOfEdge[L : TypeTag](edgeId: (Int, Int), labelName: String): Option[L] = _

} 
pankajgupta commented 8 years ago

Yes indeed. As an aside, there is a beginning of this feature in terms of node labels.

See https://github.com/twitter/cassovary/tree/master/cassovary-core/src/main/scala/com/twitter/cassovary/graph/labels

Perhaps you can tell us a bit about what kind of edge labels will satisfy your use case? e.g., you have how many approx. edges and approx. how many and what kinds of labels per edge?

CC @szymonm https://github.com/twitter/cassovary/issues?q=is%3Apr+is%3Aopen+author%3Aszymonm @plofgren

On Wed, Dec 9, 2015 at 7:51 AM, Thomas notifications@github.com wrote:

"IMHO, edge weights are important for many applications and it is a key missing feature of Cassovary right now" @pankajgupta https://github.com/pankajgupta +1 I would really like something similar to Labels.

— Reply to this email directly or view it on GitHub https://github.com/twitter/cassovary/issues/168#issuecomment-163299685.

pocman commented 8 years ago

I read that labels are suitable for nodes and edges so I used (Int, Int) keys and Long or True labels for my edges. In this use case, my graph is really dense and I have 7 types of labels.

I'm in the process of making it work but I think that I will a hard time querying labels with partial tuples (all labels with a given source node)

In the end, it would be really useful to create an hash/id for each edge that would be used as key for the labels.

pankajgupta commented 8 years ago

Not sure I fully understand. May be make a class of these 7 labels and index that by edge? What do you mean by partial tuples?

The best might be to point to your code if you can.

On Wed, Dec 9, 2015 at 10:36 PM, Thomas notifications@github.com wrote:

I read that labels are suitable for nodes and edges so I used (Int, Int) keys and Long or True labels for my edges. In this use case, my graph is really dense and I have 7 types of labels.

I'm in the process of making it work but I think that I will a hard time querying labels with partial tuples (all labels with a given source node)

In the end, it would be really useful to create an hash/id for each edge that would be used as key for the labels.

— Reply to this email directly or view it on GitHub https://github.com/twitter/cassovary/issues/168#issuecomment-163515362.

pocman commented 8 years ago

How can I create an index by edge without using a tuple of (src.id, dst.id) ?

By partial tuples, I meant that using (src.id, dst.id) as an index for edge, it's really tempting to try to the sum of incoming weights without using the graph representation itself.