twitter / GraphJet

GraphJet is a real-time graph processing library.
Apache License 2.0

Add support to add edge without metadata in GraphJet #90

Closed: jerryjiangabc closed this pull request 7 years ago

jerryjiangabc commented 7 years ago

Purpose:

In earlier pull requests in May 2017, I added the ability to store edge metadata in GraphJet. Originally, GraphJet's write interface function was void addEdge(long left, long right). After the May 2017 enhancement, it became void addEdge(long left, long right, long edgeMetadata).

There are valid use cases in which edges have no associated metadata, so it is better for GraphJet to support both write interfaces at the same time. This pull request adds back the original write interface, so GraphJet now has two write interface functions: void addEdge(long left, long right) and void addEdge(long left, long right, long edgeMetadata). Clients choose a write interface function by selecting which class to construct.

Implementation:

Add one abstract class for each of the current edge pool implementations: RegularDegreeEdgePool, PowerLawDegreeEdgePool, and OptimizedEdgePool.

For example, for the regular degree edge pool, AbstractRegularDegreeEdgePool is an abstract class that implements the read-path code. RegularDegreeEdgePool is a concrete class that implements the write interface function without edge metadata, and WithEdgeMetadataRegularDegreeEdgePool is a concrete class that implements the write interface function with edge metadata. Similarly, there are three classes each for the power law degree edge pool and the optimized edge pool.
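The three-class layout for the regular degree edge pool can be sketched roughly as follows. This is a simplified, hypothetical stand-in, not GraphJet's actual code: only the class names and the two addEdge signatures come from this pull request. The read methods (getDegree, getNeighbor, getEdgeMetadata), the helpers appendEdge and slot, the fixed-size per-node slots, and the assumption that node ids are already small dense integers are all illustrative. Throwing UnsupportedOperationException from the unsupported overload is also an assumption, chosen to match the review comment that one of the two write APIs throws a runtime exception.

```java
// Simplified, hypothetical stand-in for EdgePool: both write overloads are
// declared here, and each concrete pool supports only one of them.
abstract class EdgePool {
  abstract void addEdge(long left, long right);
  abstract void addEdge(long left, long right, long edgeMetadata);
  abstract int getDegree(long node);
  abstract long getNeighbor(long node, int i);
}

// Shared read path. Assumes node ids are small dense ints and each node
// gets a fixed block of maxDegree slots (illustrative storage only).
abstract class AbstractRegularDegreeEdgePool extends EdgePool {
  protected final int maxDegree;
  protected final long[] rights; // rights[node * maxDegree + i] = i-th neighbor
  protected final int[] degrees; // current degree of each node

  AbstractRegularDegreeEdgePool(int maxNodes, int maxDegree) {
    this.maxDegree = maxDegree;
    this.rights = new long[maxNodes * maxDegree];
    this.degrees = new int[maxNodes];
  }

  // Appends an edge to the left node's block and returns the slot used.
  protected int appendEdge(long left, long right) {
    int node = Math.toIntExact(left);
    if (degrees[node] == maxDegree) {
      throw new IllegalStateException("no free slot for node " + left);
    }
    int pos = node * maxDegree + degrees[node];
    rights[pos] = right;
    degrees[node]++;
    return pos;
  }

  // Maps (node, i) to a slot index, rejecting edges that were never written.
  protected int slot(long node, int i) {
    int n = Math.toIntExact(node);
    if (i < 0 || i >= degrees[n]) {
      throw new IndexOutOfBoundsException("edge " + i + " of node " + node);
    }
    return n * maxDegree + i;
  }

  @Override
  int getDegree(long node) { return degrees[Math.toIntExact(node)]; }

  @Override
  long getNeighbor(long node, int i) { return rights[slot(node, i)]; }
}

// Write path without edge metadata.
final class RegularDegreeEdgePool extends AbstractRegularDegreeEdgePool {
  RegularDegreeEdgePool(int maxNodes, int maxDegree) { super(maxNodes, maxDegree); }

  @Override
  void addEdge(long left, long right) { appendEdge(left, right); }

  @Override
  void addEdge(long left, long right, long edgeMetadata) {
    throw new UnsupportedOperationException("this pool does not store edge metadata");
  }
}

// Write path with edge metadata, stored in a parallel primitive array.
final class WithEdgeMetadataRegularDegreeEdgePool extends AbstractRegularDegreeEdgePool {
  private final long[] metadata;

  WithEdgeMetadataRegularDegreeEdgePool(int maxNodes, int maxDegree) {
    super(maxNodes, maxDegree);
    this.metadata = new long[maxNodes * maxDegree];
  }

  @Override
  void addEdge(long left, long right) {
    throw new UnsupportedOperationException("this pool requires edge metadata");
  }

  @Override
  void addEdge(long left, long right, long edgeMetadata) {
    metadata[appendEdge(left, right)] = edgeMetadata;
  }

  long getEdgeMetadata(long node, int i) { return metadata[slot(node, i)]; }
}

class EdgePoolDemo {
  public static void main(String[] args) {
    EdgePool plain = new RegularDegreeEdgePool(4, 2);
    plain.addEdge(1, 3);
    WithEdgeMetadataRegularDegreeEdgePool withMeta = new WithEdgeMetadataRegularDegreeEdgePool(4, 2);
    withMeta.addEdge(1, 3, 42L);
    System.out.println(plain.getDegree(1) + " " + withMeta.getEdgeMetadata(1, 0)); // prints "1 42"
    try {
      plain.addEdge(1, 2, 7L); // compiles, because EdgePool declares both overloads
    } catch (UnsupportedOperationException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Because both overloads are declared on EdgePool, a caller holding an EdgePool reference can invoke either one, and a mismatch only surfaces at run time. In this sketch the write path itself passes only primitives and writes into preallocated arrays.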

At the graph segment level, only NodeMetadataLeftIndexedPowerLawBipartiteGraphSegment creates WithEdgeMetadata* edge pools; all other segments create edge pools without edge metadata.

guimingTang commented 7 years ago

What is the benefit of this implementation, where we declare two abstract write methods in the top-level class EdgePool and have each concrete class implement only one of them, versus putting the two write methods into two separate interfaces and having each edge pool implement the corresponding one directly? The fact that EdgePool lets callers call two write APIs, one of which throws a runtime exception, seems a bit suboptimal from an OOP standpoint, but I'm guessing there are practical reasons for this decision.
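A minimal sketch of the alternative the reviewer describes might look like this. The interface names EdgeReader, PlainEdgeWriter and MetadataEdgeWriter, and the append-only edge logs standing in for real edge pools, are hypothetical. Each class implements only the write interface it supports, so calling the wrong overload becomes a compile-time error rather than a runtime exception.

```java
// Hypothetical split: one interface per write signature, plus a shared read interface.
interface EdgeReader {
  int numEdges();
}

interface PlainEdgeWriter {
  void addEdge(long left, long right);
}

interface MetadataEdgeWriter {
  void addEdge(long left, long right, long edgeMetadata);
}

// Fixed-capacity, append-only log of edges without metadata (no growth in this sketch).
final class PlainEdgeLog implements EdgeReader, PlainEdgeWriter {
  private final long[] lefts, rights;
  private int size;

  PlainEdgeLog(int capacity) {
    lefts = new long[capacity];
    rights = new long[capacity];
  }

  @Override public void addEdge(long left, long right) {
    lefts[size] = left;
    rights[size] = right;
    size++;
  }

  @Override public int numEdges() { return size; }
}

// Fixed-capacity, append-only log of edges with metadata.
final class MetadataEdgeLog implements EdgeReader, MetadataEdgeWriter {
  private final long[] lefts, rights, metadata;
  private int size;

  MetadataEdgeLog(int capacity) {
    lefts = new long[capacity];
    rights = new long[capacity];
    metadata = new long[capacity];
  }

  @Override public void addEdge(long left, long right, long edgeMetadata) {
    lefts[size] = left;
    rights[size] = right;
    metadata[size] = edgeMetadata;
    size++;
  }

  @Override public int numEdges() { return size; }

  long metadataAt(int i) { return metadata[i]; }
}

class SplitInterfaceDemo {
  public static void main(String[] args) {
    PlainEdgeWriter plain = new PlainEdgeLog(8);
    plain.addEdge(1, 2);
    // plain.addEdge(1, 2, 3); // would not compile: PlainEdgeWriter has no such method
    MetadataEdgeLog withMeta = new MetadataEdgeLog(8);
    withMeta.addEdge(1, 2, 42);
    System.out.println(((EdgeReader) plain).numEdges() + " " + withMeta.metadataAt(0)); // prints "1 42"
  }
}
```

One cost of this layout, at least in this sketch, is that code holding a pool through one interface type needs a cast (as in main above) or a generic type parameter to reach the other interface.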

jerryjiangabc commented 7 years ago

The tradeoff here is between OOP design and performance. An OOP-compliant approach would be to declare an Edge interface with two concrete child classes, RegularEdge and EdgeWithMetadata, so that only one abstract method, void addEdge(Edge edge), is needed. We did not go with this because it adds one object allocation for each incoming edge on the write path. Right now there is no object allocation at all, because the abstract methods only deal with primitives (int and long). If we add more edge metadata in the future, we will have to move to the OOP approach.
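The OOP alternative described above, next to a primitive-only write path, might look like the following sketch. Edge, RegularEdge, EdgeWithMetadata and addEdge(Edge edge) are the names used in the comment; ObjectEdgeLog, PrimitiveEdgeLog, the accessor methods, and the array-backed storage are hypothetical. The point it illustrates is that the OOP path needs a new object for every incoming edge, whereas the primitive overload only copies its arguments into preallocated arrays.

```java
// Hypothetical OOP-style edge types, named after the comment above.
interface Edge {
  long left();
  long right();
}

final class RegularEdge implements Edge {
  private final long left, right;
  RegularEdge(long left, long right) { this.left = left; this.right = right; }
  @Override public long left() { return left; }
  @Override public long right() { return right; }
}

final class EdgeWithMetadata implements Edge {
  private final long left, right, metadata;
  EdgeWithMetadata(long left, long right, long metadata) {
    this.left = left;
    this.right = right;
    this.metadata = metadata;
  }
  @Override public long left() { return left; }
  @Override public long right() { return right; }
  long metadata() { return metadata; }
}

// OOP write path: a single method, but each caller must allocate an Edge per edge.
final class ObjectEdgeLog {
  private final long[] lefts, rights, metadata;
  private final boolean[] hasMetadata;
  private int size;

  ObjectEdgeLog(int capacity) {
    lefts = new long[capacity];
    rights = new long[capacity];
    metadata = new long[capacity];
    hasMetadata = new boolean[capacity];
  }

  void addEdge(Edge edge) {
    lefts[size] = edge.left();
    rights[size] = edge.right();
    if (edge instanceof EdgeWithMetadata) { // dispatch on the runtime edge type
      metadata[size] = ((EdgeWithMetadata) edge).metadata();
      hasMetadata[size] = true;
    }
    size++;
  }

  boolean hasMetadataAt(int i) { return hasMetadata[i]; }
  long metadataAt(int i) { return metadata[i]; }
}

// Primitive write path in the style of the WithEdgeMetadata* pools: no per-edge object.
final class PrimitiveEdgeLog {
  private final long[] lefts, rights, metadata;
  private int size;

  PrimitiveEdgeLog(int capacity) {
    lefts = new long[capacity];
    rights = new long[capacity];
    metadata = new long[capacity];
  }

  void addEdge(long left, long right, long edgeMetadata) {
    lefts[size] = left;
    rights[size] = right;
    metadata[size] = edgeMetadata;
    size++;
  }

  long metadataAt(int i) { return metadata[i]; }
}

class AllocationDemo {
  public static void main(String[] args) {
    ObjectEdgeLog objects = new ObjectEdgeLog(4);
    objects.addEdge(new RegularEdge(1, 2));          // allocates a RegularEdge
    objects.addEdge(new EdgeWithMetadata(1, 3, 42)); // allocates an EdgeWithMetadata
    PrimitiveEdgeLog primitives = new PrimitiveEdgeLog(4);
    primitives.addEdge(1, 3, 42);                    // no per-edge object on this path
    System.out.println(objects.hasMetadataAt(0) + " " + objects.metadataAt(1) + " "
        + primitives.metadataAt(0)); // prints "false 42 42"
  }
}
```

Both logs store the same values; the difference is only in what the caller must allocate on each write, which is the cost the comment weighs against the cleaner single-method interface.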