twitter / GraphJet

GraphJet is a real-time graph processing library.
Apache License 2.0

Adding TimeStampIterator for storing segment TimeStamps #56

Closed: guimingTang closed this pull request 7 years ago.

guimingTang commented 7 years ago

This pull request adds a new interface that exposes a getSegmentCreationTime() method, and it records the creation time of each LeftIndexedBipartiteGraphSegment. This enables time-based edge processing, for example "discard edges older than X hours".
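A minimal sketch of the idea, assuming a segment stores its creation time (epoch milliseconds) once, at construction. The interface name `SegmentCreationTimeProvider`, the class `TimestampedSegment`, and the helper `isOlderThan` are hypothetical illustrations, not GraphJet's actual types; only the method name getSegmentCreationTime() comes from this PR.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical interface: reports when a graph segment was created (epoch millis). */
interface SegmentCreationTimeProvider {
  long getSegmentCreationTime();
}

/**
 * Hypothetical stand-in for LeftIndexedBipartiteGraphSegment: the creation time is
 * captured once in the constructor and never changes afterwards.
 */
class TimestampedSegment implements SegmentCreationTimeProvider {
  private final int segmentId;
  private final long creationTimeMillis;

  // Production-style constructor: stamp the segment with the current wall-clock time.
  TimestampedSegment(int segmentId) {
    this(segmentId, System.currentTimeMillis());
  }

  // Explicit timestamp, convenient for tests.
  TimestampedSegment(int segmentId, long creationTimeMillis) {
    this.segmentId = segmentId;
    this.creationTimeMillis = creationTimeMillis;
  }

  int getSegmentId() {
    return segmentId;
  }

  @Override
  public long getSegmentCreationTime() {
    return creationTimeMillis;
  }

  /**
   * True if this segment was created more than maxAgeHours before nowMillis.
   * The creation time is a coarse, per-segment proxy for the age of its edges.
   */
  boolean isOlderThan(long maxAgeHours, long nowMillis) {
    return nowMillis - creationTimeMillis > TimeUnit.HOURS.toMillis(maxAgeHours);
  }
}
```

Because edges are only appended to a segment after it is created, the creation time is a lower bound on the age of any edge in it, which is what makes a per-segment timestamp cheap to store yet useful for coarse filtering.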

guimingTang commented 7 years ago

Added an implementation of time-based edge filtering for user recommendations in TopSecondDegreeByCount. It works by a simple comparison between the creation time of the segment to which an edge belongs and the earliest valid timestamp passed in through the request object.
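The comparison described above can be sketched as follows, assuming each visited edge is paired with its segment's creation time and that edges whose segment was created at or after the earliest valid timestamp are kept. `EdgeInSegment`, `TimeFilteredCounter`, `isEdgeFresh`, and `countFreshEdges` are hypothetical names used only for illustration; the inclusive `>=` boundary is likewise an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical edge view: the right node it points to plus its segment's creation time. */
final class EdgeInSegment {
  final long rightNodeId;
  final long segmentCreationTime;

  EdgeInSegment(long rightNodeId, long segmentCreationTime) {
    this.rightNodeId = rightNodeId;
    this.segmentCreationTime = segmentCreationTime;
  }
}

final class TimeFilteredCounter {
  /** Keeps an edge only if its segment was created at or after the earliest valid timestamp. */
  static boolean isEdgeFresh(long segmentCreationTime, long earliestValidTimestamp) {
    return segmentCreationTime >= earliestValidTimestamp;
  }

  /** Counts fresh edges per right node; edges from older segments are skipped entirely. */
  static Map<Long, Integer> countFreshEdges(List<EdgeInSegment> edges, long earliestValidTimestamp) {
    Map<Long, Integer> counts = new HashMap<>();
    for (EdgeInSegment edge : edges) {
      if (isEdgeFresh(edge.segmentCreationTime, earliestValidTimestamp)) {
        counts.merge(edge.rightNodeId, 1, Integer::sum);
      }
    }
    return counts;
  }
}
```

The check is a single `long` comparison per edge, so it adds negligible cost to the counting loop.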

guimingTang commented 7 years ago

Please discard the last commit.

guimingTang commented 7 years ago

To address Jerry's comment, I added a new method to TopSecondDegreeByCount named isValidNodeInfoUpdate. It lets each subclass implement its own filtering logic, applied at aggregation time. In this pull request, user recommendation uses it to apply time-based edge filtering. In the future, the same method can also filter out edges by edge type (for example, to drop social proof types we don't want), which is a feature user recommendation needs.
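One way to picture this design is a template-method hook: the base class owns the aggregation loop and consults an overridable predicate before each update. The sketch below is a simplified assumption, not the real GraphJet code; `TopSecondDegreeByCountSketch`, `UserRecsSketch`, `SketchRequest`, `visitEdge`, and the parameter list of isValidNodeInfoUpdate are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified request carrying the earliest valid timestamp. */
final class SketchRequest {
  final long earliestValidTimestamp;

  SketchRequest(long earliestValidTimestamp) {
    this.earliestValidTimestamp = earliestValidTimestamp;
  }
}

/** Hypothetical simplification of TopSecondDegreeByCount: the base class drives aggregation. */
abstract class TopSecondDegreeByCountSketch {
  protected final Map<Long, Integer> counts = new HashMap<>();

  /** Subclass hook: decide whether this edge should update the right node's aggregate. */
  protected abstract boolean isValidNodeInfoUpdate(
      SketchRequest request, long rightNodeId, byte edgeType, long segmentCreationTime);

  /** Called once per traversed edge; only updates approved by the hook are counted. */
  final void visitEdge(SketchRequest request, long rightNodeId, byte edgeType, long segmentCreationTime) {
    if (isValidNodeInfoUpdate(request, rightNodeId, edgeType, segmentCreationTime)) {
      counts.merge(rightNodeId, 1, Integer::sum);
    }
  }

  Map<Long, Integer> getCounts() {
    return counts;
  }
}

/** User-recommendation subclass: applies time-based edge filtering only. */
final class UserRecsSketch extends TopSecondDegreeByCountSketch {
  @Override
  protected boolean isValidNodeInfoUpdate(
      SketchRequest request, long rightNodeId, byte edgeType, long segmentCreationTime) {
    // edgeType is unused for now; a future version could also reject unwanted social proof types here.
    return segmentCreationTime >= request.earliestValidTimestamp;
  }
}
```

Keeping the loop `final` in the base class and pushing only the predicate into subclasses means new filters (such as the edge-type filter mentioned above) change one override rather than the traversal code.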