Open shenyangHuang opened 1 week ago
By implementing the temporal graph data as a dictionary with time as keys there comes both fast and slow operations depending on the need.
Fast Operations:
start_time
and end_time
t
Slow Operations:
The slow operations might be critical, especially when retrieving the temporal neighborhood of a node at a certain point in time, which is often used in models such as TGN
@shenyangHuang This all looks good at a high level. I suggest that we move forward with the data structure you proposed. Once we have good testing CI, we can run perf tests to see where and how severe the bottlenecks are when running real models
Some additional comments
Sampling
module will be a core client of this API, which will spam graph aggregations/range queries quite often. I assume Type Conversion and Discretization would only be done once during the training phase? Do we actually need to materialize the fully converted graph?Dataloader
to auto-sample their graph and give them the graph tensors they need for out-of-the-box samplers.to_snapshots
, to_events
, discretize
, and aggregate
, what we want in return is a view over the underlying nodes/edge tensors that gives us quick range queries and neighbour queries. Unless we need true tensor copies, perhaps just building and storing several of these indices is a good way forward. Client needs multiple discretizations and a specific neighbourhood sampler? Build these indices up front and in parallel. Since the queries come chronologically, we could even build these indexes asynchronously.OpenDG
is aiming to be used forNode
, Edge
objects) or does that lead to irregular tensors on GPU? More on the design level, do heterogeneous clients require some additional interface? (e.g. extract subgraph by node type within time range)to_snapshots()
a CTDG, it's possible that the same edge $e = (vi, v_j)$ occurs multiple times in the same time partition. Do we take the last one or each one? If this is not obvious, it should probably be part of the API so the client can decidesome suggestions:
EventStore
instead of DataStore
?EventStore["edge_feat"]
: is it better to save edges as tuples of edge_idx
or (src, dst)
?How do we solve issues related to the same edge (same node) multiple time at the same timestamp? Do we support it or ignore it?
For node feature that is dynamic but rarely change over time, how do we store it efficiently without copying it many times? (suggestion from Guillaume) Maybe we should have a specific class for persistent networks (networks that models relationships with a duration), future work
Matthias feedback (avoid nested dictionaries): EventStore just have 1. edge_index (Tensor), 2. edge_feat (Tensor, # edges x feat), 3. node_index (Tensor), 4. node_feat (# node x feat)
can we not use dictionary? can we just keep tensors, append to it. Mapping from timestamp.
don't need CTDG
and DTDG
, just have a single class.
maybe use has to specify the format of their timestamp.
maybe we should design with node_type
and edge_type
in mind
How to store and process temporal graph data is a central topic to any temporal graph library. In this project, we aim to store and process both discrete and continuous time dynamic graphs, forming the first library that operates on both types of temporal graphs. How we store and process data is also closely linked in operations such as neighbor sampling #1 and computing graph statistics.
Main Operators
Here are the main operations that need to be supported by such data format.
aggregate_graph()
to_snapshots()
andto_events()
to_snapshots()
add argument for time granularityGraphStore
andEventStore
Implementation
See PR #8 for implementation
Implementation see
data.py
Data Classes:
BaseData
, inherited byCTDG
andDTDG
class.aggregate_graph(start_time, end_time)
inBaseData
to_events()
andto_snapshot()
inCTDG
andDTDG
Data Structure
The underlying data structure must be able to quickly search, sort, add and delete any time steps. Therefore, I propose a core structure of
GraphStore
dictionary ={timestamp: EventStore}
.EventStore
can be a dictionary with keysnode_feat
,edge_feat
.EventStore["node_feat"]
is a dictionary of (dynamic) node features.EventStore["node_feat"] = {node_id: vec}
.EventStore["edge_feat"]
is a dictionary of edges and their edge features.EventStore["edge_feat"] = {(src, dst): vec}
.GraphStore
class can also have astatic_node_feat
property which is the standard node feature matrix that doesn't change over time.GraphStore
class can also have astatic_graph_feat
property which stores graph level features, can be added later.