shenyangHuang / openDG

Open Dynamic Graph library

Core Temporal Graph Data API Discussion #2

Status: Open. shenyangHuang opened this discussion 1 week ago.

shenyangHuang commented 1 week ago

How to store and process temporal graph data is central to any temporal graph library. In this project, we aim to store and process both discrete-time and continuous-time dynamic graphs, forming the first library that operates on both types of temporal graphs. How we store and process data is also closely tied to operations such as neighbor sampling (#1) and computing graph statistics.

Main Operators

Here are the main operations that such a data format needs to support.

Implementation

See PR #8 for the implementation, which lives in data.py.

Data classes: BaseData, which is subclassed by the CTDG and DTDG classes.
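To make the class relationship concrete, here is a minimal sketch, assuming a shared base class that stores edges per timestamp. The method names (`add_edge`, `timestamps`, `snapshot`) are hypothetical and are not taken from data.py.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (src, dst)


@dataclass
class BaseData:
    """Shared storage (hypothetical): each timestamp maps to the edges seen at that time."""

    events: Dict[int, List[Edge]] = field(default_factory=dict)

    def add_edge(self, t: int, src: int, dst: int) -> None:
        self.events.setdefault(t, []).append((src, dst))

    def timestamps(self) -> List[int]:
        # Dict keys follow insertion order, so sort them to get time order.
        return sorted(self.events)


class CTDG(BaseData):
    """Continuous time: keys are raw event times (e.g. UNIX seconds)."""


class DTDG(BaseData):
    """Discrete time: keys are snapshot indices 0, 1, 2, ..."""

    def snapshot(self, i: int) -> List[Edge]:
        # Edges in snapshot i, or an empty list if the snapshot has no events.
        return self.events.get(i, [])
```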


Data Structure

The underlying data structure must support fast search, sorting, insertion, and deletion of any timestamp. Therefore, I propose a core GraphStore dictionary, {timestamp: EventStore}, where each EventStore can be a dictionary with the keys node_feat and edge_feat.
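A minimal sketch of the proposed layout using plain Python dicts. The helper functions `add_event`, `delete_timestamp`, and `sorted_timestamps` are hypothetical names for illustration.

```python
from typing import Any, Dict, List

# GraphStore = {timestamp: EventStore}; each EventStore is a dict with
# the keys "node_feat" and "edge_feat", as proposed above.
EventStore = Dict[str, Any]
GraphStore = Dict[int, EventStore]


def add_event(store: GraphStore, t: int, node_feat: Any = None, edge_feat: Any = None) -> None:
    # Insert (or overwrite) the EventStore for timestamp t: O(1) on average.
    store[t] = {"node_feat": node_feat, "edge_feat": edge_feat}


def delete_timestamp(store: GraphStore, t: int) -> None:
    # Remove timestamp t if present (no error if absent): O(1) on average.
    store.pop(t, None)


def sorted_timestamps(store: GraphStore) -> List[int]:
    # Dicts preserve insertion order, not time order, so sorting costs O(T log T).
    return sorted(store)
```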

[Diagram: diagram-20241119]

shenyangHuang commented 1 week ago

Fast Operations and Limitations of the Data Structure

Implementing the temporal graph data as a dictionary keyed by timestamp makes some operations fast and others slow, depending on the access pattern.
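As an illustration based on general Python dict behavior (not measurements of this library), an exact-timestamp lookup is a single hash lookup, while a time-window query has to order the keys first. The function names are hypothetical.

```python
import bisect
from typing import Any, Dict, List


def events_at(store: Dict[int, Any], t: int) -> Any:
    # Point lookup by exact timestamp: one hash lookup, O(1) on average.
    return store.get(t)


def events_in_window(store: Dict[int, Any], start: int, end: int) -> List[Any]:
    # Range query over [start, end): dict keys are not ordered by time,
    # so every key must be sorted (or scanned) first, O(T log T) here.
    keys = sorted(store)
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, end)
    return [store[k] for k in keys[lo:hi]]
```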

Fast Operations:

Slow Operations:

The slow operations might be critical, especially retrieving the temporal neighborhood of a node at a given point in time, which is used frequently by models such as TGN.
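A naive sketch of this slow case, assuming each timestamp maps to a list of `(src, dst)` edges. The function `temporal_neighbors` is hypothetical, and it only returns edges strictly before `t`, which is one common convention for avoiding information leakage.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (src, dst)


def temporal_neighbors(store: Dict[int, List[Edge]], node: int, t: int) -> List[Tuple[int, int]]:
    """Return (neighbor, timestamp) pairs for edges touching `node` strictly before t."""
    out: List[Tuple[int, int]] = []
    # The dict is keyed by time, not by node, so every earlier timestamp is visited:
    # the cost grows with the total number of events, not with the node's degree.
    for ts in sorted(store):
        if ts >= t:
            break
        for src, dst in store[ts]:
            if src == node:
                out.append((dst, ts))
            elif dst == node:
                out.append((src, ts))
    return out
```

A per-node index (node mapped to a time-sorted list of `(timestamp, neighbor)`) would make this proportional to the node's degree, at the cost of extra memory and extra bookkeeping on every insert.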

Jacob-Chmura commented 1 week ago

@shenyangHuang This all looks good at a high level. I suggest we move forward with the data structure you proposed. Once we have a good testing CI setup, we can run performance tests to see where the bottlenecks are and how severe they are when running real models.

Some additional comments

fpour commented 1 week ago

Some suggestions:

shenyangHuang commented 1 week ago

How do we handle the same edge (or the same node) appearing multiple times at the same timestamp? Do we support it or ignore it?
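Besides keeping every occurrence (a multigraph), two other possible policies are sketched below on a plain list of `(src, dst)` pairs at a single timestamp. The function names are hypothetical. Deduplication discards repeats, while counting keeps the multiplicity, which could then be stored as an edge weight or feature.

```python
from collections import Counter
from typing import List, Tuple

Edge = Tuple[int, int]  # (src, dst)


def dedup_edges(edges: List[Edge]) -> List[Edge]:
    # "Ignore" policy: keep the first occurrence of each (src, dst), preserving order.
    return list(dict.fromkeys(edges))


def edge_multiplicity(edges: List[Edge]) -> Counter:
    # Middle ground: collapse duplicates but remember how often each edge occurred.
    return Counter(edges)
```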

shenyangHuang commented 3 days ago

For node features that are dynamic but rarely change over time, how do we store them efficiently without copying them many times? (Suggestion from Guillaume.) Maybe we should have a dedicated class for persistent networks (networks that model relationships with a duration); this is future work.
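One way to avoid repeated copies, sketched under the assumption that updates for each node arrive in time order: store a feature only at the timestamps where it changes, and answer queries with a binary search for the latest change at or before `t`. `ChangePointFeatures` is a hypothetical name.

```python
import bisect
from typing import Any, Dict, List


class ChangePointFeatures:
    """Hypothetical sketch: keep a node's feature only when its value changes."""

    def __init__(self) -> None:
        self._times: Dict[int, List[int]] = {}
        self._values: Dict[int, List[Any]] = {}

    def update(self, node: int, t: int, value: Any) -> None:
        # Assumes updates for a given node arrive in non-decreasing time order.
        times = self._times.setdefault(node, [])
        values = self._values.setdefault(node, [])
        if values and values[-1] == value:
            return  # unchanged since the last change point: store nothing
        times.append(t)
        values.append(value)

    def get(self, node: int, t: int) -> Any:
        # Latest value recorded at or before t via binary search; None if none yet.
        times = self._times.get(node, [])
        i = bisect.bisect_right(times, t)
        return self._values[node][i - 1] if i else None
```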

shenyangHuang commented 2 days ago

Feedback from Matthias (avoid nested dictionaries): EventStore should just hold
1. edge_index (Tensor)
2. edge_feat (Tensor, # edges x feat)
3. node_index (Tensor)
4. node_feat (# nodes x feat)
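A minimal sketch of this flattened EventStore, with plain lists standing in for torch Tensors so it runs without dependencies. The shape comments give the intended tensor shapes; `add_edge`, `add_node`, and `num_edges` are hypothetical helpers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EventStore:
    """Hypothetical flat EventStore with Matthias's four fields (lists stand in for tensors)."""

    edge_index: List[List[int]] = field(default_factory=lambda: [[], []])  # ~ (2, num_edges)
    edge_feat: List[List[float]] = field(default_factory=list)  # ~ (num_edges, feat)
    node_index: List[int] = field(default_factory=list)  # ~ (num_nodes,)
    node_feat: List[List[float]] = field(default_factory=list)  # ~ (num_nodes, feat)

    def add_edge(self, src: int, dst: int, feat: List[float]) -> None:
        # Row 0 holds sources, row 1 destinations; edge_feat stays aligned by position.
        self.edge_index[0].append(src)
        self.edge_index[1].append(dst)
        self.edge_feat.append(feat)

    def add_node(self, node: int, feat: List[float]) -> None:
        self.node_index.append(node)
        self.node_feat.append(feat)

    @property
    def num_edges(self) -> int:
        return len(self.edge_index[0])
```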

Can we avoid the dictionary altogether? Can we just keep tensors, append to them, and maintain a mapping from timestamps?
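A sketch of this append-only alternative, assuming events arrive in non-decreasing time order: keep flat parallel arrays and map a time window to a contiguous index range by binary search. `FlatEdgeLog` and its methods are hypothetical, and lists again stand in for tensors.

```python
import bisect
from typing import List, Tuple


class FlatEdgeLog:
    """Hypothetical sketch: one flat, time-sorted edge log instead of a dict of stores."""

    def __init__(self) -> None:
        self.src: List[int] = []
        self.dst: List[int] = []
        self.t: List[int] = []

    def append(self, src: int, dst: int, t: int) -> None:
        # The log stays sorted by time only if events are appended in time order.
        if self.t and t < self.t[-1]:
            raise ValueError("events must be appended in non-decreasing time order")
        self.src.append(src)
        self.dst.append(dst)
        self.t.append(t)

    def index_range(self, start: int, end: int) -> Tuple[int, int]:
        # Map the time window [start, end) to a contiguous [lo, hi) index range.
        return bisect.bisect_left(self.t, start), bisect.bisect_left(self.t, end)

    def edges_in(self, start: int, end: int) -> List[Tuple[int, int, int]]:
        lo, hi = self.index_range(start, end)
        return list(zip(self.src[lo:hi], self.dst[lo:hi], self.t[lo:hi]))
```

With real tensors, `torch.searchsorted` could play the role of `bisect.bisect_left`, and slicing the resulting range gives views rather than copies.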

shenyangHuang commented 2 days ago

We don't need separate CTDG and DTDG classes; just have a single class. Maybe the user has to specify the format of their timestamps.
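A sketch of the single-class idea, where a user-supplied time unit describes the timestamps and a discrete-time view is derived by bucketing. `TemporalGraph`, `time_unit`, and `snapshots` are hypothetical names, and bucketing by integer division is only one possible discretization.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TemporalGraph:
    """Hypothetical single class covering both continuous and discrete time."""

    time_unit: str = "s"  # assumed format tag, e.g. "s" for seconds, "snapshot" for indices
    events: List[Tuple[int, int, int]] = field(default_factory=list)  # (src, dst, t)

    def add_edge(self, src: int, dst: int, t: int) -> None:
        self.events.append((src, dst, t))

    def snapshots(self, granularity: int) -> Dict[int, List[Tuple[int, int]]]:
        # Bucket events into snapshots of width `granularity` time units,
        # so a discrete-time view is derived from the same event storage.
        out: Dict[int, List[Tuple[int, int]]] = {}
        for src, dst, t in self.events:
            out.setdefault(t // granularity, []).append((src, dst))
        return out
```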

Maybe we should also design with node_type and edge_type in mind.
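A small sketch of how types could be carried as parallel integer arrays alongside the existing indices. The function names and the integer type encoding are assumptions.

```python
from typing import List, Tuple


def edges_of_type(src: List[int], dst: List[int], edge_type: List[int], wanted: int) -> List[Tuple[int, int]]:
    # edge_type is a parallel array: edge_type[i] labels the edge (src[i], dst[i]).
    return [(s, d) for s, d, k in zip(src, dst, edge_type) if k == wanted]


def nodes_of_type(node_type: List[int], wanted: int) -> List[int]:
    # node_type[n] is the type id of node n.
    return [n for n, k in enumerate(node_type) if k == wanted]
```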