❓ Questions & Help

MichailChatzianastasis opened this issue 4 years ago:

Hey, I have a time-series problem where my data has shape [n_samples, 6 * Data object], so I want to represent every sample with 6 graphs. When I try to apply the DataLoader (https://pytorch-geometric.readthedocs.io/en/latest/modules/data.html) I don't get the expected result, as it doesn't concatenate the samples in the first dimension. Generally, is there a method to handle time-series problems? Thanks in advance.
This is a tricky problem, and I'm working on better temporal graph support for PyG at the moment. For your case, it might be best to encode your 6 dynamic graphs into a single data object, e.g. by manually stacking edge_index diagonally, or by saving the edge connectivity separately for different time steps:
data.edge_index1 = ...
data.edge_index2 = ...
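To make the first option concrete, here is a minimal sketch of the diagonal stacking (all snapshot sizes and feature dimensions below are made up for illustration, not from the original question):

```python
import torch
from torch_geometric.data import Data

# Minimal sketch: stack six snapshot graphs diagonally into one Data object,
# so a standard DataLoader can batch whole samples. Sizes are placeholders.
num_steps, num_nodes, num_edges, num_features = 6, 4, 8, 16

xs, edge_indices = [], []
for t in range(num_steps):
    x_t = torch.randn(num_nodes, num_features)
    edge_index_t = torch.randint(num_nodes, (2, num_edges))
    xs.append(x_t)
    # Offset node indices so every snapshot occupies its own block of nodes,
    # i.e. the per-step adjacency matrices end up on the block diagonal.
    edge_indices.append(edge_index_t + t * num_nodes)

data = Data(x=torch.cat(xs, dim=0), edge_index=torch.cat(edge_indices, dim=1))
print(data)  # Data(x=[24, 16], edge_index=[2, 48])
```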
I encountered the same problem. I read the documentation and found the way to handle "pairs of graphs", but in my case a time series contains many timesteps, let's say 20. The number of timesteps is so large that I have to save edge indices like this:
data.edge_index1 = ...
data.edge_index2 = ...
...
data.edge_index20 = ...
It doesn't look so nice. Hoping for better support for temporal graphs.
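For reference, a minimal sketch of that workaround, setting the per-timestep attributes in a loop rather than by hand (the sizes and random tensors are placeholders):

```python
import torch
from torch_geometric.data import Data

# Minimal sketch: attach one edge_index per timestep to a single Data object.
num_steps, num_nodes = 20, 10

data = Data(x=torch.randn(num_nodes, 32))
for t in range(1, num_steps + 1):
    edge_index_t = torch.randint(num_nodes, (2, 30))
    setattr(data, f'edge_index{t}', edge_index_t)

print(data.edge_index1.size())  # torch.Size([2, 30])
```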
I'm also interested in representing batches of time-series data. I bet using the optimised torch_geometric scatter/gather would provide much better efficiency than the usual dense approach of "pad to max sequence_length across all batches".
Has there been any update on this? I could collapse batch and time to a single dimension, but I think that's less than ideal:
# Data shape [Batch, Time, features]
Batch.from_data_list([data[b, t] for b in batches for t in timesteps])
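For what it's worth, collapsing batch and time looks roughly like this in runnable form (a minimal sketch; all sizes are made up):

```python
import torch
from torch_geometric.data import Data, Batch

# Minimal sketch: every (sample, timestep) pair becomes its own small graph.
num_samples, num_steps, num_nodes, num_features = 3, 6, 4, 16

data_list = [
    Data(x=torch.randn(num_nodes, num_features),
         edge_index=torch.randint(num_nodes, (2, 8)))
    for b in range(num_samples) for t in range(num_steps)
]
batch = Batch.from_data_list(data_list)

# batch.batch maps each node to its (sample, timestep) graph; the original
# indices can be recovered with integer division and modulo:
sample_id = batch.batch // num_steps
step_id = batch.batch % num_steps
```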
Why do you think collapsing the batch and time dimensions is not ideal? Do you have any suggestions on how "temporal" data handling should look?
I suppose this goes into my specific use case. I'm adding a new node and some edges at each timestep, so at t-1 I would have t-1 nodes, and at t I would have t nodes, t-1 of which are duplicated and identical to the nodes at t-1. The memory usage blows up in this case.
I suppose we can share node pointers across batch and time by doing this instead:
# Data shape [Batch, Time, features]
Batch.from_data_list([data for b in batches for t in timesteps])
The edges are what actually differ across batch and time. But how would this work with the edges?
You could hold an additional vector that denotes the timestamp at which each edge appears, and then do a simple masking, which should keep the memory requirements reasonably low:
edge_index = edge_index[:, timestamp < t]
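A minimal sketch of that idea (the attribute name edge_time and all sizes here are illustrative assumptions, not part of any PyG API):

```python
import torch
from torch_geometric.data import Data

# Minimal sketch: one static node set, one edge_index holding every edge that
# ever appears, and a per-edge vector recording when each edge first appears.
num_nodes, num_edges, num_steps = 10, 40, 5

data = Data(x=torch.randn(num_nodes, 16),
            edge_index=torch.randint(num_nodes, (2, num_edges)),
            edge_time=torch.randint(num_steps, (num_edges,)))

t = 3
mask = data.edge_time < t          # edges already present before timestep t
edge_index_t = data.edge_index[:, mask]
```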
For those who are interested: check out the new library PyTorch Geometric Temporal.
❓ Questions & Help
Hey, I have a time series problem , where i have data of shape [ n_samples , 6* Data Object] , so i want to represent every sample with 6 graphs. When i try to apply DataLoader ( https://pytorch-geometric.readthedocs.io/en/latest/modules/data.html ) i dont get the expected result, as it doesnt concatenate the samples in the first dimension. Generally, is there a method to handle time series problems? Thanks in advance