pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License
21.09k stars 3.63k forks source link

[Roadmap] Temporal Graph Support 🚀 #3230

Open rusty1s opened 3 years ago

rusty1s commented 3 years ago

Updated on Nov 27th.

This is a roadmap towards better temporal data and temporal GNN support in PyG (targeted for v2.5), which spans across data handling, data loading, models and examples.

Data Handling

Previously, we used TemporalData as our abstraction to represent temporal graphs. However, this data structure is limited, i.e. it can only hold a stream of events, it mixes homogeneous and heterogeneous data, it cannot naturally deal with node features, etc. As such, we want to deprecate TemporalData and move time handling to Data and HeteroData explicitly.

Open Questions:

Data Loading

With Data and HeteroData having time support, we can utilize PyG's data loaders to take care of temporal subgraph sampling. Importantly, NeighborLoader and LinkNeighborLoader have been already equipped with temporal sampling support.

Models

Currently, we only expose TGN as a temporal GNN model, which currently operates on TemporalData and its own sampler implementation.

Examples and Tutorials

Afterwards, we need clear and descriptive examples and tutorials on how to leverage PyG for node-level and link-level temporal applications.

Advanced Features

One current limitation of most temporal GNN models is that they learn an embedding per node, which gets updated over time. However, it is not necessarily scalable to keep this embedding on the GPU due to memory constraints. As such, advanced features may include data-structures to query and train these embeddings on CPU, with efficient asynchronous device transfers to avoid host2device/device2host bottlenecks. An example of such a data-structure may be inspired by the GNNAutoScale paper.

Padarn commented 2 years ago

This would be very useful for our work. Do you have an idea in mind of how this would look?

rusty1s commented 2 years ago

I sadly hadn't have time to fill in some concrete roadmap plans yet, but we are looking into ways to:

  1. Allow easy updates to edge_index for newly incoming edges
  2. Allow for temporal sampling (either as part of current samplers or as its own sampler).
Padarn commented 2 years ago

Ah got it, so the graph itself is temporal, you're not thinking of time series on a graph. Keen to learn more if and when you guys have further thoughts on this. Thanks!

otaviocx commented 2 years ago

Hi @rusty1s, in order to try to contribute to the Roadmap definition/planning, I want to add some topics/thoughts to be considered:

If you agree with any of these topics/thoughts, we can transform them into tasks to be done and I can help with some of them :)

rusty1s commented 2 years ago

Thank you @otaviocx, these are great thoughts. I'm still struggling to find a good way to unify DTDG and CTDG. Currently, TemporalData refers to a list of events (which we could convert to DTDG but not vice versa). How do you think people want to define their temporal data in the first place? Shall we require lists of events as input and then simply allow conversion to snapshots? Do you think it's best to separate both representations from the start or do you think we should unify them?

otaviocx commented 2 years ago

Hi @rusty1s, thanks for answering!

How do you think people want to define their temporal data in the first place?

It is hard to say since this area is really recent and still evolving a lot. But, I would say we can start with something as flexible as possible to support many scenarios. That being said, I can think of 2 ways of temporal data representation:

Shall we require lists of events as input and then simply allow conversion to snapshots?

Thinking on the second approach in the answer above, we can also support graph snapshots in a well-known format (such as networkx). But, I would say we can allow the conversion. Even not working with DTDG, I think the snapshots may be useful for the users/researchers as a debug tool. It can be used to show the graph at specific times and to do some manual comparisons and also plotting. Wdyt?

Do you think it's best to separate both representations from the start or do you think we should unify them?

I think we can have these 2 representations as separated structures and allow the conversion between them. Thinking about a Roadmap, as far as I can see, the DTDG are being used less and less by new models. Thus, we may consider the support of snapshots and conversions as further steps... Wdyt?

rusty1s commented 2 years ago

These are great thoughts, thank you. I'm currently leaning towards option 2, where we let the attribute time (or t) denote the timestamp information. This way, we can convert simple data objects into a stream of events as well (and vice versa). We can also add a Data.update to append new edges to the temporal graph.

otaviocx commented 2 years ago

Hi, @rusty1s. Nice, thanks! We can also add a TemporalData.snapshot method which will receive a param time and will return a snapshot of the Data until the given time, wdyt?

Thus, I think we can already get some tasks from this conversation to put in this Roadmap. I would say:

Is this a good start?

One question: does it make sense to have TemporalData inheriting from BaseData?

rusty1s commented 2 years ago

Yes, this sounds great. I think the separation of Data and TemporalData makes sense in the beginning, but it's also possible to merge those two in later releases.

Yes, TemporalData should inherit from BaseData IMO. This + update and conversion to snapshot Data objects seem like a great starting point.

wsad1 commented 2 years ago

@otaviocx Thanks for this, its a great plan. One thing to think of after we are done with the initial 5 steps you mentioned, is to think of how to incorporate node addition/ deletion to the update method.

Padarn commented 2 years ago

Really interesting discussion.

Have we got enough to put together a set of issues so that people like me can also help out :-)

On Sat, 18 Dec 2021, 8:47 pm Jinu Sunil, @.***> wrote:

@otaviocx https://github.com/otaviocx Thanks for this, its a great plan. One thing to think of after we are done with the initial 5 steps you mentioned, is to think of how to incorporate node addition/ deletion to the update method.

— Reply to this email directly, view it on GitHub https://github.com/pyg-team/pytorch_geometric/issues/3230#issuecomment-997197859, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGRPNYAXP7T7LY7QZ6KZNTURR7HJANCNFSM5EVNLVHA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you commented.Message ID: @.***>

--

By communicating with Grab Inc and/or its subsidiaries, associate companies and jointly controlled entities (“Grab Group”), you are deemed to have consented to the processing of your personal data as set out in the Privacy Notice which can be viewed at https://grab.com/privacy/ https://grab.com/privacy/

This email contains confidential information and is only for the intended recipient(s). If you are not the intended recipient(s), please do not disseminate, distribute or copy this email Please notify Grab Group immediately if you have received this by mistake and delete this email from your system. Email transmission cannot be guaranteed to be secure or error-free as any information therein could be intercepted, corrupted, lost, destroyed, delayed or incomplete, or contain viruses. Grab Group do not accept liability for any errors or omissions in the contents of this email arises as a result of email transmission. All intellectual property rights in this email and attachments therein shall remain vested in Grab Group, unless otherwise provided by law.

rusty1s commented 2 years ago

Thank you! I updated the initial description of the issue according to our discussion. If you are interested in contributing, feel free to send an initial PR or create a separate issue to discuss further :)

Padarn commented 2 years ago

Nice! The roadmap looks great.

otaviocx commented 2 years ago

Thanks for updating the roadmap @rusty1s.

I'm currently working on items 1, 2, 4 and 5. Planning to submit some PRs this week.

rusty1s commented 2 years ago

Looking forward to it :)

Padarn commented 2 years ago

Hey @otaviocx I thought I might try to start helping out with the last task. Do you have anything in mind for the methods that would be best to implement?

I'm not very familiar with what is being done in the field, but for CTDG I was thinking something that looks like the framework proposed in https://arxiv.org/abs/2006.10637 is probably worth having?

otaviocx commented 2 years ago

Hi @Padarn, sorry for the delay in replying to your message. For some reason, I didn't get notified. I'm not sure if I understood your question, please, excuse me if my answer goes on the wrong path.

So, the TGN is already implemented in https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/nn/models/tgn.py. Actually, this seems to be the only Temporal GNN model defined in PyG. There is a good list of other Temporal architectures here https://pytorch-geometric-temporal.readthedocs.io/en/latest/notes/resources.html

The TemporalData class currently is used by the TGN implementation. But, it is more like an Event List/Log than a temporal graph. I would say that it may work for other CTDG architectures as it is, but not sure for DTDG. Wdyt?

@rusty1s sorry for the delay in sending the PRs. I was stuck on the implementation of all the missing methods that TemporalData should have to inherit from BaseData. I'm preparing two approaches to show soon:

One thing that I realised is that we will also need an HeteroTemporalData for some models, do you agree? The TemporalData class as it is doesn't support different types of nodes or edges yet.

Padarn commented 2 years ago

Hi @otaviocx no worries and thanks for your reply. Your reply does not go down the wrong path at all. My question was just to understand which architectures you thought would best represent what we want to make sure is supported. The reference you provided gives a good starting point.

I agree that we'll probably need some of the changes coming in your upcoming PRs to support DTDG architectures with TemporalData without introducing some hacks.

I wasn't aware of https://pytorch-geometric-temporal.readthedocs.io/en/latest/index.html earlier - Is it fair to say we want to bring most of this functionality inside of pytorch geometric?

otaviocx commented 2 years ago

I wasn't aware of https://pytorch-geometric-temporal.readthedocs.io/en/latest/index.html earlier - Is it fair to say we want to bring most of this functionality inside of pytorch geometric?

Hi @Padarn, I would say that here we will have a more generic approach and elements in a way that pytorch-geometric-temporal or other derivated libs can take advantage of. But, it is better to confirm with the maintainers (@rusty1s, for instance).

sorry for the delay in sending the PRs

Hi @rusty1s, I've created the first PR with items 1 and 2 of the Roadmap. Item 3 will need some refactoring in the JODIEDataset and other classes. This dataset seems to be not supporting the DataLoader yet (we need to deal with the data param in the model since there is no train/val/test dataset split). I'm thinking to refactor this dataset to have a split param in its constructor turning possible to have 3 (train/val/test) DataLoaders in the model example file.

Also, some classes (such as the DataLoader) expects an Union of Data and HeteroData as a param (Union[List[Data], List[HeteroData]] or Union[Data, HeteroData]). I'm refactoring where is needed to BaseData.

otaviocx commented 2 years ago

One thing to think of after we are done with the initial 5 steps you mentioned, is to think of how to incorporate node addition/ deletion to the update method.

Hey @wsad1 and @rusty1s, regarding the update method, if we follow the TGN strategy to model the temporal graph as a stream of events, we can actually have events to add/update edges or nodes or even to delete them: image Thus, I think the update in such a case will be just the addition of new events to the stream, wdyt?

The logic to compute the nodes/edges addition, update or deletion will be actually in the model (Memory in the case of TGN) or in the snapshot function to get a static version of the graph at the given time (considering all the additions, updates and deletions).

Does it make sense to you?

Padarn commented 2 years ago

This is exactly what I was thinking when I read that paper. +1 from me.

On that note, I am looking at working on the temporal snapshots on top of the work you're doing. Let me know if you see any risk of overlapping efforts here.

rusty1s commented 2 years ago

Thus, I think the update in such a case will be just the addition of new events to the stream, wdyt?

Not sure I understand that. Node-wise and interactions events seem to have different representations, so how do you want to merge them?

otaviocx commented 2 years ago

Node-wise and interactions events seem to have different representations, so how do you want to merge them

Well... as per the TGN and Jodie papers, the datasets we currently have are all with a single type of event (interactions). Even so, we can model a node-wise event as an event that has the same value for the src and dst keys. Just checked and if we get an event like that, only the memory and embeddings of this node are changed (on the current TGN implementation).

Now that we are diving a bit deeper into that subject, some other topics/challenges appear:

I am looking at working on the temporal snapshots on top of the work you're doing

@Padarn, considering the topics above, I would say we may need some design definitions before starting to code changes regarding snapshots, wdyt?

otaviocx commented 2 years ago

@Padarn, Data.to_temporal(time_attr='time') on the other side is something that I think we can start working on, wdyt?

rusty1s commented 2 years ago

Thanks for the clarification @otaviocx. So node-level events are treated as self-loops. So, IMO, to_snapshot should probably simply give you a snapshot in the form of a multi-graph with self-loops (solely holding edge features rather than node features). After that, the user could make self-loop features to node features (e.g., based via some DNN module). In theory, we could have different options for to_snapshot (like keep_ony_last=True), but we should start simple and the least magically.

Updates to the number of nodes/edges is another interesting topic, and I have some thoughts on this. For example, one could sub-class Tensor to a TemporalTensor that actually allocates more memory in the node dimension than what is currently needed. To the outside, it utilizes a slicing/masking appraoch to only expose the parts of the tensor that are actually filled with information. In the inside, however, we do not have to allocate new memory (at least not all the time) whenever a new node appears.

yulonglin commented 2 years ago

Most of the CTDG GNNs are using events stream with this same structure: source node id, destination node id, time and message. Thus, it seems the features of the nodes and edges features are mixed in the message. So, if my understanding is right, there is no way to extract a static graph (snapshot) with features in nodes and edges (all the features will be always in edges since they represent the messages/events). We may conclude that the support for DTDG may be based on a different structure (something more similar to what we already have in PyG Temporal, for instance).

@otaviocx I don't think node events should be treated as self-loops, or at least I'm not sure what the rationale for that is. I'm not sure how a model can distinguish between actual self-loops and node events. You can generalize this format to have both node and edge events. This can be implemented by:

There may be different networks to learn from updates to nodes and edges respectively, and we can use slicing/masking to get the relevant events.

This offers a few benefits:

  1. We can now distinguish between self-loops and node events
  2. We can have events with different numbers of features
  3. We can now handle arbitrarily many types of events in different ways, including events involving changes to the number of nodes/edges

What do you think?

yulonglin commented 2 years ago

@otaviocx Oh I realised that you have a PR https://github.com/pyg-team/pytorch_geometric/pull/3985 that's almost ready for merging. I'll take a closer look later this weekend to understand if what I said above makes sense given that it's to be merged :)

Also left comment abouts TemporalDataLoader shuffling and testing in your PR!

yulonglin commented 2 years ago

@Padarn, Data.to_temporal(time_attr='time') on the other side is something that I think we can start working on, wdyt?

@Padarn @otaviocx Let me know if I can help with this :)

Padarn commented 2 years ago

Hey @yulonglin I haven't started on this task. Please go ahead from my point of view. Sorry I missed the emails from this thread.

Padarn commented 2 years ago

@otaviocx sorry for the slow reply

@Padarn, considering the topics above, I would say we may need some design definitions before starting to code changes regarding snapshots, wdyt?

Yeah agreed. I will take a stab at this in a few days if no one else has (not a lot of time around work for the next few days). Maybe I'll split it into a separate issue.

otaviocx commented 2 years ago

Hi @yulonglin, thanks for your messages... I didn't see them early.

I'm currently working on step 4 actually (TemporalData.snapshot(start=None, end=None)) and in a PR to increment the tests for TemporalData. Thus, I didn't start working on Data.to_temporal(time_attr='time'). Please, feel free to implement this one and if you need any help or to discuss the implementation, I will be available. :)

I don't think node events should be treated as self-loops, or at least I'm not sure what the rationale for that is. I'm not sure how a model can distinguish between actual self-loops and node events.

Yep, I agree with you. I suggested the self-loop to represent node events since it will work with the TGN Memory module without any changes and, for other models, we can use the event type as you suggested to differentiate between self-loops and node events. If we decide to have an invalid node in dst when the event is node-wise, we will need to change a bit the current implementation of the TGN modules (since it creates/updates the destination memory).

With the addition of the concept of event types, we will open a range of possibilities by having other types of events (not just node-wise or interaction events). So, I think we will have this concept at some point. Also, to deal with temporal knowledge graphs, we will need something like this.

I think changing it to a fixed invalid node id (e.g. -1) and building in checks to ensure the dst node id is not valid may make things less error-prone.

If we add the event type, I would say both approaches (dst == -1 or dst == src for node events) will be checkable (error-prone). Since it is something we will get from the dataset, we may need to support both approaches, depending on the dataset, for instance.

But, to start simple and since all the already existent temporal datasets in PyG have only interaction events, we can deal with those events first and have a complete solution for them (with the Data <-> TemporalData transformations) working. After that, we can move forward and implement the event types and also other transformations (such as HeteroData <-> TemporalData). Wdyt?

Padarn commented 2 years ago

+1 to start simple and get all of the transformations between dataset done first

@otaviocx cool glad you've picked up the snapshot. Happy to help review a PR for it when you're ready.

I'd suggest we update the Todo list at the top of this issue to reflect the current status? Perhaps we can create separate issues and assign them to avoid any overlap

rusty1s commented 2 years ago

I updated the TODO list at the top of this issue. Let me know if I have missed something. Feel free to create separate issues to discuss specific functionality in more detail.

yulonglin commented 2 years ago

+1 to starting simple too, thanks for the work!

Sure I'll work on Data.to_temporal(time_attr='time') :)

EdisonLeeeee commented 2 years ago

Hi, @rusty1s @Padarn I'm also interested in this roadmap. I've read your discussions and they are indeed inspiring.

I've been working through CTDG and here I want to share some of my thoughts:

Excuse me if my answer goes down the wrong path.

rusty1s commented 2 years ago

Thanks @EdisonLeeeee. What would be the advantage of DiscreteTemporalData over a list of data objects?

EdisonLeeeee commented 2 years ago

Sorry for not making it clear. IMO, if we use a list of data objects, there will be some problems/concerns:

By contrast, there are several benefits of DiscreteTemporalData:

data = DiscreteTemporalData(...)
for time, snapshot in enumerate(data):
    out = model(snapshot.x, snapshot.edge_index, snapshot.edge_attr)

Btw, there are additional differences between DiscreteTemporalData and ContinuousTemporalData :

otaviocx commented 2 years ago

Hi folks! Sorry for the absence last months.

@EdisonLeeeee thanks for your contributions! I would say that after the PRs #3867 and #3985, the TemporalData class is almost what you call ContinuousTemporalData. We also planned a TemporalData.snapshot(start=None, end=None) which appears to be similar to the at method you suggested. Furthermore, I would say that the planned Data.to_temporal(time_attr='time') method will cover the need of the to_continuous() one. Wdyt?

I would say that one of the main contributions of the implementation of this task is to easily convert between continuous and discrete graph representations. To convert to a continuous graph, we require a time attribute. In the other hand, to convert from a continuous graph to a discrete, we need to take snapshots of the graph on each step.

What's the meaning of each data object in this list? (i) A graph where edges occurred until now? (ii) Or a graph where edges occurred at present? For (i), it is less flexible if someone wants to get the edges occurred between start and end. For (ii), it is inconvenient for users who want to get a graph snapshot.

When representing a discrete graph as a list of graphs, I would say it is more trivial to think of a list of snapshots (scenario (i) in your description). It avoids the complications regarding nodes and edges removal and updates. I was checking some DTDG datasets, and it seems they are always using snapshots. Thus, it guides us to the limitation you've mentioned:

it is less flexible if someone wants to get the edges occurred between start and end.

There are two options here:

Someone can also merge the steps (using their on merge strategy) creating a TemporalData object (or a Data object and then using Data.to_temporal) and getting the needed graph by using the TemporalData.snapshot(start=None, end=None) method. Makes sense?

EdisonLeeeee commented 2 years ago

Hi @otaviocx, thanks for your answer! I got your points.

You are right. Currently TemporalData is what I want forContinuousTemporalData. I was going to implement another DiscreteTemporalData specified for DTDG and make TemporalData as simple as possible.

I agree with you and I think DiscreteTemporalData is not necessary, at least not now. TemporalData is able to achieve most of the functionality for both CTDG and DTDG. Thanks again for your help that makes it clear to me :)

Two more questions (if I may ask),

Padarn commented 2 years ago

I think @otaviocx was working on snapshot, and there was some interest in working on the to_temporal.

Maybe we should clean up the roadmap a little and pencil in names if people are working on them. We can probably summarise the discussions in this issue to give a high-level picture of the conclusions. WDTY @otaviocx ?

EdisonLeeeee commented 2 years ago

Thanks @Padarn. Maybe I can help with some of them? I'm interested in this work as well.

I just made a PR in https://github.com/pyg-team/pytorch_geometric/pull/5248 to implement snapshot and update. Could you please give me some suggestions? I need your help @otaviocx

Excuse me if I made something wrong on this roadmap.

Padarn commented 2 years ago

Great! I'll review there

glachaud commented 1 year ago

Hello everyone,

Do you have any updates on the Data.to_temporal(time_attr='time')? I was planning to work on it if it's still on the roadmap and has not been implemented yet.

rusty1s commented 1 year ago

AFAIK, no one is working on that at the moment, so any help to get this in is highly appreciated :)

SalvishGoomanee commented 1 year ago

Hello everyone,

I have a a naive question concerning precisely dealing with TemporalData class as implemented in the TGN paper mentioned above.

I want to be able to plot graphs just like we do with PyG usual data class.

However, when defining the graph G = to_networkx(data, with_labels=True), I get `AttributeError: 'GlobalStorage' object has no attribute 'edge_index' as an error.

So, I was wondering if there is a way to use networkx with the TemporalData class? Has this been implemented somewhere?

rusty1s commented 1 year ago

Currently, to_networkx expects a torch_geometric.data.Data object. We would need to add support for TemporalData here.

SalvishGoomanee commented 1 year ago

Ok got it thanks!

Is there a way to have a map between attributes of the TemporalData class those of the Dataclass? It seems a bit brute force and may not be always optimal, I think...

rusty1s commented 1 year ago

What we could do is provide a edge_index property as part of TemporalData that just builds the graph based on temporal edges. This may unblock the to_networkx call.

SalvishGoomanee commented 1 year ago

I was thinking of something like that.

So, one will then have temporal_data = SomeTemporalDataset[0] where temporal_data.edge_index would return [2, num_events], where the num_events property is num_edges?

rusty1s commented 1 year ago

Yeah, exactly.