tkipf / gcn

Implementation of Graph Convolutional Networks in TensorFlow
MIT License

Handling growing graphs #83

Closed · pavlin-policar closed this issue 5 years ago

pavlin-policar commented 5 years ago

I'm sorry if this has been answered before - I looked through several issues but found no mention of this.

Consider a graph where nodes and edges are constantly being added, e.g. in a social network new users register and form connections, and existing users form new connections with each other. Could I use GCNs, or somehow adapt them, to handle this case? My current understanding is that we train the GCN using the adjacency matrix of the original graph, which is NxN. If I add nodes, the normalized adjacency used for propagation changes entirely (because the node degrees change) and increases in size. So not only would the weight matrices W be the wrong size, but the domain would also change because the degree matrix changes. Any suggestions?

The first thing that comes to mind would be to train the GCN on smaller subnetworks, i.e. sample the original network and then train the GCN in "mini-batches". Of course, the issue here is that the "mini-batches" could have wildly different numbers of nodes (depending on whether we sample from the core or the periphery of the network), but perhaps padding the adjacency matrix with zeros to match the largest subnetwork would work. You've probably thought of this, and I was wondering about your thoughts on it.

Another question I had is what would happen if we permuted the adjacency matrix, or used an isomorphic graph, after training the original model. The adjacency matrix would look quite different, but the graph structure would be the same. I suspect the trained model wouldn't do well on the permuted graph. Any thoughts?

Thanks!

tkipf commented 5 years ago

Great question, thanks for your comment!

I think there are a few overlapping problems that you bring up here:

1. Generalization to unseen nodes: this is usually not an issue if node features come from the same distribution. Just set up a new (possibly larger) graph at test time with these added nodes and run the model.

2. Prediction on single nodes: to make a prediction for a single node with a two-layer GCN model, you only need access to its first- and second-order neighborhoods. So you do not need to run the forward pass on the full graph if you only want predictions for a subset of the nodes.

3. How to handle large graphs: one can do trivial mini-batching that follows from (2): sample a set of nodes and take their full second-order neighborhood into account. This is usually too expensive, and one can do some further subsampling to alleviate this. See, e.g., FastGCN: https://arxiv.org/abs/1801.10247
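
For concreteness, here is a minimal NumPy/SciPy sketch of points (1)–(3), not the repository's TensorFlow code: the renormalized adjacency D^-1/2 (A + I) D^-1/2 and the feature matrix are simply rebuilt for whatever graph is at hand (the trained weights depend only on the feature dimensions, not on N), and predictions for a subset of nodes need only their two-hop neighbourhood. The names `adj`, `features`, `targets`, `W1`, `W2` are placeholders, not symbols from this repo.

```python
import numpy as np
import scipy.sparse as sp

def renormalize(adj):
    """Symmetric renormalization D^-1/2 (A + I) D^-1/2 used as the propagation matrix."""
    a_tilde = adj + sp.eye(adj.shape[0])
    deg = np.asarray(a_tilde.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(deg ** -0.5)
    return (d_inv_sqrt @ a_tilde @ d_inv_sqrt).toarray()

def gcn_forward(adj_hat, features, W1, W2):
    """Two-layer GCN logits: A_hat @ ReLU(A_hat @ X @ W1) @ W2 -- works for any number of nodes."""
    hidden = np.maximum(adj_hat @ features @ W1, 0.0)
    return adj_hat @ hidden @ W2

def two_hop(adj, targets):
    """Indices of `targets` plus their first- and second-order neighbours."""
    mask = np.zeros(adj.shape[0], dtype=bool)
    mask[targets] = True
    for _ in range(2):  # expand the frontier twice
        rows = np.where(mask)[0]
        mask |= np.asarray(adj[rows].sum(axis=0)).ravel() > 0
    return np.where(mask)[0]

# Placeholder data: a random symmetric graph, node features, "trained" weights.
n, n_feat, n_hid, n_cls = 200, 16, 8, 3
rng = np.random.default_rng(0)
adj = sp.random(n, n, density=0.03, format="csr", random_state=0)
adj = ((adj + adj.T) > 0).astype(float).tocsr()
features = rng.normal(size=(n, n_feat))
W1, W2 = rng.normal(size=(n_feat, n_hid)), rng.normal(size=(n_hid, n_cls))

# (1) If the graph grows, just rebuild A_hat and X; W1/W2 keep their shapes.
logits_full = gcn_forward(renormalize(adj), features, W1, W2)

# (2)/(3) Predictions for a few nodes need only their 2-hop neighbourhood.
# Boundary degrees are recomputed inside the subgraph, so outputs can differ
# slightly from the full-graph pass unless full-graph degrees are reused.
targets = np.array([0, 1, 2])
nodes = two_hop(adj, targets)
logits_sub = gcn_forward(renormalize(adj[nodes][:, nodes]), features[nodes], W1, W2)
```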

pavlin-policar commented 5 years ago

Thanks for the response! Having looked at a couple more resources, I think I have a better grasp of what's going on than I did this morning when I opened this issue. I probably should have read up a bit more before posting the question, but I saw it and it looked really cool :smile:

  1. So if my understanding is correct, the only role of the adjacency matrix is to "average" over the neighbourhoods of nodes in a way that is friendly to matrix operations. So if my graph grows over time, I could just replace the adjacency matrix with that of the larger graph, and the feature matrix with a correspondingly larger matrix, and this should work.

2/3. And if I wanted to train my model (or make predictions) on part of the graph, I could, as in 1., just replace the adjacency matrix and feature matrix with those of a subgraph that includes, e.g., the second-order neighbourhood for a two-layer net.

Thanks for the paper on FastGCN, I'll definitely take a look!

I am primarily interested in link prediction, which falls under the GAE implementation, but I hope you don't mind me asking the question here. I've looked through the code there, and in order to reconstruct the original adjacency matrix (as opposed to the normalized adjacency we use as the input), we run the inner product of the embeddings through a sigmoid. When we evaluate the loss and eventually backprop, we take into account all the edges (or lack thereof). Doesn't this bias the model towards predicting no links? Since most graphs are very sparse, I would guess an autoencoder might be quite accurate if it never predicted any links at all. I haven't tried this out, and seeing the results in your paper I suppose this isn't actually an issue, but I'm curious whether you have any thoughts on it.

tkipf commented 5 years ago

Yes, that’s correct.

As for the GAE: yes, the classes (edge vs. no edge) are imbalanced, and in the published implementation we use a weighting factor in the loss to make up for this class imbalance. Alternatively, one can subsample the negative class (negative sampling), which also makes the algorithm much more scalable.
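
By way of illustration, here is a plain NumPy sketch of such a weighted reconstruction loss: an inner-product decoder whose positive (edge) term is up-weighted by roughly the ratio of non-edges to edges. This is a sketch of the idea, not the actual gae code; `Z`, `adj_label`, and the exact weighting constant are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def weighted_reconstruction_loss(Z, adj_label):
    """Binary cross-entropy over all node pairs, with the positive (edge) term
    up-weighted by roughly the ratio of non-edges to edges."""
    probs = sigmoid(Z @ Z.T)                  # inner-product decoder
    n_pairs = adj_label.size
    n_pos = adj_label.sum()
    pos_weight = (n_pairs - n_pos) / n_pos    # ~how many "no edge" pairs per edge
    eps = 1e-10                               # numerical safety for log
    loss = -(pos_weight * adj_label * np.log(probs + eps)
             + (1.0 - adj_label) * np.log(1.0 - probs + eps))
    return loss.mean()

# Placeholder embeddings and ground-truth adjacency.
rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 16), scale=0.1)
adj_label = (rng.random((50, 50)) < 0.05).astype(float)
adj_label = np.maximum(adj_label, adj_label.T)     # symmetrize
print(weighted_reconstruction_loss(Z, adj_label))

# Negative-sampling alternative: evaluate the loss on all edges plus an equal
# number of randomly drawn non-edges, instead of over all N^2 pairs.
```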


pavlin-policar commented 5 years ago

Thanks for your help!