tkipf / gae

Implementation of Graph Auto-Encoders in TensorFlow
MIT License
1.64k stars 351 forks source link

GAE for directed graphs + finding common patterns within a set of graphs #10

Closed MTDzi closed 6 years ago

MTDzi commented 6 years ago

Hi Thomas,

First off: great work :)

Second: I would like to apply your GAE but for weighted directed graphs, without features. I have a set of such graphs which (I hypothesize) contain similar sub-graphs and connection patterns between the nodes. I would like to seek out those patterns, to then verify if they are indeed present (regardless of how I would do that).

My questions are:

  1. I think I can adapt your approach by using a directed graph Laplacian: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csgraph.laplacian.html I would use a sigmoid activation function, and minimize a logloss (although it won't really have anything to do with likelihood). But am I missing something? Do you see a glaring hole in this kind of approach?

  2. I wanted to look for the common patterns in my graphs -- let's call them {G_1, G_2, G_3, ...} -- by embedding the nodes into a common space, in which (I hope) the patterns would be easier to find. In your other repo (https://github.com/tkipf/gcn) you're suggesting that when dealing with multiple graphs, one might create a big adjacency matrix by concatenating the adjacency matrices of these graphs {G_1, G_2, G_3, ...}. How large can this concatenated adjacency matrix be? Have you reached a limit? And if so, what was it?

  3. I was also wondering if you think it would be possible to look for patterns in the matrices W (self.vars['weights']) in the GraphConvolutionSparse layer (as an analogy to what you might do with filters in "classical" convolutional neural networks). Now that I think about it, it seems unlikely because the ordering of the rows/columns in the adjacency matrix is pretty much random. But maybe you've given this idea some thought and you'd be willing to point me to an answer: "yes, it can be done" or: "no, no way".

Please, let me know what you think, I would really appreciate it!

tkipf commented 6 years ago

Thanks for your questions! Let me try to answer:

1) I recommend using something along the lines of https://github.com/tkipf/relational-gcn if you have a directed graph, i.e. simply propagate messages along the direction of an edge.

2) If you want to do batch-processing of multiple graphs, you can either a) build a block-diagonal batch adjacency matrix as in the post that you linked or b) stack them into a tensor (batch_size x num_nodes x num_nodes). a) will allow you to use sparse matrix operations - these tend to work up to ~10M edges (in total) on CPU and ~1M edges on GPU. b) will probably be easier to implement and might even be faster on GPU. If you only have a handful of different graphs, you can also process them independently (without mini-batching them into a tensor / block diag. sparse matrix). You can still get "mini-batch-gradients" by just averaging the gradients of a few single samples.

3) If you use higher-order filters (e.g. Chebychev filters for undirected graphs) you can indeed visualize them to some degree. But this is typically only insightful if you have very regular graphs. For directed graphs this hasn't been attempted yet as far as I know.

MTDzi commented 6 years ago

Thanks for your reply.

sunfanyunn commented 6 years ago

I am getting a little bit confused and I have a question.

I though for gcn(or gae), the input graph have to have the exact same structure(connectivity, size, e.t.c). So how is it possible to seek out similar sub-graphs and connection patterns given a set of graphs(different graphs)?

Thanks in advance.

MTDzi commented 6 years ago

My understanding is that you input an adjacency matrix that looks like this, i.e. it's a "diagonal concatenation" of the adjacency matrices of graphs Q, V, and W.

@tkipf is that correct?

But assuming I'm not wrong, once the model learns to encode such an input, you can get embeddings for the vertices, which you can then analyze further, e.g. look for clusters that would correspond to common sub-graphs / motives.

tkipf commented 6 years ago

My understanding is that you input an adjacency matrix that looks like this, i.e. it's a "diagonal concatenation" of the adjacency matrices of graphs Q, V, and W.

@tkipf is that correct?

Yes :-)