Use Constrained GVAE for Non-Molecular Data?

microsoft / constrained-graph-variational-autoencoder

Sample code for Constrained Graph Variational Autoencoders

MIT License

231 stars 57 forks source link

Use Constrained GVAE for Non-Molecular Data? #2

Open jessxphil opened 3 years ago

jessxphil commented 3 years ago

Hi Team! Excellent work!

This model is exactly what I've been looking for, but I'm not working with molecular data. I'm working with motion data. Is it possible to use this work for data other than generating molecules?

Thank you!

leuchine commented 3 years ago

Hi Jessica,

Thanks for your interest in our work! We are using graph-format data now. Please have a look of https://github.com/microsoft/constrained-graph-variational-autoencoder/blob/master/data/get_qm9.py to get a sense of the graph format.

I am not sure what your motion data look like. If it can be converted into a graph format, I am certain that our code can be adapted for your problem. But if the motion data are sequential, other models (e.g. seq2seq models)might be more appropriate. One simple tutorial about seq2seq modelling can be found at https://github.com/bentrevett/pytorch-seq2seq.

Thanks!

Best Regards, Qi

jessxphil commented 3 years ago

Thank you @leuchine!

My motion data is actually formatted as a list of graphs / adjacency matrix using NetworkX. I read your paper and read through your code a bit. Can you confirm this code is embedding a whole graph of each molecule and decoding/ reconstructing the whole graph from the embedded space?

leuchine commented 3 years ago

Hi Jessica,

Yes. You are right. That's very typical for VAE-based models, where an encoder is used to embed a graph, then a decoder is used to reconstruct the graph. Thanks!

Best Regards, Qi

jessxphil commented 3 years ago

Thank you @leuchine,

I should specify. I'm not looking to embed a single 'whole'/ 'network' graph to complete tasks such as node or link prediction. I need to embed a list of graphs (thousands of graphs) into a single embedded space and then use the decoder to reconstruct that list of graphs / generate each graph. Does Conditional GVAE work as described? I appreciate the quick responses!

leuchine commented 3 years ago

Hi Jessica,

If I understand correctly, you want to encode multiple graphs and decode multiple graphs in a single run? It is a bit tricky for CGVAE to decode multiple graphs, since we use breadth-first search for enumerating edges. Therefore, the current implementation assumes that a single connected component (or graph) will be generated. So the decoder needs to be modified a bit.

Best Regards, Qi

jessxphil commented 3 years ago

Hi Qi,

I understand. Do you happen to know if there's an option out there in the graph ML world that already computes what I described? I imagine if I found a way to update CGVAE's decoder to my specifications the loss of the reconstruction layer might make this solution a no-go (due to BFS)? I don't suppose it's possible to embed a single graph one at a time, layer the embedded space and train the decoder to reconstruct a graph from each layer? (Throwing some ideas against the wall to see what might stick. ( -__-')

Thank you again,

Jessica

leuchine commented 3 years ago

Hi Jessica,

I think it is better to use some non-autoregressive approaches in your case. Note that you can pack your graphs into a single adjacency matrix. As a result, your adjacency matrix will contain multiple connected components/graphs. This makes existing non-autoregressive models usable. Although I have not carefully checked the code of this paper (https://arxiv.org/abs/1905.11600), I think it can be used after you pack the graphs. Thanks!

Best Regards, Qi