pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

Reverse set2set #72

Closed allchemist closed 5 years ago

allchemist commented 5 years ago

Hi,

The set2set layer produces a fixed-size, order-invariant representation of a graph. Is it possible, using pytorch_geometric, to do the reverse step and generate the graph back from that dense representation? For example, to build an autoencoder.
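To make "fixed-size and order-invariant" concrete, here is a minimal, dependency-free sketch of a single attention readout step. It is a simplification, not the library's Set2Set layer: the real layer derives the query from an LSTM over several processing steps, while `attention_readout` and the fixed `query` vector here are hypothetical names chosen for illustration.

```python
import math


def attention_readout(node_features, query):
    """One attention step: softmax(x_i . q)-weighted sum of nodes, concatenated with q."""
    # Score every node against the query vector.
    scores = [sum(x * q for x, q in zip(feat, query)) for feat in node_features]
    # Softmax over the nodes (subtract the max for numerical stability).
    max_score = max(scores)
    weights = [math.exp(s - max_score) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # The weighted sum does not depend on the order of the nodes.
    dim = len(query)
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, node_features))
        for d in range(dim)
    ]
    # Output size is always 2 * dim, whatever the number of nodes.
    return list(query) + pooled


nodes = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [0.3, -0.2]  # hypothetical fixed query; Set2Set would learn it via an LSTM

out = attention_readout(nodes, query)
out_shuffled = attention_readout([nodes[2], nodes[0], nodes[1]], query)
out_bigger = attention_readout(nodes + [[2.0, 1.0]], query)

print(len(out), len(out_bigger))
```

Shuffling the nodes leaves the output unchanged up to floating-point rounding, and adding a node changes the values but not the output length.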

Thanks

rusty1s commented 5 years ago

That is absolutely possible with PyTorch Geometric, but we currently do not have an example of this use case. Do you have a specific paper in mind?

allchemist commented 5 years ago

No specific paper. I'm interested in chemical reaction prediction: there are one or two reactant molecules on the left side of the reaction and a product molecule on the right side. If the network could output a graph, other interesting ideas could be tested as well, such as extracting trainable chemical descriptors with an autoencoder, generating new molecules, etc.

Of course, there are papers on chemical reaction prediction, but they do not consider direct graph generation, so their approaches are very complicated.

Though the set2set layer is not the right tool, it learns the graph invariant perfectly, so I feel that something like a bidirectional LSTM-based approach would work fine.

If you are interested, I could prepare a dataset and a loader for one of these use cases.

allchemist commented 5 years ago

I decided to write my own graph-restoring solution, but found that I can restore only the graph nodes from the latent space, not the edges, because set2set does not use any information about how the nodes are connected.
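The limitation can be illustrated with a small sketch. It assumes a plain sum as a stand-in for any readout that only sees node features (as set2set does); `encode` and the two toy graphs are hypothetical and exist only for this example.

```python
def sum_readout(node_features):
    """Order-invariant readout that only looks at node features."""
    return sum(node_features)


def encode(node_features, edges):
    """Encode a graph with a node-only readout; the edges never enter the result."""
    del edges  # intentionally unused, mirroring a readout over the node set alone
    return sum_readout(node_features)


# Same node features, different connectivity.
node_features = [1.0, 2.0, 3.0, 4.0]
path_edges = [(0, 1), (1, 2), (2, 3)]  # path 0-1-2-3
star_edges = [(0, 1), (0, 2), (0, 3)]  # star centred on node 0

path_code = encode(node_features, path_edges)
star_code = encode(node_features, star_edges)

print(path_code == star_code)  # True: the two graphs are indistinguishable
```

Since both graphs map to the same code, no decoder could tell which set of edges to restore from it.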

Would it be a good idea to feed the set of edges to another set2set layer, just as is done with the set of node attributes? It seems reasonable, but it requires somehow encoding the node indices categorically, and an LSTM does not cope well with large sparse data.

BTW, graph generation seems to be a challenging task. Some promising approaches could be https://arxiv.org/pdf/1608.03192.pdf (graph grammar) and https://arxiv.org/pdf/1802.04364.pdf (junction tree approach), but they could be challenging to implement as part of pytorch_geometric.

rusty1s commented 5 years ago

Hi, I do not think that feeding a set2set layer with the set of edges is the way to go. IMO, it is much more reasonable to think of a unified data representation that encodes both node attributes and their connectivity. When using graph convolution, you already encode local connectivity patterns in your node embeddings, and you can recursively obtain a more global view by resorting to something like pooling. However, obtaining global graph representations is still an open research problem, and current global aggregation approaches are far from being perfect IMO.
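As a rough sketch of this idea, the snippet below runs one round of neighbor aggregation (a weight-free simplification of a graph convolution) before a global sum pooling. The function names and the toy graphs are assumptions made for illustration, not PyTorch Geometric API.

```python
def aggregate_neighbors(node_features, edges):
    """One message-passing step: h_i = x_i + sum of the features of i's neighbors."""
    hidden = list(node_features)
    for src, dst in edges:  # edges are treated as undirected
        hidden[src] += node_features[dst]
        hidden[dst] += node_features[src]
    return hidden


def global_sum_pool(hidden):
    """Collapse all node embeddings into a single graph-level value."""
    return sum(hidden)


# Two graphs with identical node features but different connectivity.
node_features = [1.0, 2.0, 3.0, 4.0]
path_edges = [(0, 1), (1, 2), (2, 3)]  # path 0-1-2-3
star_edges = [(0, 1), (0, 2), (0, 3)]  # star centred on node 0

path_embedding = global_sum_pool(aggregate_neighbors(node_features, path_edges))
star_embedding = global_sum_pool(aggregate_neighbors(node_features, star_edges))

print(path_embedding, star_embedding)  # 25.0 22.0
```

Because the node embeddings now carry neighborhood information, the pooled values differ between the two graphs. The result is still lossy (here it only captures degree-weighted sums), which matches the caveat about global aggregation above.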

I am definitely in favor of adding graph generation capabilities to PyTorch Geometric (although this is not my main topic of research and I am not that familiar with the SOTA). However, most methods I've seen are very specialized and may not integrate that well into a unified DL framework.