guo00413 / graph_unpooling

MIT License
5 stars 0 forks source link

meet error when running the notebook #1

Closed poem2018 closed 1 year ago

poem2018 commented 1 year ago

Hi, I try to run the UL_VAE_protein_training.ipynb notebook (I convert it to .py file on my laptop for convenience), but it shows an error in the line240 of the model_graph.py file. The value of edge_attr is None and in the convert_Batch_to_datalist function it calls the index of edge_attr which seems to cause error? Should it change to edge_attr=edge_attr?

guo00413 commented 1 year ago

Hi, thanks for letting us know. Just fixed the issue, (this error is due to the out-versioned utils function) by updating unpool_utils.py. In model_graph.py, the argument should be edge_attr=None in line 240.

Please reopen if this doesn't solve this issue :)

poem2018 commented 1 year ago

Thanks! I am able to run the notebook now. And I have a few questions regarding the running results.

  1. I wonder can the UnpoolLayerEZ class be directly used on the simple graph? (Since in the UL_VAE notebook the generated graph comes from noise sampled from z, if I want to directly unpooling the simple graphs, is there any function provided in the repo?
  2. I want to train the UL_VAE notebook with some very simple graphs to see if it can be trained to be overfitting and generate similar graphs. The training data I used is 500 circle graphs and 500 fully-connected graphs with graph order in (5-9) and node feature dimension=16, I train for 100 epochs with the original hyperparameter setting in the notebook. The visualization results of training and generated graphs look like below. Does this seem to be the correct generation? Or is there any place I might set wrong? training graphs: image generated graphs: image Really thanks if you can give any ideas! btw, I am also interested in graph generation problems and if you don't mind can we add some other contact info for further communication?
guo00413 commented 1 year ago

@poem2018 Hello! Sure, it would be a pleasure to meet you and have further discussion. Please feel free email me at guo00413 [at] umn [dot] edu

  1. I am not sure I understand your question. I think you can use pre-trained Unpool Class (UnpoolLayerEZ ) to unpool a simple graph if you prepare it as a batch containing a single graph (use UnpoolLayerEZ .forward, it will return both unpooled graph (x, batch, edge_index, edge_batch) and log-probability, where the batch will be trivial), as long as the node dimension/edge dimension is as required for the unpool class.
  2. The result doesn't look that good to me. The first thing that comes to my mind is that maybe the training epoch is small (100 * 500 means the VAE trained on 50k graphs). Secondly, capturing a mixed distribution of graphs (i.e., circles and fully connected ones) might be different and might need to tune the hyperparameter; one test I will suggest is to check: examine the encoded vectors for circles and for fully connected ones and make sure the model can differentiate those in encoded space.
poem2018 commented 1 year ago

Hi, Thanks for your reply! Sorry that I was traveling back to China these days and missed the email.

  1. Yeah, I want to unpool the graph from small size to large size (like extend a 5-node fully connect graph to a 10-node fully connect graph and use this as a decoder in a VAE structured model). For the 'pre-trained Unpool class', do you have any pre-trained weight provided in the repo(or any example to use the pre-trained unpool class)? I saw some .pt files in './models' folder, but they seem to be the weight for VAE and GAN model?
  2. And I also try to train the UL_VAE_protein_training.ipynb by replacing the protein dataset with only 300 fully connect graphs in 30 epoch with batchsize=1, and the result seems still not that good. input[cid:a5cc2f83-ec06-4f4d-8c7f-8d763733cc96]output[cid:1af76df1-60d3-416c-8fca-881b65d61b22]

Thanks a lot if you can give some reply!


From: guo00413 @.> Sent: Thursday, May 11, 2023 14:31 To: guo00413/graph_unpooling @.> Cc: Zhu, Ruike @.>; Mention @.> Subject: Re: [guo00413/graph_unpooling] meet error when running the notebook (Issue #1)

@poem2018https://urldefense.com/v3/__https://github.com/poem2018__;!!DZ3fjg!56h3iITtQUpEjbMUgiAgjO7FxRiFzaetLDjTil6mz8pZSMBxsbUlVeSLOn_Qi97dbXKr4atMlaZVikeyhkAheHaMIUbo$ Hello! Sure, it would be a pleasure to meet you and have further discussion. Please feel free email me at guo00413 [at] umn [dot] edu

  1. I am not sure I understand your question. I think you can use pre-trained Unpool Class (UnpoolLayerEZ ) to unpool a simple graph if you prepare it as a batch containing a single graph (use UnpoolLayerEZ .forward, it will return both unpooled graph (x, batch, edge_index, edge_batch) and log-probability, where the batch will be trivial), as long as the node dimension/edge dimension is as required for the unpool class.
  2. The result doesn't look that good to me. The first thing that comes to my mind is that maybe the training epoch is small (100 * 500 means the VAE trained on 50k graphs). Secondly, capturing a mixed distribution of graphs (i.e., circles and fully connected ones) might be different and might need to tune the hyperparameter; one test I will suggest is to check: examine the encoded vectors for circles and for fully connected ones and make sure the model can differentiate those in encoded space.

— Reply to this email directly, view it on GitHubhttps://urldefense.com/v3/__https://github.com/guo00413/graph_unpooling/issues/1*issuecomment-1544564565__;Iw!!DZ3fjg!56h3iITtQUpEjbMUgiAgjO7FxRiFzaetLDjTil6mz8pZSMBxsbUlVeSLOn_Qi97dbXKr4atMlaZVikeyhkAheFR8wo8N$, or unsubscribehttps://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AJTWITWGNFB4MYB6YGHZ65TXFU5BBANCNFSM6AAAAAAX4HKNGU__;!!DZ3fjg!56h3iITtQUpEjbMUgiAgjO7FxRiFzaetLDjTil6mz8pZSMBxsbUlVeSLOn_Qi97dbXKr4atMlaZVikeyhkAheNibP7Qf$. You are receiving this because you were mentioned.Message ID: @.***>

guo00413 commented 1 year ago

Hi,

  1. Yes, you can save the unpooling layer itself and use it later. And yes, what I saved in there are VAE/GAN models as a whole.
  2. I don't see the image. I suggest making a much larger epoch, e.g., when I trained the model with protein data, the size of the training set is 38,000 and we trained for 100 epochs, which means the model was trained by 38,000*100 graphs.
poem2018 commented 1 year ago

Hi,

Thanks a lot for your precious help! I find the problem lies in that I didn't use the probs in the output in the loss function. I find the input and output are all int(0,1) type, and I wonder if can I change it to float type by using the probability in output as the edge weight(so that the probs will be included in loss function)?

If this direction is possible, I have a few questions about the code in the unpool_layer_simple_v2.py file. There exist intra-edge and inter-edge when building the unpooling graph. For the inter-edge, the 'probs' variable(in line 461) will be used to generate the edges. The 'probs' variable contains three probabilities [probs2, probs1, probsboth], and probs1 means that child1 will be connected, probs2 means child2 will be connected.

If we take the graph in the attachment as an example,

Assume the graph originally contains node [0,1,2,3,4,5] and edge (0,1),(1,2),(1,3),(3,4),(4,5). node 0, 1 will be unpooled, and the copied node is [6,7]. In this way in my current understanding is,

      for node 1 in edge (1,3):    probs2>probs1, probsboth ---> node 3 and node 7 will connect,       (unpool node - fix node)     probs1>probs2, probsboth ---> node 3 and node 1 will connect,                               probsboth>probs1, probs2 ---> node 3 and node 1, 7 will connect,       for node 0 in edge (0,1):    probs2>probs1, probsboth ---> node 1 and node 6 will connect,       (unpool node - unpool node) probs1>probs2, probsboth ---> node 1 and node 0 will connect,                               probsboth>probs1, probs2 ---> node 1 and node 0,6 will connect,       for node 3 in edge (1,3), since it's a fixed point, it will preserve the original edge (node 1,3)       (fix node - fix node)

I wonder if the above understanding is correct. If so can I use the node probs in each edge as the output edge weight(eg. weight of edge(0,7) = prob2(node 1 in edge (0,1)) )?

Best, Ruike


From: guo00413 @.> Sent: Wednesday, May 24, 2023 15:10 To: guo00413/graph_unpooling @.> Cc: Zhu, Ruike @.>; Mention @.> Subject: Re: [guo00413/graph_unpooling] meet error when running the notebook (Issue #1)

Hi,

  1. Yes, you can save the unpooling layer itself and use it later. And yes, what I saved in there are VAE/GAN models as a whole.
  2. I don't see the image. I suggest making a much larger epoch, e.g., when I trained the model with protein data, the size of the training set is 38,000 and we trained for 100 epochs, which means the model was trained by 38,000*100 graphs.

— Reply to this email directly, view it on GitHubhttps://urldefense.com/v3/__https://github.com/guo00413/graph_unpooling/issues/1*issuecomment-1561864169__;Iw!!DZ3fjg!7TUEJlP4dMhqh0qxHAlfSf5Qav12uXVIyIO97is3NP7JZ6-DpMWCGLiKMISVk-O3ZGtV_xU6W2TR_kwYQqc9D8iC_sZm$, or unsubscribehttps://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AJTWITQCPTQAJTO2SHZ6P7LXHZTKJANCNFSM6AAAAAAX4HKNGU__;!!DZ3fjg!7TUEJlP4dMhqh0qxHAlfSf5Qav12uXVIyIO97is3NP7JZ6-DpMWCGLiKMISVk-O3ZGtV_xU6W2TR_kwYQqc9D73d9HPy$. You are receiving this because you were mentioned.Message ID: @.***>

poem2018 commented 1 year ago

sorry, a typo here, the new_pos2 will be [0,1,1,-1 ,1,-1,-1,-1,-1,-1].


From: Zhu, Ruike @.> Sent: Sunday, June 18, 2023 3:03 To: guo00413/graph_unpooling @.> Subject: Re: [guo00413/graph_unpooling] meet error when running the notebook (Issue #1)

Hi,

Thanks a lot for your precious help! I find the problem lies in that I didn't use the probs in the output in the loss function. I find the input and output are all int(0,1) type, and I wonder if can I change it to float type by using the probability in output as the edge weight(so that the probs will be included in loss function)?

If this direction is possible, I have a few questions about the code in the unpool_layer_simple_v2.py file. There exist intra-edge and inter-edge when building the unpooling graph. For the inter-edge, the 'probs' variable(in line 461) will be used to generate the edges. The 'probs' variable contains three probabilities [probs2, probs1, probsboth], and probs1 means that child1 will be connected, probs2 means child2 will be connected.

If we take the graph in the attachment as an example,

Assume the graph originally contains node [0,1,2,3,4,5] and edge (0,1),(1,2),(1,3),(3,4),(4,5). node 0, 1 will be unpooled, and the copied node is [6,7]. In this way in my current understanding is,

      for node 1 in edge (1,3):    probs2>probs1, probsboth ---> node 3 and node 7 will connect,       (unpool node - fix node)     probs1>probs2, probsboth ---> node 3 and node 1 will connect,                               probsboth>probs1, probs2 ---> node 3 and node 1, 7 will connect,       for node 0 in edge (0,1):    probs2>probs1, probsboth ---> node 1 and node 6 will connect,       (unpool node - unpool node) probs1>probs2, probsboth ---> node 1 and node 0 will connect,                               probsboth>probs1, probs2 ---> node 1 and node 0,6 will connect,       for node 3 in edge (1,3), since it's a fixed point, it will preserve the original edge (node 1,3)       (fix node - fix node)

I wonder if the above understanding is correct. If so can I use the node probs in each edge as the output edge weight(eg. weight of edge(0,7) = prob2(node 1 in edge (0,1)) )?

Best, Ruike


From: guo00413 @.> Sent: Wednesday, May 24, 2023 15:10 To: guo00413/graph_unpooling @.> Cc: Zhu, Ruike @.>; Mention @.> Subject: Re: [guo00413/graph_unpooling] meet error when running the notebook (Issue #1)

Hi,

  1. Yes, you can save the unpooling layer itself and use it later. And yes, what I saved in there are VAE/GAN models as a whole.
  2. I don't see the image. I suggest making a much larger epoch, e.g., when I trained the model with protein data, the size of the training set is 38,000 and we trained for 100 epochs, which means the model was trained by 38,000*100 graphs.

— Reply to this email directly, view it on GitHubhttps://urldefense.com/v3/__https://github.com/guo00413/graph_unpooling/issues/1*issuecomment-1561864169__;Iw!!DZ3fjg!7TUEJlP4dMhqh0qxHAlfSf5Qav12uXVIyIO97is3NP7JZ6-DpMWCGLiKMISVk-O3ZGtV_xU6W2TR_kwYQqc9D8iC_sZm$, or unsubscribehttps://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AJTWITQCPTQAJTO2SHZ6P7LXHZTKJANCNFSM6AAAAAAX4HKNGU__;!!DZ3fjg!7TUEJlP4dMhqh0qxHAlfSf5Qav12uXVIyIO97is3NP7JZ6-DpMWCGLiKMISVk-O3ZGtV_xU6W2TR_kwYQqc9D73d9HPy$. You are receiving this because you were mentioned.Message ID: @.***>

guo00413 commented 1 year ago

Glad to hear that it works out!

can I change it to float type by using the probability in output as the edge weight(so that the probs will be included in loss function)?

Yes, I think you raised a great idea as a variant of this proposed unpooling method, i.e., to add one dimension of edge weight as the probability. And instead of using RL and handling the 0/1 edges in the intermediate graph, we assign the probability in every edges. One disadvantage I can think of is that, in this way, we will always have a complete graph (because you will have prob. for every edge).

There exist intra-edge and inter-edge when building the unpooling graph. For the inter-edge, the 'probs' variable(in line 461) will be used to generate the edges. The 'probs' variable contains three probabilities [probs2, probs1, probsboth], and probs1 means that child1 will be connected, probs2 means child2 will be connected.

Yes you are right.

  • new_pos1 will be [0,1,1,2,1,3,3,4,3,5]

I think it is a typo, but it should be [0, 1, 1, 2, 1, 3, 3, 4, 4, 5] based on your setting.

sorry, a typo here, the new_pos2 will be [0,1,1,-1 ,1,-1,-1,-1,-1,-1].

new_pos2 should be the new node pooled from the given node, so in this case, it should be [6, 7, 7, -1, 7, -1, -1,-1,-1,-1]

the shape of probs will be (9,3)

probs (the onse used in line 470) will be concatenated by probs1, probs2, and probsboth, and each of these should be of shape based on sample_ones (see line 423 and 426-428). So this should be shape (4, 3)

for node 1 in edge (1,3):    probs2>probs1, probsboth ---> node 3 and node 7 will connect, (unpool node - fix node)     probs1>probs2, probsboth ---> node 3 and node 1 will connect,                              probsboth>probs1, probs2 ---> node 3 and node 1, 7 will connect,

Just to be very accurate, even if probs2 > probs1, probsboth, it doesn't mean it is always node 7 to be connected with node 3, because it actually depends on the probability and we will roll a random variable to determine this. For example, if probs2=0.5, probs1=0.4, probsboth=0.1, it means we have 50% chance to connect node 7, but still 40% chance to connect node 1 and 10% chance to connect both 1 and 7.

      for node 0 in edge (0,1):    probs2>probs1, probsboth ---> node 1 and node 6 will connect,       (unpool node - unpool node) probs1>probs2, probsboth ---> node 1 and node 0 will connect,                               probsboth>probs1, probs2 ---> node 1 and node 0,6 will connect,

Just to be accurate, in unpool node - unpool node case, how we actually connect nodes will depend on probs1/2/both in these two nodes. In this case, we have 3 scenarios you listed for node 0 and 6, also you have 3 scenarios for node 1 and 7, then we will connect the nodes from these two sides. You can refer to our appendix for more detail.

e.g., for node 0, probs2 is large and selected, for node 1, probsboth is large and selected, then we will actually build edges (6, 1), (6, 7)

e.g., for node 0, probs1 is large and selected, for node 1, probs2 is large and selected, then we will actually build edges (0, 7)

      for node 3 in edge (1,3), since it's a fixed point, it will preserve the original edge (node 1,3)       (fix node - fix node)

Yes!

If so can I use the node probs in each edge as the output edge weight(eg. weight of edge(0,7) = prob2(node 1 in edge (0,1)) )?

Yes, but as I mentioned above, in this way, all the intermediate graphs will be complete graphs and a large number of edges will be present. Another potential issue or disadvantage is that, at the final step when you use these probabilities to generate the edges (e.g., you sample edges based on their probabilities), there is no guarantee of a connected graph.

poem2018 commented 1 year ago

Really thanks for the detailed explanation! This clarified some of my puzzles in the unpooling nodes connection. I will try to use probability as edge weight to see how it works. And also, thanks for the potential disadvantages you mentioned, and I'll consider these factors while making code modifications. Thank you again for your guidance!

Best, Ruike


From: guo00413 @.> Sent: Monday, June 19, 2023 9:06 To: guo00413/graph_unpooling @.> Cc: Zhu, Ruike @.>; Mention @.> Subject: Re: [guo00413/graph_unpooling] meet error when running the notebook (Issue #1)

Glad to hear that it works out!

can I change it to float type by using the probability in output as the edge weight(so that the probs will be included in loss function)?

Yes, I think you raised a great idea as a variant of this proposed unpooling method, i.e., to add one dimension of edge weight as the probability. And instead of using RL and handling the 0/1 edges in the intermediate graph, we assign the probability in every edges. One disadvantage I can think of is that, in this way, we will always have a complete graph (because you will have prob. for every edge).

There exist intra-edge and inter-edge when building the unpooling graph. For the inter-edge, the 'probs' variable(in line 461) will be used to generate the edges. The 'probs' variable contains three probabilities [probs2, probs1, probsboth], and probs1 means that child1 will be connected, probs2 means child2 will be connected.

Yes you are right.

I think it is a typo, but it should be [0, 1, 1, 2, 1, 3, 3, 4, 4, 5] based on your setting.

sorry, a typo here, the new_pos2 will be [0,1,1,-1 ,1,-1,-1,-1,-1,-1].

new_pos2 should be the new node pooled from the given node, so in this case, it should be [6, 7, 7, -1, 7, -1, -1,-1,-1,-1]

the shape of probs will be (9,3)

probs (the onse used in line 470) will be concatenated by probs1, probs2, and probsboth, and each of these should be of shape based on sample_ones (see line 423 and 426-428). So this should be shape (4, 3)

for node 1 in edge (1,3):    probs2>probs1, probsboth ---> node 3 and node 7 will connect, (unpool node - fix node)     probs1>probs2, probsboth ---> node 3 and node 1 will connect,                              probsboth>probs1, probs2 ---> node 3 and node 1, 7 will connect,

Just to be very accurate, even if probs2 > probs1, probsboth, it doesn't mean it is always node 7 to be connected with node 3, because it actually depends on the probability and we will roll a random variable to determine this. For example, if probs2=0.5, probs1=0.4, probsboth=0.1, it means we have 50% chance to connect node 7, but still 40% chance to connect node 1 and 10% chance to connect both 1 and 7.

      for node 0 in edge (0,1):    probs2>probs1, probsboth ---> node 1 and node 6 will connect,       (unpool node - unpool node) probs1>probs2, probsboth ---> node 1 and node 0 will connect,                               probsboth>probs1, probs2 ---> node 1 and node 0,6 will connect,

Just to be accurate, in unpool node - unpool node case, how we actually connect nodes will depend on probs1/2/both in these two nodes. In this case, we have 3 scenarios you listed for node 0 and 6, also you have 3 scenarios for node 1 and 7, then we will connect the nodes from these two sides. You can refer to our appendix for more detail.

e.g., for node 0, probs2 is large and selected, for node 1, probsboth is large and selected, then we will actually build edges (6, 1), (6, 7)

e.g., for node 0, probs1 is large and selected, for node 1, probs2 is large and selected, then we will actually build edges (0, 7)

      for node 3 in edge (1,3), since it's a fixed point, it will preserve the original edge (node 1,3)       (fix node - fix node)

Yes!

If so can I use the node probs in each edge as the output edge weight(eg. weight of edge(0,7) = prob2(node 1 in edge (0,1)) )?

Yes, but as I mentioned above, in this way, all the intermediate graphs will be complete graphs and a large number of edges will be present. Another potential issue or disadvantage is that, at the final step when you use these probabilities to generate the edges (e.g., you sample edges based on their probabilities), there is no guarantee of a connected graph.

— Reply to this email directly, view it on GitHubhttps://urldefense.com/v3/__https://github.com/guo00413/graph_unpooling/issues/1*issuecomment-1597256293__;Iw!!DZ3fjg!8RoWRtUQ6dmLP9nW_BE2RSjASqnpszcx8o58Aq2ds2x_wW5GSMPgWGw6YTzvOAcvfzzg-uUZ7ykW_cgUGHYFV3yv5ryX$, or unsubscribehttps://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AJTWITWSOZKVKTO2C6UCJX3XMBMHZANCNFSM6AAAAAAX4HKNGU__;!!DZ3fjg!8RoWRtUQ6dmLP9nW_BE2RSjASqnpszcx8o58Aq2ds2x_wW5GSMPgWGw6YTzvOAcvfzzg-uUZ7ykW_cgUGHYFVyZNA7aE$. You are receiving this because you were mentioned.Message ID: @.***>

guo00413 commented 1 year ago

Please feel free to ask or discuss. I hope your idea will work and shine! @poem2018