ZhangGongjie opened 3 years ago

❓ Questions & Help

What is the meaning of the returned `perm` in ASAPooling? I think it would be much better if such information were incorporated into the docs or as annotations in the source code.
`perm` denotes the permutation of top-k entries to keep, based on the learned scoring function. This information was indeed missing from the documentation, I'm really sorry. I have updated the docs, see here.
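A minimal sketch of how `perm` relates to the original nodes (the feature matrix and random connectivity below are made up for illustration):

```python
import torch
from torch_geometric.nn import ASAPooling

x = torch.randn(600, 256)                      # node feature matrix
edge_index = torch.randint(0, 600, (2, 2000))  # random connectivity, illustration only

pool = ASAPooling(in_channels=256, ratio=0.5)
x_out, edge_index_out, edge_weight, batch, perm = pool(x, edge_index)

# `perm` holds the indices (into the original 600 nodes) of the top-k
# cluster centers that were kept, so it has ceil(0.5 * 600) = 300 entries.
print(perm.shape)  # torch.Size([300])
```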
Thank you so much. Awesome framework!
May I also ask whether ASAPooling is extremely memory-inefficient? I apply ASAPooling with a ratio of 0.5 to a node feature matrix of shape (600, 256), and it gives me an OOM error.
This shouldn't be the case. Can you show me a minimal example to reproduce?
That is so kind of you. Actually, I am a newbie in graph neural networks. I made a minimal example to reproduce, as follows.
Please download ref_boxes.pt first. (Link: https://drive.google.com/file/d/18FO-kfKcj1PeJ6QWTyN-bl6YodvJ4kz1/view?usp=sharing)
The code for reproduction is here:
https://drive.google.com/file/d/1wJugp9yP4XqetVUcVxOf5oS4k7K9EeiY/view?usp=sharing
Running this code should reproduce the large memory consumption. Note that I only run the forward pass; no backward propagation is computed. This pooling operation alone takes ~1 GB of memory on my GPU.
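A minimal sketch of such a forward-only measurement, assuming the hypothetical names `pool`, `x`, and `edge_index` from the script; note that without a no-grad guard, autograd keeps intermediate activations alive even if backward is never called:

```python
import torch

# Without this guard, PyTorch retains intermediate activations for a
# potential backward pass, inflating peak memory even in pure inference.
with torch.no_grad():
    out = pool(x, edge_index)

print(torch.cuda.max_memory_allocated() / 1024**2, "MB")  # peak GPU memory
```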
Thank you so much.
I cannot access your code snippet.
I am very sorry. Would you please check the following link?
Thanks for the scripts. You are right that ASAPooling will consume a lot of memory, especially because your graph is really dense. In particular, there are about 100K edges in two graphs of 300 nodes each. ASAPooling will compute attention scores based on an edge representation of shape [100K, 512], which may well explain the high memory consumption.
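A rough back-of-the-envelope check (float32 assumed) shows why:

```python
# Memory for the [100K, 512] edge representation alone, in float32.
num_edges, hidden, bytes_per_float32 = 100_000, 512, 4
per_tensor_mb = num_edges * hidden * bytes_per_float32 / 1024**2
print(f"{per_tensor_mb:.0f} MB")  # ~195 MB per tensor

# The attention computation allocates several intermediates of similar
# size, so a handful of them easily adds up to the ~1 GB observed.
```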
PyG expects to operate on sparse graphs and is not really designed for dense ones (sparse operations in general are not). If there is no way to make your graphs sparser, you probably want to rewrite the ASAPooling module to take that into account.
Thanks for the explanation. That's really helpful!
I could try to make the graph sparser, down to ~20 edges per node for a graph of 300 nodes. Do you think this is sparse enough for GNNs to work well? Considering that Transformers are intrinsically densely connected GNNs, do you think GNNs can have any advantage over Transformers?
Thank you!
The sparsity of your graph will definitely help in obtaining a lower memory footprint. GNNs should be advantageous over Transformers in cases where edges have a "real" meaning, e.g., when there is a clear semantic difference between connected and disconnected nodes.
That's really insightful! May I ask whether you think my scenario meets this criterion, i.e., "edges have a 'real' meaning, e.g., there is a clear semantic difference between connected and disconnected nodes"?
In my setting, there are multiple boxes referring to different locations in an image, and if their IoU exceeds a threshold (which could be 0, but that would make the graph too dense), there is an edge between the two boxes.
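One way to sparsify such an IoU graph is to keep only the strongest k overlaps per box. A minimal sketch using torchvision's `box_iou`; the function name and the `threshold` / `k` parameters are hypothetical, and it assumes at least k boxes:

```python
import torch
from torchvision.ops import box_iou

def build_iou_graph(boxes: torch.Tensor, threshold: float = 0.3, k: int = 20):
    """Connect boxes whose IoU exceeds `threshold`, keeping at most the
    `k` strongest neighbors per box. `boxes` is an [N, 4] tensor in
    (x1, y1, x2, y2) format, with N >= k."""
    iou = box_iou(boxes, boxes)                # [N, N] pairwise IoU
    iou.fill_diagonal_(0)                      # no self-loops
    topk_val, topk_idx = iou.topk(k, dim=1)    # strongest k candidates per box
    src = torch.arange(boxes.size(0)).repeat_interleave(k)
    dst = topk_idx.reshape(-1)
    mask = topk_val.reshape(-1) > threshold    # drop weak overlaps
    return torch.stack([src[mask], dst[mask]], dim=0)  # edge_index, shape [2, E]
```

Capping the neighborhood at k bounds the edge count at N·k (here 300 · 20 = 6K instead of ~100K), which directly shrinks the [E, 512] edge representation inside ASAPooling.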
From now on, we recommend using our discussion forum (https://github.com/rusty1s/pytorch_geometric/discussions) for general questions.