ZhangGongjie opened 3 years ago

❓ Questions & Help

What is the meaning of the returned `perm` in ASAPooling? I think it would be much better if such information were incorporated into the docs or as annotations in the source code.
`perm` denotes the permutation of top-k entries to keep, based on the learned scoring function. This information was indeed missing from the documentation, I'm really sorry. I have updated the docs, see here.
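A minimal sketch of how `perm` relates to the original nodes (the feature matrix and random connectivity below are made up for illustration):

```python
import torch
from torch_geometric.nn import ASAPooling

x = torch.randn(600, 256)                      # node feature matrix
edge_index = torch.randint(0, 600, (2, 2000))  # random connectivity, illustration only

pool = ASAPooling(in_channels=256, ratio=0.5)
x_out, edge_index_out, edge_weight, batch, perm = pool(x, edge_index)

# `perm` holds the indices (into the original 600 nodes) of the top-k
# cluster centers that were kept, so it has ceil(0.5 * 600) = 300 entries.
print(perm.shape)  # torch.Size([300])
```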
Thank you so much. Awesome framework!
May I also ask whether ASAPooling is extremely memory-inefficient? I apply ASAPooling with a ratio of 0.5 to a node feature matrix of shape (600, 256), and it gives me an OOM error.
This shouldn't be the case. Can you show me a minimal example to reproduce?
That is so kind of you. Actually, I am a newbie in graph neural networks. I made a minimal example to reproduce, as follows.
Please download ref_boxes.pt first. (Link: https://drive.google.com/file/d/18FO-kfKcj1PeJ6QWTyN-bl6YodvJ4kz1/view?usp=sharing)
The code for reproduction is here:
https://drive.google.com/file/d/1wJugp9yP4XqetVUcVxOf5oS4k7K9EeiY/view?usp=sharing
Running this code should reproduce the large memory consumption. Note that I only run the forward pass; no backward propagation is computed. This pooling operation alone takes ~1 GB of memory on my GPU.
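A minimal sketch of such a forward-only measurement, assuming the hypothetical names `pool`, `x`, and `edge_index` from the script; note that without a no-grad guard, autograd keeps intermediate activations alive even if backward is never called:

```python
import torch

# Without this guard, PyTorch retains intermediate activations for a
# potential backward pass, inflating peak memory even in pure inference.
with torch.no_grad():
    out = pool(x, edge_index)

print(torch.cuda.max_memory_allocated() / 1024**2, "MB")  # peak GPU memory
```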
Thank you so much.
I cannot access your code snippet.
I am very sorry. Would you please check the following link?
Thanks for the scripts. You are right that ASAPooling will consume a lot of memory, especially because your graph is really dense. In particular, there are about 100K edges in two graphs of 300 nodes each. ASAPooling will compute attention scores based on an edge representation of shape [100K, 512], which may well explain the high memory consumption.
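A rough back-of-the-envelope check (float32 assumed) shows why:

```python
# Memory for the [100K, 512] edge representation alone, in float32.
num_edges, hidden, bytes_per_float32 = 100_000, 512, 4
per_tensor_mb = num_edges * hidden * bytes_per_float32 / 1024**2
print(f"{per_tensor_mb:.0f} MB")  # ~195 MB per tensor

# The attention computation allocates several intermediates of similar
# size, so a handful of them easily adds up to the ~1 GB observed.
```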
PyG expects to operate on sparse graphs and is not really designed for dense ones (sparse operations in general are not). If there is no way to make your graphs sparser, you probably want to rewrite the ASAPooling module to take that into account.
Thanks for the explanation. That's really helpful!
I could try to make the graph sparser, down to ~20 edges per node for a graph of 300 nodes. Do you think this is sparse enough for GNNs to work well? Considering that Transformers are intrinsically densely connected GNNs, do you think GNNs can have any advantage over Transformers?
Thank you!
The sparsity of your graph will definitely help in obtaining a lower memory footprint. GNNs should be advantageous over Transformers in cases where edges have a "real" meaning, e.g., when there is a clear semantic difference between connected and disconnected nodes.
That's really insightful! May I ask whether you think my scenario meets this criterion, i.e., "edges have a 'real' meaning, e.g., there is a clear semantic difference between connected and disconnected nodes"?
In my setting, there are multiple boxes referring to different locations in an image, and if their IoU exceeds a threshold (which could be 0, but that would make the graph too dense), there is an edge between the two boxes.
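One way to sparsify such an IoU graph is to keep only the strongest k overlaps per box. A minimal sketch using torchvision's `box_iou`; the function name and the `threshold` / `k` parameters are hypothetical, and it assumes at least k boxes:

```python
import torch
from torchvision.ops import box_iou

def build_iou_graph(boxes: torch.Tensor, threshold: float = 0.3, k: int = 20):
    """Connect boxes whose IoU exceeds `threshold`, keeping at most the
    `k` strongest neighbors per box. `boxes` is an [N, 4] tensor in
    (x1, y1, x2, y2) format, with N >= k."""
    iou = box_iou(boxes, boxes)                # [N, N] pairwise IoU
    iou.fill_diagonal_(0)                      # no self-loops
    topk_val, topk_idx = iou.topk(k, dim=1)    # strongest k candidates per box
    src = torch.arange(boxes.size(0)).repeat_interleave(k)
    dst = topk_idx.reshape(-1)
    mask = topk_val.reshape(-1) > threshold    # drop weak overlaps
    return torch.stack([src[mask], dst[mask]], dim=0)  # edge_index, shape [2, E]
```

Capping the neighborhood at k bounds the edge count at N·k (here 300 · 20 = 6K instead of ~100K), which directly shrinks the [E, 512] edge representation inside ASAPooling.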
From now on, we recommend using our discussion forum (https://github.com/rusty1s/pytorch_geometric/discussions) for general questions.