malllabiisc / CompGCN

ICLR 2020: Composition-Based Multi-Relational Graph Convolutional Networks
Apache License 2.0

Efficiency of batch training #21

junkangwu closed this issue 3 years ago

junkangwu commented 3 years ago

Hi, CompGCN is really excellent work on knowledge graph completion. During training, the entire feature matrix is updated simultaneously, and the batch indices are only selected out afterwards to compute the KGC loss. I'm curious whether we could instead select the batch indices first and update only the features of the selected entities. That way there would be a significant decrease in training time. Have you ever considered a training method like this? What do you think of the idea? Thanks a lot in advance!
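For concreteness, here is a minimal sketch of the full-graph training pattern described above (toy data and hypothetical names throughout, not the repo's actual code): one message-passing step updates the embeddings of all entities, and only then are the batch's rows indexed out for the loss.

```python
import torch
import torch.nn as nn

# Toy sizes and a random graph, for illustration only.
num_ent, num_rel, dim = 1000, 20, 100
ent_emb = nn.Parameter(torch.randn(num_ent, dim))
rel_emb = nn.Parameter(torch.randn(num_rel, dim))
W = nn.Linear(dim, dim)

edge_index = torch.randint(0, num_ent, (2, 5000))  # (subject, object) pairs
edge_type = torch.randint(0, num_rel, (5000,))     # relation id per edge

def full_graph_forward():
    # Compose neighbor entity and relation embeddings (here: subtraction,
    # one of CompGCN's composition choices) and aggregate over ALL edges,
    # so every entity's feature is updated in one pass.
    msg = W(ent_emb[edge_index[0]] - rel_emb[edge_type])
    return torch.zeros(num_ent, dim).index_add_(0, edge_index[1], msg)

# One training step: full-graph forward, then index out the batch.
batch_subj = torch.randint(0, num_ent, (128,))
batch_rel = torch.randint(0, num_rel, (128,))
labels = torch.randint(0, 2, (128, num_ent)).float()

x = full_graph_forward()                                # all entities updated
scores = (x[batch_subj] + rel_emb[batch_rel]) @ x.t()   # toy decoder
loss = nn.functional.binary_cross_entropy_with_logits(scores, labels)
loss.backward()
```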

svjan5 commented 3 years ago

Hi @Wjk666, Yes, that's an interesting idea, and we tried it in the past. The issue is that the representation from CompGCN depends not only on the nodes in the batch but also on their neighbors. Moreover, if we apply two layers of CompGCN, then the 2-hop neighborhood is required, which covers almost the entire graph. Batch creation also becomes a very computationally expensive operation, since an appropriate subgraph has to be extracted for each batch (see the sketch below). As a result, our implementation was very slow, and we finally converged on the current one. In this project, we made 3-4 such attempts to reduce the training time; the current approach works best for us, although we understand that it might not scale to larger graphs without increasing GPU memory.
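To illustrate the cost, here is a minimal sketch of the per-batch subgraph extraction such an approach would need (it assumes torch_geometric's `k_hop_subgraph` utility, which this repo does not use; toy data throughout):

```python
import torch
from torch_geometric.utils import k_hop_subgraph

# Toy graph, for illustration only.
num_ent = 1000
edge_index = torch.randint(0, num_ent, (2, 5000))
edge_type = torch.randint(0, 20, (5000,))
batch_nodes = torch.randint(0, num_ent, (128,))  # entities in one batch

# For a 2-layer CompGCN, each batch needs the full 2-hop neighborhood of
# its entities. This extraction would run on EVERY batch, which is what
# makes batch creation expensive; on dense KGs the extracted subset is
# close to the whole graph anyway.
subset, sub_edge_index, mapping, edge_mask = k_hop_subgraph(
    batch_nodes, num_hops=2, edge_index=edge_index,
    relabel_nodes=True, num_nodes=num_ent)

sub_edge_type = edge_type[edge_mask]  # relation ids for the kept edges
print(f"batch of {batch_nodes.numel()} entities pulls in "
      f"{subset.numel()} nodes and {sub_edge_index.size(1)} edges")
```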

Thanks again for sharing your thoughts! Let me know what you think.

junkangwu commented 3 years ago

@svjan5 Thanks a lot for your patient explanation. I also wonder what you think of training CompGCN with a diffusion matrix, which could capture multi-hop information. I recently read the ICLR 2021 submission Multi-hop Attention Graph Neural Network (https://openreview.net/forum?id=muppfCkU9H1), but it implements this in an iterative way, which requires repeating the encoder (such as GCN or CompGCN) many times. Could you suggest an efficient way to combine the two?

svjan5 commented 3 years ago

Ok, I was not aware of that work. I will go through it and share my thoughts.

soumyasanyal commented 3 years ago

A few thoughts: indeed, adding any diffusion-style approach would lead to a larger receptive field (with possible performance gains). If you can precompute the diffusion weight matrix efficiently (similar to this: https://arxiv.org/abs/1911.05485), then it should be possible to use CompGCN with a diffusion matrix. A rough sketch is below.
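A rough sketch of that precomputation, in the spirit of graph diffusion convolution (https://arxiv.org/abs/1911.05485): compute a personalized-PageRank diffusion matrix once, sparsify it, and use the result as a fixed weighted edge set. This is a dense toy version for illustration; a real KG would need the sparse/approximate variants from the paper.

```python
import torch

# Toy graph, for illustration only.
num_ent = 1000
edge_index = torch.randint(0, num_ent, (2, 5000))

# Dense adjacency and row-normalized transition matrix T.
A = torch.zeros(num_ent, num_ent)
A[edge_index[0], edge_index[1]] = 1.0
T = A / A.sum(dim=1, keepdim=True).clamp(min=1.0)

# Personalized-PageRank diffusion, computed ONCE before training:
#   S = alpha * (I - (1 - alpha) * T)^{-1}
alpha = 0.15
S = alpha * torch.linalg.inv(torch.eye(num_ent) - (1 - alpha) * T)

# Sparsify by keeping the top-k entries per node, giving a fixed multi-hop
# receptive field with precomputed edge weights.
k = 32
vals, idx = S.topk(k, dim=1)
diff_edge_index = torch.stack(
    [torch.arange(num_ent).repeat_interleave(k), idx.reshape(-1)])
diff_edge_weight = vals.reshape(-1)

# Note: these diffused edges carry no relation types, so how to compose
# them with CompGCN's relation embeddings remains an open design choice.
```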