Open schulter opened 6 years ago
Hi Roman, thanks for reporting this.
It is certainly possible to optimize the memory efficiency of the model (e.g. computing higher-order terms from the Chebyshev polynomial on the fly and discarding them after they are used in a certain operation), yet these things are quite cumbersome to implement. Another option is to avoid storing the whole graph in memory by doing some form of subsampling.
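The on-the-fly idea above can be sketched with the three-term Chebyshev recurrence: only the two most recent polynomial terms are kept in memory. This is a minimal NumPy/SciPy sketch (not code from this repo; `chebyshev_filter_apply` is a hypothetical helper, and the filter coefficients are simplified to scalars):

```python
import numpy as np
import scipy.sparse as sp

def chebyshev_filter_apply(L_scaled, X, thetas):
    """Apply a Chebyshev filter sum_k theta_k * T_k(L_scaled) @ X while
    keeping only the last two polynomial terms in memory.

    L_scaled: rescaled graph Laplacian (2L/lambda_max - I), sparse (N, N)
    X:        node features, dense (N, F)
    thetas:   K+1 filter coefficients (scalars here for simplicity)
    """
    T_prev = X                           # T_0(L) X = X
    out = thetas[0] * T_prev
    if len(thetas) > 1:
        T_curr = L_scaled @ X            # T_1(L) X = L X
        out = out + thetas[1] * T_curr
        for theta in thetas[2:]:
            # Chebyshev recurrence: T_k = 2 L T_{k-1} - T_{k-2}
            T_next = 2 * (L_scaled @ T_curr) - T_prev
            out = out + theta * T_next
            T_prev, T_curr = T_curr, T_next   # older terms are discarded
    return out
```

Instead of materializing all K+1 dense (N, F) terms at once, this keeps two of them, trading a little bookkeeping for memory.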
Currently, there is no published paper about such a subsampling scheme for polynomial filters (such as in Defferrard's work), but the GCN model can easily be optimized by doing either random subsampling (https://arxiv.org/abs/1706.02216) or some form of importance sampling (https://openreview.net/forum?id=rytstxWAW, https://openreview.net/forum?id=rylejExC-).
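The random-subsampling option (as in the first link above) amounts to drawing a fixed number of neighbors per node, so memory per batch is bounded regardless of node degree. A minimal sketch, assuming an adjacency-list representation (`sample_neighbors` is a hypothetical helper, not part of any of the linked codebases):

```python
import numpy as np

def sample_neighbors(adj_list, nodes, num_samples, rng=None):
    """Uniformly subsample a fixed number of neighbors per node.

    adj_list:    dict mapping node id -> list of neighbor ids
    nodes:       nodes in the current mini-batch
    num_samples: neighbors to draw per node (with replacement when a
                 node has fewer than num_samples neighbors)
    """
    if rng is None:
        rng = np.random.default_rng()
    sampled = {}
    for v in nodes:
        nbrs = adj_list[v]
        replace = len(nbrs) < num_samples
        sampled[v] = list(rng.choice(nbrs, size=num_samples, replace=replace))
    return sampled
```

Because every node ends up with exactly `num_samples` neighbors, the aggregation step has a fixed cost per node instead of depending on the full adjacency structure.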
Edit: Maybe the PyTorch version of the model is a bit more memory-efficient? https://github.com/tkipf/pygcn
Hi, thanks for letting me know. I'll try one of those approaches. Since mini-batching in node classification is not really an option, I'll have to find other ways here.
Thanks for the links and the great package,
Roman
Hi @tkipf,
Regarding the Chebyshev polynomial order K: what is the intuition behind how we choose the order K?
In the paper entitled "Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering", it is shown that Chebyshev-based filters are localized: "Strictly localized filters. Enhancing [4], the proposed spectral filters are provable to be strictly localized in a ball of radius K, i.e. K hops from the central vertex."
In this paper, the graph ConvNet is tested on the MNIST dataset, which is a particular form of graph (a grid). The number of nodes is fixed, the number of edges per node is known a priori (8), and the Chebyshev polynomial order is set to K=25.
For an arbitrary graph structure (variable number of nodes, variable number of edges per node), how can we choose the value of K?
According to the definition given ("strictly localized in a ball of radius K, i.e. K hops from the central vertex"), K may be intrinsically related to the neighborhood of the central vertex of a given graph.
If I understand correctly: the smaller K is, the more localized the filters are. Am I wrong?
Let me explain the idea: if I have a graph of 20 nodes and the maximum number of edges per node is 3, then setting K=25 results in filters that are not localized. It's like a convolution filter covering the whole grid. MNIST example: rather than a 3*3 filter on a 28*28 image, non-localized filters have the dimensions of the whole image, 28*28.
Thank you for the clarification.
If I understand correctly: the smaller K is, the more localized the filters are. Am I wrong?
Yes, exactly. Filters that are polynomials in the adjacency matrix or the graph Laplacian (like Chebyshev filters) are K-localized, where K is the order of the polynomial. That means information is gathered from up to K hops away. Similarly, you can increase the receptive field size of a model by stacking more layers: with a filter order of K=1 you can stack L layers and get a receptive field of size L. Most recent works design filters of order K=1 and achieve a larger receptive field by simply stacking layers (if you stack deep, make sure to use gating or residual connections).
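The equivalence between polynomial order and stacked layers can be checked numerically: each K=1 propagation step widens the set of nodes that can influence a given node by one hop. A small NumPy sketch (`receptive_field` is a hypothetical helper for illustration, not code from this repo):

```python
import numpy as np

def receptive_field(A, num_layers):
    """Boolean (N, N) matrix: entry (i, j) is True if node j is within
    `num_layers` hops of node i, i.e. j can influence i after stacking
    `num_layers` K=1 propagation layers (each layer mixes self + 1 hop).
    """
    N = A.shape[0]
    P = np.eye(N, dtype=int) + (A != 0).astype(int)  # self-loop + 1-hop
    R = np.linalg.matrix_power(P, num_layers)        # num_layers stacked hops
    return R > 0
```

On a path graph 0-1-2-3, one layer lets node 0 see only itself and node 1, while three stacked layers let it see the whole path, matching a single order-K=3 polynomial filter.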
Hi, I was wondering how you assign the tensors to the GPU in the first place? I'm not sure how this works with a placeholder dict / feed dict. I have access to a CPU and a GPU.
See here: https://www.tensorflow.org/guide/using_gpu
How is the parameter K reflected in the code? Does it correspond to the 'support'?
Hi,
I tried to train your GCN network using the Chebyshev polynomials. However, on my graph and features (~10,000 nodes, ~90,000 edges, 24 feature dimensions), my graphics card quickly runs out of memory when using either higher-order support polynomials (>3) or more filters (hidden1 > 20).
I am using an NVIDIA GeForce TITAN X card with 12 GB of memory.
Do you think the reason lies in the implementation and can be tweaked, or is it a natural limitation?
The exact error I get is:
Thanks,
Roman