mdeff / cnn_graph

Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
https://arxiv.org/abs/1606.09375
MIT License

why the Chebyshev polynomials is needed? #35

Open zeal-up opened 6 years ago

zeal-up commented 6 years ago

When approximating the initial filter kernel gθ(Λ) = diag(θ) with a polynomial, gθ(Λ) = Σ_{k=0..K−1} θ_k Λ^k, why can't we simply combine the eigenvector and eigenvalue matrices together and apply gθ(L) = U gθ(Λ) Uᵀ = Σ_{k=0..K−1} θ_k L^k, since U Λ^k Uᵀ = (U Λ Uᵀ)^k = L^k? The paper instead uses Chebyshev polynomials, gθ(Λ) = Σ_{k=0..K−1} θ_k T_k(Λ̃) with the rescaled Λ̃ = 2Λ/λ_max − I_n, and says they can be computed recursively from L. I'm quite confused here: why are the Chebyshev polynomials needed?
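
A minimal numerical check of the identity behind this question (illustrative names, not the repo's code): for any polynomial gθ, U gθ(Λ) Uᵀ x = gθ(L) x, so the filter can be applied without ever computing the eigendecomposition.

```python
import numpy as np

n, K = 6, 3
rng = np.random.default_rng(0)

# Random symmetric graph: adjacency, degrees, combinatorial Laplacian.
A = rng.integers(0, 2, size=(n, n))
A = np.triu(A, 1); A = A + A.T
L = np.diag(A.sum(1)) - A

lam, U = np.linalg.eigh(L)          # L = U diag(lam) U^T
theta = rng.standard_normal(K + 1)  # polynomial filter coefficients
x = rng.standard_normal(n)

# Spectral form: U g(Lambda) U^T x.
g_lam = sum(t * lam**k for k, t in enumerate(theta))
y_spectral = U @ (g_lam * (U.T @ x))

# Vertex form: g(L) x, no eigendecomposition needed.
y_vertex = sum(t * np.linalg.matrix_power(L, k) @ x for k, t in enumerate(theta))

print(np.allclose(y_spectral, y_vertex))  # True
```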

ZiqianXie commented 5 years ago

+1, I was doing a GCN paper review and read this paper today, and came up with the same question. We can compute Lx, L²x, ..., L^K x sequentially using O(K|E|) operations and linearly combine them, for the same computational complexity.
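
For concreteness, a rough sketch of that scheme (illustrative names, not the repo's code): K sparse matrix-vector products give Lx, L²x, ..., L^K x, and the filter output is their linear combination.

```python
import numpy as np
import scipy.sparse as sp

def monomial_filter(L, x, theta):
    """y = sum_k theta[k] * L^k x, using len(theta)-1 sparse mat-vecs."""
    y = theta[0] * x
    z = x
    for k in range(1, len(theta)):
        z = L @ z                 # one sparse mat-vec: O(|E|)
        y = y + theta[k] * z
    return y

# Tiny example on a random sparse Laplacian.
n = 100
A = sp.random(n, n, density=0.05, random_state=0)
A = A + A.T
L = sp.diags(np.ravel(A.sum(1))) - A
x = np.random.default_rng(0).standard_normal(n)
theta = [0.5, -1.0, 0.25]         # K = 2 here
print(monomial_filter(L, x, theta).shape)  # (100,)
```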

ZiqianXie commented 5 years ago

I think in the cited wavelets-on-graphs paper (Hammond et al.), they used Chebyshev polynomials to approximate a known wavelet function because Chebyshev interpolation has certain desirable properties. In the case where the filters are unknown (learned), the Chebyshev polynomials are not necessary: any linear combination of Chebyshev polynomials up to the Kth degree can be represented by a Kth-degree polynomial.
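
For what it's worth, a quick numerical illustration of that equivalence using numpy's polynomial helpers (just a sketch): converting the Chebyshev coefficients to the monomial basis gives the exact same filter.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

cheb_coeffs = [0.5, -1.0, 0.25, 2.0]    # theta_0..theta_3 in the Chebyshev basis
mono_coeffs = C.cheb2poly(cheb_coeffs)  # the same filter in the monomial basis
print(mono_coeffs)                      # coefficients of a degree-3 polynomial

x = np.linspace(-1, 1, 5)
print(np.allclose(C.chebval(x, cheb_coeffs),
                  np.polynomial.polynomial.polyval(x, mono_coeffs)))  # True
```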

u42dani commented 4 years ago

Has anyone tried node classification on a general graph without Chebyshev polynomials (i.e. using only the Laplacian and its powers)? What was the accuracy compared to the Chebyshev method?

Keramatfar commented 4 years ago

See here: https://www.youtube.com/watch?v=v3jZRkvIOIM, around minute 26.

Jovonni commented 4 years ago

Has anyone tried node classification on a general graph without Chebyshev polynomials (i.e. using only the Laplacian and its powers)? What was the accuracy compared to the Chebyshev method?

I have used the Laplacian and the normalized Laplacian to train a convolutional NN. The accuracy on the training, test, and validation sets was above 85%.

The data I used was proxy-related and the model was for a security-related problem. However, doing that is how I found this work, and Xavier's work as a whole... so I've seen it work, but Xavier Bresson and his team build on those concepts so well.

So, not the best modeling approach, but it is tractable for certain problems.

ZiqianXie commented 4 years ago

See here: https://www.youtube.com/watch?v=v3jZRkvIOIM, around minute 26.

Yes, it is stable under coefficient perturbation but otherwise the complexities are the same.

mdeff commented 4 years ago

Any polynomial (monomials, Chebyshev, Legendre, etc.) of order K has the same representation power and can be used. We used Chebyshev because we had experience with it and thought it would be easier to optimize as they form an orthogonal basis. There's however not much difference in practice.

Chebyshev polynomials are more orthogonal (under the identity measure, and truly orthogonal under the measure 1/sqrt(1-x²)) than monomials, but less than Legendre (truly orthogonal under the identity measure):

[figure: the polynomial bases in the spectral domain (polynomial_bases_spectrum)]

In the vertex domain (for a ring/circle graph), Chebyshev and Legendre polynomials are also more orthogonal than monomials:

[figure: the polynomial bases in the vertex domain (polynomial_bases_vertex)]
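
For anyone who wants to reproduce a rough version of this comparison, here is a small sketch (an assumed setup, not the script that produced the figures above): sample each basis on [-1, 1] and compare how correlated the basis functions are under the identity measure.

```python
import numpy as np
from numpy.polynomial import polynomial as P, chebyshev as C, legendre as Le

x = np.linspace(-1, 1, 1000)
deg = 5
bases = {
    "monomial":  P.polyvander(x, deg),
    "Chebyshev": C.chebvander(x, deg),
    "Legendre":  Le.legvander(x, deg),
}
for name, V in bases.items():
    Vn = V / np.linalg.norm(V, axis=0)       # normalize each basis function
    corr = Vn.T @ Vn                         # cosine similarities under the identity measure
    off = np.abs(corr - np.eye(deg + 1)).max()
    print(f"{name:10s} max |cosine| between distinct basis functions: {off:.3f}")
# Monomials are nearly collinear, Chebyshev much less so, Legendre essentially orthogonal.
```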

youkaichao commented 3 years ago

I have the same question as @zeal-github and posted a question on Stack Exchange. After reading this issue, I think the question is clear now: Chebyshev polynomials may have better properties with respect to parameter orthogonality, but the computational complexity is the same as directly using the Laplacian matrix.

hazdzz commented 3 years ago

Any polynomial (monomials, Chebyshev, Legendre, etc.) of order K has the same representation power and can be used. We used Chebyshev because we had experience with it and thought it would be easier to optimize as they form an orthogonal basis. There's however not much difference in practice.

Chebyshev polynomials are more orthogonal (under the identity measure, and truly orthogonal under the measure 1/sqrt(1-x²)) than monomials, but less than Legendre (truly orthogonal under the identity measure):

[figure: the polynomial bases in the spectral domain (polynomial_bases_spectrum)]

In the vertex domain (for a ring/circle graph), Chebyshev and Legendre polynomials are also more orthogonal than monomials:

[figure: the polynomial bases in the vertex domain (polynomial_bases_vertex)]

Why did you choose the Chebyshev polynomials of the first kind instead of the Legendre polynomials?

mdeff commented 3 years ago

Mostly for historical reasons. At first we were designing filters (e.g., to solve the diffusion of heat, the propagation of waves, and many others). As Chebyshev polynomials are excellent function approximators, we chose them to approximate those ideal filters we wanted to design. When learning filters, the choice of a polynomial basis doesn't matter for the expressivity of the filters, so we kept Chebyshev as we were familiar with them. We thought it could be easier to optimize, but that doesn't seem to make a difference in practice.

hazdzz commented 3 years ago

Mostly for historical reasons. At first we were designing filters (e.g., to solve the diffusion of heat, the propagation of waves, and many others). As Chebyshev polynomials are excellent function approximators, we chose them to approximate those ideal filters we wanted to design. When learning filters, the choice of a polynomial basis doesn't matter for the expressivity of the filters, so we kept Chebyshev as we were familiar with them. We thought it could be easier to optimize, but that doesn't seem to make a difference in practice.

In your opinion, it doesn't matter which orthogonal polynomials are chosen, because in practice the performance doesn't depend on the kind of orthogonal polynomial. Am I understanding correctly?

mdeff commented 3 years ago

Yes. It seems that SGD works well enough to find the best polynomial, whatever the basis. That might also be because we use low-order polynomials when learning.

hazdzz commented 3 years ago

Yes. It seems that SGD works well enough to find the best polynomial, whatever the basis. That might also be because we use low-order polynomials when learning.

What is SGD?

mdeff commented 3 years ago

Stochastic Gradient Descent

hazdzz commented 3 years ago

Mostly for historical reasons. At first we were designing filters (e.g., to solve the diffusion of heat, the propagation of waves, and many others). As Chebyshev polynomials are excellent function approximators, we chose them to approximate those ideal filters we wanted to design. When learning filters, the choice of a polynomial basis doesn't matter for the expressivity of the filters, so we kept Chebyshev as we were familiar with them. We thought it could be easier to optimize, but that doesn't seem to make a difference in practice.

I have another big question. Do the orthogonal polynomials cause a low-pass filter? Or is Spectral CNN (Spectral Networks and Deep Locally Connected Networks on Graphs) itself a low-pass-filter GNN?

mdeff commented 3 years ago

Neither spectral nor polynomial convolutions enforce low-pass filters. Polynomial convolutions enforce local filters (the locality is controlled by the order of the polynomials). Local filters can be high-pass (simply think about a multiplication by the Laplacian itself, a second-order derivative). The low-pass limitation is specific to GCN [Kipf et al.] as they merged the 0- and 1-neighborhood in the quest of learning a single parameter per filter.
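
To make the high-pass point concrete, a tiny illustration on a ring graph (an illustrative example, not from the repo): the Laplacian annihilates the constant (lowest-frequency) signal and amplifies the alternating (highest-frequency) one.

```python
import numpy as np

n = 8
A = np.zeros((n, n))
idx = np.arange(n)
A[idx, (idx + 1) % n] = 1
A[idx, (idx - 1) % n] = 1
L = np.diag(A.sum(1)) - A          # combinatorial Laplacian of the ring

x_const = np.ones(n)               # lowest-frequency signal
x_alt = (-1.0) ** idx              # highest-frequency signal on an even ring

print(L @ x_const)                 # all zeros: constants are removed
print(L @ x_alt)                   # 4 * x_alt: the alternating signal is amplified
```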