ysig / GraKeL

A scikit-learn compatible library for graph kernels
https://ysig.github.io/GraKeL/

Kernels suitable for continuous adjacency matrices? #37

Closed marsuplink closed 4 years ago

marsuplink commented 4 years ago

Hi, nice library!

I am wondering: can the kernels, such as the WL kernel, along with the SVM classifier, take graph representations whose adjacency matrix is continuous/soft? I.e. not a discrete integer-valued matrix; an example is [[.9, .1], [.1, .9]].

I saw in the documentation that vertex and edge attributes can be continuous features, but can the adjacency matrix be continuous as well? I tested this out and didn't see an obvious error message yet, but wanted to double check to see if I can expect the results to always be right.

Thanks!

giannisnik commented 4 years ago

Hi @marsuplink. Some kernels take into account the weights of the edges. For instance, the shortest path kernel computes the shortest paths between all pairs of nodes. So, if the weight of the edge between nodes A and B is 0.1 and that of the edge between nodes C and D is 0.9, then the distance between A and B is shorter than that between C and D. However, the majority of the kernels ignore the magnitude of these weights. This is the case for the WL subtree kernel: all positive edge weights are treated as equivalent; each simply denotes the existence of an (unweighted) edge between its endpoints.
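The contrast above can be illustrated with plain NumPy (a toy sketch, not GraKeL's internals): binarizing a weighted adjacency matrix, which is effectively what the WL subtree kernel sees, erases the weights, while shortest-path distances computed on the weights do not.

```python
import numpy as np

def floyd_warshall(adj):
    """All-pairs shortest-path distances; zero entries mean 'no edge'."""
    n = adj.shape[0]
    dist = np.where(adj > 0, adj, np.inf)
    np.fill_diagonal(dist, 0.0)
    for k in range(n):
        # Relax every pair (i, j) through intermediate node k.
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return dist

# Two weighted graphs with identical topology but different weights.
w1 = np.array([[0., .1], [.1, 0.]])
w2 = np.array([[0., .9], [.9, 0.]])

# A WL-style view only keeps the binarized structure: the graphs look identical.
print((w1 > 0).astype(int))          # same as (w2 > 0).astype(int)

# Shortest-path distances, by contrast, differ with the weights.
print(floyd_warshall(w1)[0, 1])      # 0.1
print(floyd_warshall(w2)[0, 1])      # 0.9
```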

marsuplink commented 4 years ago

Thanks @giannisnik for the helpful reply! So if I wanted to classify graphs with weighted edges using an SVM, I could use the shortest path kernel. Would that work well in terms of classification accuracy?

Do you think the WL kernel works better than the shortest path kernel for unweighted graphs?

giannisnik commented 4 years ago

You can indeed use the shortest path kernel along with the SVM classifier. However, I should mention that the implemented shortest path kernel compares shortest path lengths using a dirac kernel. So, the kernel may not perform that well if your graphs have few or no shortest paths of identical length.
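For intuition, a dirac comparison of two path lengths is just an exact-equality check (a toy sketch, not GraKeL's implementation). With continuous edge weights, two lengths almost never coincide, so most path pairs contribute nothing to the kernel value.

```python
def dirac(l1, l2):
    # 1 if the two shortest-path lengths match exactly, else 0.
    return 1.0 if l1 == l2 else 0.0

# Integer path lengths from unweighted graphs often coincide...
print(dirac(2, 2))        # 1.0
# ...but continuous weighted lengths rarely match exactly.
print(dirac(0.31, 0.32))  # 0.0
```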

With regards to the performance of the two kernels, that depends a lot on the application. But on most benchmark graph classification datasets, the WL kernel outperforms the shortest path kernel.

giannisnik commented 4 years ago

Two other kernels that you could use, which directly utilize the adjacency matrix and thus take edge weights into account, are the pyramid match kernel and the multiscale Laplacian kernel.

marsuplink commented 4 years ago

Ah that's good to know, thanks! Will definitely give it a try and use the weighted adjacency matrix directly for the pyramid match kernel and the multiscale Laplacian kernel.

P.S. Given what you said about the shortest path kernel (i.e. a 0.1 entry indicates a short distance between two nodes and 0.9 a long one), should I do some kind of inversion so that the adjacency matrix entries going into gk.fit_transform() are inversely proportional to the edge weight? Usually a higher weight corresponds to a tighter connection. Or is this taken care of automatically when I supply a weighted adjacency matrix?

Thanks Giannis!

giannisnik commented 4 years ago

If the edge weights are in [0,1], for the shortest path kernel, you could set each positive weight w equal to 1-w. However, since the shortest path kernel compares lengths using a dirac kernel, I don't think this will have any impact on performance.
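A minimal sketch of that transform (assuming weights in (0, 1] encode similarity; the matrix `adj` here is a made-up example): map each positive weight w to 1 - w so stronger connections become shorter distances, leaving zeros (non-edges) untouched.

```python
import numpy as np

# Hypothetical similarity-weighted adjacency matrix.
adj = np.array([[0., .9, 0.],
                [.9, 0., .1],
                [0., .1, 0.]])

# Invert only the existing edges: similarity w -> distance 1 - w.
dist_adj = np.where(adj > 0, 1.0 - adj, 0.0)
print(dist_adj)
```

One caveat: a weight of exactly 1 would map to 0 and be read as a missing edge, so such values may need to be clipped slightly below 1 first.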

marsuplink commented 4 years ago

I see, will experiment with this. Thanks!

marsuplink commented 4 years ago

@giannisnik thanks again for your help. I'm experimenting with your suggestions. For the multiscale Laplacian kernel, I'm calling it after initialization with data like this: `Ks = gk.fit_transform([[weighted_adj, {1: vec1, 2: vec2, ...}], ... ])`, where vec1, vec2 are node attribute vectors.

I noticed that the classification accuracy doesn't change much at all after replacing the weighted adjacency matrix with some random matrix (like the identity). Am I feeding the data in correctly? Based on what you said earlier, I take it that the multiscale Laplacian kernel doesn't discretize the weighted adjacency matrix (i.e. it doesn't just treat nonzero entries as 1s).

Thanks!

giannisnik commented 4 years ago

@marsuplink the kernel takes edge weights into account. For instance, consider the following example:

```python
import numpy as np
from grakel.kernels import MultiscaleLaplacianFast

adj = np.array([[0,0,1,1],[0,0,1,0],[1,1,0,1],[1,0,1,0]])
node_attributes = {0:[1], 1:[1], 2:[1], 3:[1]}
G1 = [adj, node_attributes]

adj = np.array([[0,0,.2,.2],[0,0,.2,0],[.2,.2,0,.2],[.2,0,.2,0]])
node_attributes = {0:[1], 1:[1], 2:[1], 3:[1]}
G2 = [adj, node_attributes]

Gs = [G1, G2]

ml_kernel = MultiscaleLaplacianFast(P=4, L=2, n_samples=4)
K = ml_kernel.fit_transform(Gs)
print(K)
```

The resulting matrix is:

```
K = [[0.5        0.01875952]
     [0.01875952 0.5       ]]
```

If the kernel did not take edge weights into account, the two adjacency matrices would be identical to each other and K[0,1] would be equal to K[0,0] and K[1,1].

marsuplink commented 4 years ago

Thanks for the example @giannisnik!