ysig / GraKeL

A scikit-learn compatible library for graph kernels
https://ysig.github.io/GraKeL/
Other
593 stars 97 forks source link

modified random walk kernel giving all 1 scores for normalize=true #87

Closed siantist closed 1 year ago

siantist commented 1 year ago

I have a problem with a new RandomWalk class I created where the adjacency matrix is replaced by a weight matrix that is weighted by properties of nodes. I have an issue when I set normalize=True where the kernel returns 1 as the score for different graphs.

I tried to run the transform and diagonal methods separately (with RandomWalk kernel as an additional input). I named it transform2 and I named the RandomWalk kernel as rw6.

I added print statements to transform2:

`def transform2(self, X):

Y = self.parse_input(X)
# Transform - calculate kernel matrix
km = self._calculate_kernel_matrix(Y) 

self._Y = Y

# Self transform must appear before the diagonal call on normilization
self._is_transformed = True
print("self normalize is ", self.normalize)
if self.normalize:
    X_diag, Y_diag = diagonal(self) #diagonal2(X)
    # print statements
    print("X diag is:", X_diag)
    print("Y diag is:", Y_diag)
    print("km before:", km)
    print("np outer ydiag, xdiag is:", np.outer(Y_diag, X_diag))
    print("np sqrt of it is:", np.sqrt(np.outer(Y_diag, X_diag)))
    km /= np.sqrt(np.outer(Y_diag, X_diag))
    print("km after:", km)
return km`

And the output of calling it on grakel graphs from ENZYMES (fitting first on one graph with rw6.fit([grakel_array[5]]) )

transform2(rw6, [grakel_array2[150], grakel_array2[50]])

prints:

X not none parse input pairwise operation sum S is: 922.6613463538778 pairwise operation sum S is: 1122.2417480819133 self normalize is True pairwise operation sum S is: 813.7876608840324 pairwise operation sum S is: 1203.9245992712633 return both diagonals X diag is: [1046.10084544] Y diag is: [ 813.78766088 1203.92459927] km before: [[ 922.66134635] [1122.24174808]] np outer ydiag, xdiag is: [[ 851303.96005555] [1259426.54113795]] np sqrt of it is: [[ 922.66134635] [1122.24174808]] km after: [[1.] [1.]]

I notice that km only has one number for each graph that is input in transform2's argument. Eg. if I call transform2(rw6, [grakel_array2[150]]) then km before would be [ 922.66134635] and after [1.].

Can anyone understand why the normalized score is always 1 (for different graphs)? It seems that the unnormalized scores for different graphs are different.

I defined my custom random walk kernel in https://github.com/siantist/graph_mining/blob/main/grakel/method_random_walk/RandomWalk6.py

giannisnik commented 1 year ago

Hi @siantist ,

The kernel is not supposed to produce normalized kernel values equal to 1. See for instance the following example.

from grakel import Graph,RandomWalk

edges1 = [(0, 1), (1, 2), (2, 3), (3, 0)]
G1 = Graph(edges1)

edges2 = [(0, 1), (1, 2), (2, 3)]
G2 = Graph(edges2)

Gs = [G1,G2]

rw = RandomWalk(normalize=True, lamda=0.1)
K = rw.fit_transform(Gs)

print(K)
[[1.         0.99273013]
 [0.99273013 1.        ]]

Those values might be due to the structure of the graphs contained in the dataset. Are there any symmetries? For instance, in the following example where both graphs are cycles (of different lengths), the normalized kernel values are equal to 1.

from grakel import Graph,RandomWalk

edges1 = [(0, 1), (1, 2), (2, 3), (3, 0)]
G1 = Graph(edges1)

edges2 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
G2 = Graph(edges2)

Gs = [G1,G2]

rw = RandomWalk(normalize=True, lamda=0.1)
K = rw.fit_transform(Gs)

print(K)
[[1. 1.]
 [1. 1.]]