ysig / GraKeL

A scikit-learn compatible library for graph kernels
https://ysig.github.io/GraKeL/

GraphHopper: data type must provide an itemsize #63

Closed. shelusb closed this issue 3 years ago.

shelusb commented 3 years ago

Hi, I tried running the GraphHopper (GH) kernel, but got this error: "data type must provide an itemsize":

anaconda3\lib\site-packages\grakel\graph.py:312: UserWarning: changing format from "adjacency" to "all"
  warnings.warn('changing format from "adjacency" to "all"')
Traceback (most recent call last):
  File "test.py", line 71, in <module>
    t11 = sp_kernel2.fit_transform([g1])
  File "anaconda3\lib\site-packages\grakel\kernels\kernel.py", line 197, in fit_transform
    km = self._calculate_kernel_matrix()
  File "anaconda3\lib\site-packages\grakel\kernels\kernel.py", line 231, in _calculate_kernel_matrix
    K[i, i] = self.pairwise_operation(x, x)
  File "anaconda3\lib\site-packages\grakel\kernels\graph_hopper.py", line 261, in pairwise_operation
    return self.metric((xp.reshape(xp.shape[0], m_sq),) + x[1:],
  File "anaconda3\lib\site-packages\grakel\kernels\graph_hopper.py", line 282, in linear_kernel
    NA_linear_kernel = np.dot(NA_i, NA_j.T)
  File "<__array_function__ internals>", line 5, in dot
ValueError: data type must provide an itemsize

Here is my code (which works fine with the shortest path kernel):

from grakel import Graph
from grakel import GraphKernel
#sp_kernel = GraphKernel(kernel="shortest_path")
from grakel.kernels import ShortestPath, GraphHopper
g1_edges = {(1, 2): 1, (1, 3): 1, (2, 1): 1, (3, 1): 1}
g1_edge_labels = {(1, 2): [1], (1, 3): [2], (2, 1): [1], (3, 1): [2]}
g1_node_labels = {1:'1',2:'2',3:'3'}
g1 = Graph(g1_edges, node_labels=g1_node_labels, edge_labels=g1_edge_labels)

g2_edges = {(1, 2): 1, (1, 3): 1, (2, 1): 1, (3, 1): 1}
g2_edge_labels = {(1, 2): [1], (1, 3): [2], (2, 1): [2], (3, 1): [1]}
g2_node_labels = {1:'1',2:'2',3:'3'}
g2 = Graph(g2_edges, node_labels=g2_node_labels, edge_labels=g2_edge_labels)

sp_kernel2 = GraphHopper(normalize=True)
t11 = sp_kernel2.fit_transform([g1])
t12 = sp_kernel2.transform([g2])
print(t11)
print(t12)

I also tried it with edge labels instead of edge attributes, but it didn't help. The issue occurs with sp_kernel2 = GraphHopper(normalize=True); the traceback points to the call t11 = sp_kernel2.fit_transform([g1]).

giannisnik commented 3 years ago

Hi @shelusb ,

The GraphHopper kernel expects nodes to be annotated with continuous attributes, not discrete labels. If your attributes/labels are categorical, you can encode them as one-hot vectors by replacing the lines where you initialize the labels with g1_node_labels = {1:[1.0,0.0,0.0],2:[0.0,1.0,0.0],3:[0.0,0.0,1.0]} and g2_node_labels = {1:[1.0,0.0,0.0],2:[0.0,1.0,0.0],3:[0.0,0.0,1.0]}.
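
The one-hot vectors suggested in the reply can be generated instead of typed by hand. Below is a minimal sketch in plain Python; the helper one_hot_node_attributes and its vocabulary parameter are hypothetical and not part of GraKeL. Passing a fixed vocabulary keeps the vector length and label order identical across graphs, which is assumed to be what you want for a dataset. The resulting dict is what you would pass as node_labels to Graph, as in the reply.

```python
def one_hot_node_attributes(node_labels, vocabulary=None):
    """Map each node's discrete label to a one-hot list of floats.

    Hypothetical helper (not a GraKeL API). `vocabulary` fixes the label
    order so every graph in a dataset gets vectors of the same length;
    if omitted, the sorted set of labels in this graph is used.
    """
    if vocabulary is None:
        vocabulary = sorted(set(node_labels.values()))
    index = {label: i for i, label in enumerate(vocabulary)}
    attributes = {}
    for node, label in node_labels.items():
        vec = [0.0] * len(vocabulary)
        vec[index[label]] = 1.0  # set the position of this node's label
        attributes[node] = vec
    return attributes


# Usage with the discrete labels from the question.
g1_node_labels = {1: '1', 2: '2', 3: '3'}
vocab = ['1', '2', '3']
g1_node_attributes = one_hot_node_attributes(g1_node_labels, vocab)
print(g1_node_attributes)
# {1: [1.0, 0.0, 0.0], 2: [0.0, 1.0, 0.0], 3: [0.0, 0.0, 1.0]}
```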

Furthermore, I see that you also assign labels to the edges of the graphs. Note that the GraphHopper kernel ignores these labels (see: https://ysig.github.io/GraKeL/0.1a8/graph_kernel.html).
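
Since the reply attributes the failure to discrete labels where continuous attributes are expected, a quick check before fitting can surface the problem earlier than the ValueError raised inside np.dot. This is a sketch under stated assumptions: check_continuous_node_attributes is a hypothetical helper, it accepts only lists or tuples of real numbers, and it requires all nodes to share one attribute length.

```python
import numbers


def check_continuous_node_attributes(node_labels):
    """Hypothetical pre-flight check (not a GraKeL API).

    Every node must map to a non-empty list/tuple of real numbers, and all
    vectors must have the same length. Returns that common length.
    """
    dim = None
    for node, attr in node_labels.items():
        if not isinstance(attr, (list, tuple)):
            # Discrete labels such as '1' end up here.
            raise TypeError(f"node {node!r}: expected a numeric vector, got {attr!r}")
        if not attr or not all(isinstance(v, numbers.Real) for v in attr):
            raise TypeError(f"node {node!r}: attribute must contain real numbers only")
        if dim is None:
            dim = len(attr)
        elif len(attr) != dim:
            raise ValueError(f"node {node!r}: length {len(attr)} != {dim}")
    return dim


# One-hot attributes pass; the string labels from the question do not.
print(check_continuous_node_attributes({1: [1.0, 0.0, 0.0], 2: [0.0, 1.0, 0.0], 3: [0.0, 0.0, 1.0]}))
# 3
```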

gitshelmi commented 3 years ago

@giannisnik Thanks for your reply. My edge labels/attributes are all continuous values, so my adjacency matrix looks like this:

    1 [ ?    0.2  0.4 ]
    2 [ 0.2  ?    0   ]
    3 [ 0.4  0    ?   ]

?: I think this should be 0, because I don't have self-loops.
0: there is no edge between those two nodes; for example, there is no edge between the second and third nodes.
0.2: the weight of the edge; for example, the edge between the first and second nodes has a weight of 0.2.

How would you initialize this graph?
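
One way to encode the weighted graph above is the {(u, v): weight} edge dictionary already used for g1_edges in the question's code, where each value is read as the weight of that edge (an assumption based on that code, in which every value is 1). The helper adjacency_to_edge_weights below is hypothetical; it treats 0 as "no edge", skips the diagonal (no self-loops), and numbers nodes from 1. Whether GraphHopper actually uses these weights is not settled in this thread.

```python
def adjacency_to_edge_weights(adj):
    """Convert a square adjacency matrix (list of lists) into a
    {(u, v): weight} dict with 1-based node ids.

    Hypothetical helper: zero entries mean "no edge" and the diagonal is
    skipped because the graph has no self-loops.
    """
    edges = {}
    n = len(adj)
    for i in range(n):
        for j in range(n):
            if i != j and adj[i][j] != 0:
                edges[(i + 1, j + 1)] = adj[i][j]
    return edges


# The matrix from the question, with the diagonal set to 0.
adj = [[0.0, 0.2, 0.4],
       [0.2, 0.0, 0.0],
       [0.4, 0.0, 0.0]]
g1_edges = adjacency_to_edge_weights(adj)
print(g1_edges)
# {(1, 2): 0.2, (1, 3): 0.4, (2, 1): 0.2, (3, 1): 0.4}
```

Both directions of each undirected edge appear, mirroring how g1_edges lists (1, 2) and (2, 1) in the question.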

And node names do matter in GraphHopper, right? For example, it distinguishes the edge between nodes 1 and 2 from the edge between nodes 1 and 3, even if they have the same weight, right?


Update: I found this in the documentation:

edges = {1: [2, 3], 2: [1], 3: [1]}
node_attributes = {1: [1.2, 0.5], 2: [2.8, -0.6], 3: [0.7, 1.1]}

So for the graph above, should it look like the following?

g1_edges = {1: [2,3], 2:[1], 3:[1]}
g1_node_attributes = {1: [0.2, 0.4], 2: [0.2,0], 3: [0.4,0]}
g1 = Graph(g1_edges, node_labels=g1_node_attributes)

If yes, is there a better way? It seems a bit redundant: g1_edges describes the network topology, but then in g1_node_attributes you need to provide the weights of all edges of each node, right? If so, g1_node_attributes already contains the information in g1_edges.

Another thing I don't quite understand is how to specify that 0.4 belongs to the edge between nodes 1 and 3 in g1_node_attributes = {1: [0.2, 0.4], 2: [0.2, 0], 3: [0.4, 0]}. Would it work if I used the corresponding row of the adjacency matrix as each node's attribute? For the graph above, that would be g1_node_attributes = {1: [0, 0.2, 0.4], 2: [0.2, 0, 0], 3: [0.4, 0, 0]}.
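
The row-per-node encoding proposed above can be built directly from the adjacency matrix. This is a minimal sketch: rows_as_node_attributes and its width parameter are hypothetical, and the code only builds the dictionary; it does not show that GraphHopper produces meaningful similarities from such attributes. Two properties of this encoding are worth keeping in mind: the position of each entry depends on how the nodes are numbered, and the vector length equals the number of nodes. Because the linear_kernel frame in the traceback computes np.dot(NA_i, NA_j.T), graphs presumably need attribute vectors of equal length, hence the optional zero padding (an assumption, not documented GraKeL behavior).

```python
def rows_as_node_attributes(adj, width=None):
    """Use row i of a square adjacency matrix as the attribute vector of
    node i + 1 (hypothetical helper, not a GraKeL API).

    If `width` is given, each row is zero-padded to that length so that
    graphs of different sizes share one attribute dimension.
    """
    n = len(adj)
    width = n if width is None else width
    if width < n:
        raise ValueError("width must be at least the number of nodes")
    return {i + 1: [float(v) for v in row] + [0.0] * (width - n)
            for i, row in enumerate(adj)}


adj = [[0.0, 0.2, 0.4],
       [0.2, 0.0, 0.0],
       [0.4, 0.0, 0.0]]
print(rows_as_node_attributes(adj))
# {1: [0.0, 0.2, 0.4], 2: [0.2, 0.0, 0.0], 3: [0.4, 0.0, 0.0]}
print(rows_as_node_attributes(adj, width=4)[1])
# [0.0, 0.2, 0.4, 0.0]
```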

giannisnik commented 3 years ago

@shelusb Some kernels can take edge weights into account, such as the Pyramid Match kernel and the Multiscale Laplacian kernel.

This topic has been discussed in the past. See this: https://github.com/ysig/GraKeL/issues/37