ysig / GraKeL

A scikit-learn compatible library for graph kernels
https://ysig.github.io/GraKeL/
Other
601 stars 97 forks source link

One vs many comparison for WeisfeilerLehman; `.transform` gives `TypeError: each element of X must have either a graph with...` #95

Closed ljmartin closed 1 year ago

ljmartin commented 1 year ago

Thanks for the library and for your time in reading this! Describe the bug I'm calculating kernel similarity between a reference graph and a set of 'probe' graphs, avoiding the necessity to calculate the full pairwise kernel matrix. Doing this with kern.fit(graph[:1]) followed by kern.transform(graphs) results in an error (below). Is this expected?

TypeError: each element of X must have either a graph with labels for node and edge or 3 elements consisting of a graph type object, labels for vertices and labels for edges.

To Reproduce

gs = ... #networkx graphs with node labels & edge labels

graphs = [i for i in graph_from_networkx(gs,
                                         as_Graph=False,
                                        node_labels_tag='atomic_num',
                                        edge_labels_tag='bond_type'
                                        )]

kern = WeisfeilerLehman(n_iter=2, normalize=False, base_graph_kernel=NeighborhoodSubgraphPairwiseDistance)
kern = kern.fit(graphs[:1])
kern.transform(graphs)

Expected behavior Running this with, say, RandomWalkLabeled gives an array of shape [1, N] that contains the 1 vs many kernel values. That's what I expected here, too.

Stack Trace full output:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/jh/02165y2n7kq2y5ychxtzcjm40000gn/T/ipykernel_44619/1388272858.py in <module>
----> 1 kern.transform(graphs)

~/miniconda3/envs/compchem/lib/python3.9/site-packages/sklearn/utils/_set_output.py in wrapped(self, X, *args, **kwargs)
    138     @wraps(f)
    139     def wrapped(self, X, *args, **kwargs):
--> 140         data_to_wrap = f(self, X, *args, **kwargs)
    141         if isinstance(data_to_wrap, tuple):
    142             # only wrap the first output for cross decomposition

~/miniconda3/envs/compchem/lib/python3.9/site-packages/grakel/kernels/weisfeiler_lehman.py in transform(self, X)
    425             # Calculate the kernel matrix without parallelization
    426             graphs = generate_graphs(WL_labels_inverse, nl)
--> 427             values = [self.X[i].transform(g) for (i, g) in enumerate(graphs)]
    428             K = np.sum(values, axis=0)
    429         else:

~/miniconda3/envs/compchem/lib/python3.9/site-packages/grakel/kernels/weisfeiler_lehman.py in <listcomp>(.0)
    425             # Calculate the kernel matrix without parallelization
    426             graphs = generate_graphs(WL_labels_inverse, nl)
--> 427             values = [self.X[i].transform(g) for (i, g) in enumerate(graphs)]
    428             K = np.sum(values, axis=0)
    429         else:

~/miniconda3/envs/compchem/lib/python3.9/site-packages/sklearn/utils/_set_output.py in wrapped(self, X, *args, **kwargs)
    138     @wraps(f)
    139     def wrapped(self, X, *args, **kwargs):
--> 140         data_to_wrap = f(self, X, *args, **kwargs)
    141         if isinstance(data_to_wrap, tuple):
    142             # only wrap the first output for cross decomposition

~/miniconda3/envs/compchem/lib/python3.9/site-packages/grakel/kernels/neighborhood_subgraph_pairwise_distance.py in transform(self, X, y)
    261             raise ValueError('transform input cannot be None')
    262         else:
--> 263             Y = self.parse_input(X)
    264 
    265         try:

~/miniconda3/envs/compchem/lib/python3.9/site-packages/grakel/kernels/neighborhood_subgraph_pairwise_distance.py in parse_input(self, X)
    138                               x.get_labels(purpose="adjacency", label_type="edge"))
    139                 else:
--> 140                     raise TypeError('each element of X must have either ' +
    141                                     'a graph with labels for node and edge ' +
    142                                     'or 3 elements consisting of a graph ' +

TypeError: each element of X must have either a graph with labels for node and edge or 3 elements consisting of a graph type object, labels for vertices and labels for edges.
giannisnik commented 1 year ago

Hi @ljmartin ,

Indeed, for the graphs that are given as input to the transform() function, the edge labels are not retained from one iteration of the WL algorithm to the next and this is why the neighborhood subgraph pairwise distance kernel function cannot be computed (since it requires edges to be annotated with discrete labels).

We will fix this bug in the new version of the package. For now, you can just use the following code:

gs = ... #networkx graphs with node labels & edge labels

graphs = [i for i in graph_from_networkx(gs,
                                         as_Graph=False,
                                        node_labels_tag='atomic_num',
                                        edge_labels_tag='bond_type'
                                        )]

kern = WeisfeilerLehman(n_iter=2, normalize=False, base_graph_kernel=NeighborhoodSubgraphPairwiseDistance)
K_full = kern.fit_transform(graphs)
K = K_full[0,1:]