ysig / GraKeL

A scikit-learn compatible library for graph kernels
https://ysig.github.io/GraKeL/

Letter datasets and Graphlet Sampling: UserWarning: wrong format: edge dictionary must exist #66

Closed: jungla88 closed this issue 1 year ago

jungla88 commented 3 years ago

I would like to test the Letter datasets (low, medium and high) with the GraphletSampling kernel. It works well with MUTAG, for example, but I ran into problems with Letter. In particular, I get the following error when calling fit_transform on the graph dataset loaded via fetch_dataset with prefer_attr_nodes=True, prefer_attr_edges=True:

TypeError: cannot unpack non-iterable NoneType object. UserWarning: wrong format: edge dictionary must exist
  warnings.warn('wrong format: edge dictionary must exist')

When I dug into the code, I saw a comment in the fit_transform method saying that the input data structure has to be an iterable with 3 features. For LetterL.data[0] I have these features:

[{(2, 4), (3, 1), (5, 4), (4, 2), (4, 5), (1, 3)}, {1: [0.6023359894752502, 2.959130048751831], 2: [1.8997299671173096, 2.7767999172210693], 3: [0.7683429718017578, 0.6964200139045715], 4: [0.6626060009002686, 1.6655299663543701], 5: [1.8528200387954712, 0.7046809792518616]}, {}]

This tells me that the last feature, which should be the edge attribute data structure, is empty. I read the seminal paper on the graphlet kernel and did not find anything that makes the procedure infeasible for graphs without edge labels. So my question is: does the dataset itself have a problem, or am I trying to use a dataset that cannot work with the graphlet kernel?
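For reference, here is a minimal sketch of how one could inspect such a three-element entry in plain Python, to confirm that the edge set and node attributes are present and only the edge attribute dictionary is empty. It does not call GraKeL at all: `summarize_graph_entry` is a hypothetical helper written just for this illustration, the coordinates are rounded copies of the values above, and it makes no claim about what actually triggers the warning inside the library.

```python
def summarize_graph_entry(entry):
    """Summarize one [edges, node_attributes, edge_attributes] graph entry."""
    edges, node_attrs, edge_attrs = entry
    # Nodes that appear as an endpoint of at least one edge.
    nodes_in_edges = {v for edge in edges for v in edge}
    return {
        "n_nodes": len(node_attrs),
        "n_directed_edges": len(edges),
        # True when every (u, v) has a matching (v, u).
        "symmetric": all((v, u) in edges for (u, v) in edges),
        # Nodes with attributes that no edge touches.
        "isolated_nodes": sorted(set(node_attrs) - nodes_in_edges),
        "has_edge_attributes": bool(edge_attrs),
    }


# First Letter graph, copied from the output above (coordinates rounded).
entry = [
    {(2, 4), (3, 1), (5, 4), (4, 2), (4, 5), (1, 3)},
    {
        1: [0.602, 2.959],
        2: [1.900, 2.777],
        3: [0.768, 0.696],
        4: [0.663, 1.666],
        5: [1.853, 0.705],
    },
    {},
]

summary = summarize_graph_entry(entry)
print(summary)
```

On this entry the edge set is symmetric and covers all five nodes, so the only unusual part of the structure is the empty third element.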

ysig commented 3 years ago

Can you please provide a reproducible example and describe how your use case maps onto the library? We have run multiple experiments in the past with MUTAG and graphlet sampling. Once this is clearer, we will be happy to help you.