problem with sklearn's train_test_split after networkx import

ysig / GraKeL

A scikit-learn compatible library for graph kernels

https://ysig.github.io/GraKeL/

problem with sklearn's train_test_split after networkx import #42

Closed SylvainTakerkart closed 3 years ago

SylvainTakerkart commented 4 years ago

Hi there, it seems train_test_split is not happy when you give it a "generator object graph_from_networkx" as input...

To reproduce this:

simply run the grakel example nx_to_grakel.py
and then try to run train_test_split on the resulting G:

G_train, G_test, y_train, y_test = train_test_split(G, [-1,1], test_size=0.5)

I dunno whether it's a bug, but anyhow, do you have a workaround in the meanwhile? (or maybe I don't do things correctly...)

Thanks,

Sylvain

giannisnik commented 4 years ago

Hi @SylvainTakerkart. The number of graphs (i.e., the size of G) should be equal to the number of class labels (i.e., the length of [-1,1]). Does G contain 2 graphs?

SylvainTakerkart commented 4 years ago

yes yes! (the example nx_to_grakel.py gives you two graphs, which is why I used [-1, 1] to define y in this toy example; it is just designed so that you can easily reproduce the error I'm getting. I originally found this problem using my real data.)

also, a problem which is probably directly related: I cannot directly access the elements of G:

In [87]: G[1]
TypeError                                 Traceback (most recent call last)
----> 1 G[1]

TypeError: 'generator' object is not subscriptable

giannisnik commented 4 years ago

@SylvainTakerkart , if you cast the generator to a list, then you can access the graphs: G = list(G)
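To illustrate why the cast helps, here is a minimal sketch in plain Python. The function make_graphs is a hypothetical stand-in for any function that returns a generator (it is not the GraKeL API), and the strings stand in for graph objects. A generator supports neither indexing nor len(), while a list built from it supports both:

```python
def make_graphs():
    # Hypothetical stand-in for a converter that yields graphs lazily.
    for name in ("graph_0", "graph_1"):
        yield name


G = make_graphs()

# Indexing a generator fails, as in the traceback reported above.
try:
    G[1]
    subscript_failed = False
except TypeError:
    subscript_failed = True

# A generator has no length either.
try:
    len(G)
    len_failed = False
except TypeError:
    len_failed = True

# Casting to a list consumes the generator once and stores every item.
G = list(G)
second_graph = G[1]
n_graphs = len(G)
```

Note that a generator can only be consumed once, so the cast should happen before anything else iterates over G.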

With regards to your second problem, are you sure the generator consists of 2 graphs? I ran the following example and it did not produce any errors:

from sklearn.model_selection import train_test_split
from grakel.graph import Graph

adj1 = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
G1 = Graph(adj1)
y1 = 1

adj2 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
G2 = Graph(adj2)
y2 = 1

Gs = [G1, G2]
y = [y1, y2]

G_train, G_test, y_train, y_test = train_test_split(Gs, y, test_size=0.5)

SylvainTakerkart commented 4 years ago

> @SylvainTakerkart , if you cast the generator to a list, then you can access the graphs: G = list(G)

OK, thanks!

> With regards to your second problem, are you sure the generator consists of 2 graphs? I ran the following example and it did not produce any errors: [...]

Yes, I'm sure ;) The difference between what I do and your example is that you directly generate grakel graphs (and in that case, it works), whereas in my example, G is the output of the graph_from_networkx function...

And actually, your first fix (recasting as a list) fixes this problem too! Maybe you might want to recast the output of graph_from_networkx as a list directly? If you want, I can open a PR to try to fix this...

Lemme know!

giannisnik commented 4 years ago

@SylvainTakerkart Thanks for the feedback. Indeed, it seems that train_test_split cannot take a generator as input. We will take care of that. Thanks again!
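Until a fix lands in the library, the workaround discussed in this thread can be wrapped in a small helper. The sketch below is only an illustration: split_graphs and fake_graph_generator are hypothetical names that belong to neither GraKeL nor scikit-learn, and the dictionaries stand in for graph objects. The helper turns whatever iterable it receives into a list before calling train_test_split:

```python
from sklearn.model_selection import train_test_split


def split_graphs(graphs, labels, **kwargs):
    """Hypothetical helper: materialize the graphs, then split them."""
    graphs = list(graphs)  # consumes a generator; copies a list
    labels = list(labels)
    if len(graphs) != len(labels):
        raise ValueError("graphs and labels must have the same length")
    return train_test_split(graphs, labels, **kwargs)


def fake_graph_generator():
    # Stand-in for a generator of graphs (assumption, not the GraKeL API).
    for i in range(4):
        yield {"id": i}


labels = [-1, 1, -1, 1]
G_train, G_test, y_train, y_test = split_graphs(
    fake_graph_generator(), labels, test_size=0.5, random_state=0
)
```

The length check reflects the point raised earlier in the thread: the number of graphs must match the number of class labels.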