ysig / GraKeL

A scikit-learn compatible library for graph kernels
https://ysig.github.io/GraKeL/
Other
593 stars 97 forks source link

Missing args in kernel.fit_transform() #75

Closed cshjin closed 11 months ago

cshjin commented 2 years ago

Describe the bug The args in kernel.fit_transform is missing y. https://github.com/ysig/GraKeL/blob/33ffff18d99c13f8afc0438a5691cb1206b119fb/grakel/kernels/kernel.py#L169 This will cause problems when send into pipeline, for some kernel methods do not have local fit_transform implantation.

To Reproduce

from grakel.datasets import fetch_dataset
from grakel.kernels.graphlet_sampling import GraphletSampling
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

dataset = fetch_dataset("PTC_MR", verbose=False)
G, y = dataset.data, dataset.target
gk = GraphletSampling(normalize=True, k=3)
G_train, G_test, y_train, y_test = train_test_split(G, y, test_size=0.2)

pipe = Pipeline([("gk", gk), ("svc", SVC(kernel="precomputed"))])
pipe.fit(G_train, y_train)
y_pred = pipe.predict(G_test)
print("accuracy ", accuracy_score(y_test, y_pred))

Expected behavior

 def fit_transform(self, X, y=None): 

should solve the problem.

ysig commented 2 years ago

I have no objection in adding this, but please be specific on what breaks. If y is set and not used it may be deceiving for what our method does with it which is nothing. However when we made the library we had a very older version of scikit.

pwelke commented 1 year ago

I had the same issue with the VertexHistogram kernel. In fact, the sklearn documentation mentions

Pipeline compatibility

For an estimator to be usable together with pipeline.Pipeline in any but the last step, it needs to provide a fit or fit_transform function. To be able to evaluate the pipeline on any data but the training set, it also needs to provide a transform function. There are no special requirements for the last step in a pipeline, except that it has a fit function. All fit and fit_transform functions must take arguments X, y, even if y is not used. Similarly, for score to be usable, the last step of the pipeline needs to have a score function that accepts an optional y.

(I don't want to sound condescending, but I recently read a lot of this documentation, related to my other Issue)

ysig commented 1 year ago

Ok I'll pass the change and add also the graph clone one in the following days.

j-adamczyk commented 1 year ago

@ysig any news on this? GraKeL is a great library, but not being able to use Pipelines is quite limiting, especially for experiments involving many different kernels.

ysig commented 1 year ago

@j-adamczyk Thank you for reaching out.

This update will be on grakel 1.10.0 and we will push it soon (I was waiting in case something else comes up so I can accumulate changes).

If it bottlenecks you from what you are currently doing, a simple solution is to either patch it on the source code or simply do something like:

class Patch(<KernelName>):
  def fit_transform(self, x, y=None):
    return super().fit_transform(x)

kernel = Patch(normalize=True)
kernel.fit_transform(graphs)

Although now it is already done, grakel is more than open to pull-requests and would love to be a repo evolved through the needs of the graph-ml community.

j-adamczyk commented 11 months ago

@ysig how about the release? I will be doing workshops about graph classification in about a week and I wanted to showcase GraKeL, and this workaround is doable, but quite ugly nevertheless.

ysig commented 11 months ago

@j-adamczyk should be ok in the wheels we pushed today. Sorry for the delay. @pwelke should also be ok now.