ysig / GraKeL

A scikit-learn compatible library for graph kernels
https://ysig.github.io/GraKeL/

Can I split train and test data manually? #73

Closed BlockChanZJ closed 2 years ago

BlockChanZJ commented 2 years ago

Can I call gk.fit_transform(graphs) on all graphs (train and test data together), and then use the train and test sub-matrices of the result to run an SVM?

giannisnik commented 2 years ago

Hi @BlockChanZJ, yes, you can do that. But that way you also compute the kernel values between pairs of test samples, which the SVM classifier does not actually need. So if the number of test samples is very large, this is computationally more expensive than calling fit_transform() on the training graphs and then transform() on the test graphs.
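
To make the cost difference concrete, here is a minimal pure-Python sketch. It does not use GraKeL: `toy_kernel`, `gram`, and the label lists are hypothetical stand-ins for a real graph kernel and real graphs. It compares the two options and counts how many kernel entries each one fills in, assuming no symmetry shortcut.

```python
from collections import Counter

def toy_kernel(a, b):
    """Dot product of label histograms; a stand-in for a real graph kernel."""
    ca, cb = Counter(a), Counter(b)
    return sum(ca[k] * cb[k] for k in ca)

def gram(rows, cols):
    """Kernel values between every item in rows and every item in cols."""
    return [[toy_kernel(r, c) for c in cols] for r in rows]

# Each "graph" is reduced to its list of node labels (illustrative data only).
train = [["a", "b"], ["a", "a", "c"], ["b", "c"]]
test = [["a", "c"], ["b", "b"]]
n_train = len(train)

# Option 1: one matrix over all graphs, then slice out the needed blocks.
K_all = gram(train + test, train + test)
K_train_1 = [row[:n_train] for row in K_all[:n_train]]  # train x train
K_test_1 = [row[:n_train] for row in K_all[n_train:]]   # test x train

# Option 2: the fit_transform() / transform() pattern.
K_train_2 = gram(train, train)  # plays the role of fit_transform(train)
K_test_2 = gram(test, train)    # plays the role of transform(test)

# Number of matrix entries filled in by each option.
entries_option_1 = len(K_all) * len(K_all)
entries_option_2 = n_train * n_train + len(test) * n_train
```

For this toy kernel both options give identical train and test blocks. Option 1 additionally fills in the test x test block and a duplicate train x test block, and those extra entries grow quickly as the test set gets larger.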

BlockChanZJ commented 2 years ago

@giannisnik Thanks for your answer! So fit_transform only computes the kernel matrix? If so, I don't need to run fit_transform multiple times when using k-fold cross-validation.

giannisnik commented 2 years ago

Yes, you are right. For k-fold cross validation, you can compute the kernel matrix once (using fit_transform on all graphs) and then at each fold extract the corresponding training and test matrices.
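
A rough sketch of the per-fold extraction, in pure Python so it runs without any dependencies. The matrix `K` and the helpers `kfold_indices` and `submatrix` are hypothetical; in practice `K` would be the output of a single fit_transform on all graphs. With NumPy, the same slicing could be written as `K[np.ix_(rows, cols)]`.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples)
                     if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

def submatrix(K, rows, cols):
    """Pick the given rows and columns out of a nested-list matrix."""
    return [[K[r][c] for c in cols] for r in rows]

# Stand-in for the kernel matrix computed once over all 6 graphs.
K = [[i * j + 1 for j in range(6)] for i in range(6)]

folds = []
for train_idx, test_idx in kfold_indices(len(K), 3):
    K_train = submatrix(K, train_idx, train_idx)  # shape (n_train, n_train)
    K_test = submatrix(K, test_idx, train_idx)    # shape (n_test, n_train)
    folds.append((K_train, K_test))
```

These are the shapes scikit-learn's `SVC(kernel="precomputed")` expects: the (n_train, n_train) block goes to `fit`, and the (n_test, n_train) block goes to `predict`. Note that the test block's rows are test samples and its columns are training samples, not the other way round.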

BlockChanZJ commented 2 years ago

Thanks for your answer! It helps me a lot! 👍