shogun-toolbox / shogun

Shōgun
http://shogun-toolbox.org
BSD 3-Clause "New" or "Revised" License

benchmark GPs against sklearn, gpy, gpflow, and gpml #3054

Open yorkerlin opened 8 years ago

yorkerlin commented 8 years ago

This entrance task is about benchmarking Shogun's GPs against sklearn v0.18, GPy, GPflow, and GPML in terms of speed, memory usage, and accuracy.
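To make the three measurements concrete, here is a minimal harness sketch in plain numpy. The `NearestCentroid` class is a hypothetical stand-in used only to exercise the harness; a real run would plug in the GP classifier from each toolkit instead.

```python
import time
import numpy as np

def benchmark(model, X_train, y_train, X_test, y_test):
    """Time training and prediction, and report test accuracy."""
    t0 = time.perf_counter()
    model.fit(X_train, y_train)
    train_time = time.perf_counter() - t0

    t0 = time.perf_counter()
    y_pred = model.predict(X_test)
    predict_time = time.perf_counter() - t0

    accuracy = np.mean(y_pred == y_test)
    return train_time, predict_time, accuracy

class NearestCentroid:
    """Trivial stand-in classifier, only here to exercise the harness."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0)
                                    for c in self.classes_])
        return self
    def predict(self, X):
        # squared distance of every point to every centroid
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

# two well-separated Gaussian blobs as dummy data
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) - 2, rng.randn(50, 2) + 2])
y = np.array([0] * 50 + [1] * 50)
train_time, predict_time, acc = benchmark(NearestCentroid(), X, y, X, y)
print(train_time, predict_time, acc)
```

Memory usage would be measured separately (e.g. by the benchmark framework's own tooling) since it cannot be read off a wall-clock timer.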

Some tasks:

Note that for GP binary classification, there are many inference algorithms implemented in Shogun (e.g., EP, Laplace, and KL).

Hint:

yorkerlin commented 8 years ago

This task is an entrance task for the GP project.

vigsterkr commented 8 years ago

@yorkerlin sounds great! Since some benchmarks on PCA are already emerging, shouldn't we try to put this into one framework? I.e., keep the code somewhere so that the benchmarking results can be regenerated as the libraries are updated?

yorkerlin commented 8 years ago

@vigsterkr Yes, cool. We can put this under one framework.

vigsterkr commented 8 years ago

@yorkerlin we should use/extend this framework: https://github.com/zoq/benchmarks. I can set up a buildbot to generate the output...

you can see some of the results here: http://www.mlpack.org/benchmark.html

the shogun that was used was version 3.2.0...

juancamilog commented 8 years ago

I'm interested in getting started on this one. @yorkerlin, do you agree that the framework at https://github.com/zoq/benchmarks should be used? I could initially try testing GP regression with exact inference, just to get the hang of what's needed to set up the benchmarks.

yorkerlin commented 8 years ago

Yes. Let's do benchmarks against sklearn, GPy, and GPflow within that framework.

karlnapf commented 8 years ago

@juancamilog I am super curious about the results of this; are there any already?

rcurtin commented 8 years ago

If you wrote some Python scripts for the benchmarking, feel free to submit them upstream and we can include them in the benchmarks repository. :)

karlnapf commented 8 years ago

Note

yorkerlin commented 8 years ago

@karlnapf I am setting up an optimized Python tool stack for the benchmarks. I will use OpenBLAS, numba, gnumpy, and joblib, if possible. I wonder whether our implementation can outperform such an optimized Python implementation.

yorkerlin commented 8 years ago

@karlnapf @vigsterkr @lisitsyn
It seems the LLVM compiler is better than GCC in terms of code optimization. Do you think we should pick the compiler accordingly?

karlnapf commented 8 years ago

@yorkerlin don't you think it might be smarter to put your ideas into the mlpack benchmarks? For the GPs, if Shogun is faster than GPflow and GPy, or even just scikit-learn, that is the best start.

I think we will stay with gcc for now ... ;)

yorkerlin commented 8 years ago

@karlnapf I will focus on sklearn now. Too many things to do this week. :(

yorkerlin commented 8 years ago

@karlnapf Shogun's GP computes gradients w.r.t. the hyper-parameters for gradient-based model selection by default.

I think sklearn does not compute these gradients by default. https://github.com/scikit-learn/scikit-learn/blob/56d625f/sklearn/gaussian_process/gpc.py#L664
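To make the extra cost concrete, here is a self-contained numpy sketch (not Shogun's or sklearn's actual code) of the exact-GP-regression log marginal likelihood and its gradient w.r.t. a kernel lengthscale, following GPML eqs. (5.8)-(5.9). The gradient term requires forming K^-1 explicitly, which is exactly the work that is skipped when gradients are not requested; a finite difference confirms the formula.

```python
import numpy as np

def rbf_kernel(X, lengthscale):
    """Squared-exponential kernel matrix and its derivative w.r.t. the lengthscale."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-0.5 * d2 / lengthscale ** 2)
    dK = K * d2 / lengthscale ** 3      # elementwise dK/d(lengthscale)
    return K, dK

def log_marginal_likelihood(X, y, lengthscale, noise=0.1):
    """Value and lengthscale-gradient of the exact GP regression LML."""
    n = len(y)
    K, dK = rbf_kernel(X, lengthscale)
    Ky = K + noise * np.eye(n)
    L = np.linalg.cholesky(Ky)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))       # Ky^-1 y
    lml = (-0.5 * y @ alpha
           - np.log(np.diag(L)).sum()                         # 0.5 log|Ky|
           - 0.5 * n * np.log(2 * np.pi))
    # gradient needs the full inverse: 0.5 tr((aa^T - Ky^-1) dK)
    Ky_inv = np.linalg.solve(L.T, np.linalg.solve(L, np.eye(n)))
    grad = 0.5 * np.trace((np.outer(alpha, alpha) - Ky_inv) @ dK)
    return lml, grad

rng = np.random.RandomState(1)
X = rng.randn(30, 1)
y = np.sin(X[:, 0])
lml, grad = log_marginal_likelihood(X, y, lengthscale=1.0)

# sanity check against a central finite difference
eps = 1e-6
lml_plus, _ = log_marginal_likelihood(X, y, 1.0 + eps)
lml_minus, _ = log_marginal_likelihood(X, y, 1.0 - eps)
fd = (lml_plus - lml_minus) / (2 * eps)
print(lml, grad, fd)
```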

karlnapf commented 8 years ago

We should compare a benchmark where we also learn the hyper-parameters using ML-II.

yorkerlin commented 8 years ago

What is more, in sklearn these gradients are implemented for the logistic likelihood only.

For example, the third derivative in sklearn is for the logistic likelihood only. https://github.com/scikit-learn/scikit-learn/blob/56d625f/sklearn/gaussian_process/gpc.py#L353

Shogun's GP can use Student's t, probit, and logistic likelihoods.

yorkerlin commented 8 years ago

@karlnapf For gradient-based model selection, Shogun uses the Method of Moving Asymptotes (MMA) from NLopt by default.
Sklearn uses fmin_l_bfgs_b for the hyper-parameter search by default.
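For reference, sklearn's default amounts to a call like the following. The objective below is a hypothetical toy stand-in for the negative log marginal likelihood; it only illustrates the calling convention where the objective returns both the value and its gradient, so no extra gradient evaluations are needed.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def neg_lml(theta):
    """Toy stand-in for the negative log marginal likelihood.
    Returns (value, gradient), as the objective passed to
    fmin_l_bfgs_b does when no separate fprime is given."""
    value = (theta[0] - 1.5) ** 2 + 0.5
    grad = np.array([2.0 * (theta[0] - 1.5)])
    return value, grad

# bounded quasi-Newton search over the (log-)hyper-parameter
theta_opt, f_opt, info = fmin_l_bfgs_b(neg_lml,
                                       x0=np.array([0.1]),
                                       bounds=[(1e-5, 1e5)])
print(theta_opt, f_opt)
```

A fair benchmark that includes model selection would have to account for the different optimizers (MMA vs. L-BFGS-B) as well as the per-iteration cost.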

yorkerlin commented 8 years ago

@karlnapf For sklearn, OpenBLAS is used in my local environment.

Some results on the usps3v5 dataset (note that the hyper-parameters are fixed):

Some statistics about the dataset:

number of training samples: 767
number of testing samples: 773
number of features: 256

shogun-python (4.1.0 release version):

using SingleLaplacianInferenceMethodWithLBFGS
  training took 0.11 seconds
  negative log marginal likelihood: 537.2529
  prediction took 0.37 seconds

using SingleLaplacianInferenceMethod (Newton's method)
  training took 0.16 seconds
  negative log marginal likelihood: 537.2529
  prediction took 0.40 seconds

sklearn (v0.18.dev0):

using Newton's method
  negative log marginal likelihood: 537.252898937
  training took 0.65 seconds
  prediction took 0.12 seconds

yorkerlin commented 8 years ago

EDIT: @karlnapf It turns out that I used a wrong hyper-parameter for the Gaussian kernel. I have fixed the issue; please see the updated results above.

yorkerlin commented 8 years ago

For prediction, Shogun uses MC samples, IIRC. That may be why Shogun is slower than sklearn when it comes to prediction.

yorkerlin commented 8 years ago

I will post the script soon.

karlnapf commented 8 years ago

Great, looking forward to seeing this. Complicated frameworks such as GPs might really let Shogun flex its muscles ...

karlnapf commented 8 years ago

@ialong might be interested in this too

ialong commented 8 years ago

@karlnapf yep, definitely interested. Not sure I'll be able to get properly started on this before the deadline tomorrow but could do it over the weekend.

@yorkerlin @karlnapf should I use mlpack then?

yorkerlin commented 8 years ago

@karlnapf It seems GPML cannot run on Octave 3.8.2. I will post results based on MATLAB 2015b on Unix.

yorkerlin commented 8 years ago

@karlnapf

the script: http://nbviewer.jupyter.org/gist/yorkerlin/b742cfe1669170ec0a6a
the data: https://gist.github.com/yorkerlin/ee9d99d573dd0dc7d105

karlnapf commented 8 years ago

It would be good to add such plots to the benchmark system by mlpack. @yorkerlin don't you think it can be made to work with Octave?

karlnapf commented 8 years ago

Why is sklearn 10 times slower in training but faster in testing?

yorkerlin commented 8 years ago

@karlnapf In sklearn, the predict function uses the MAP estimator, while the predict_proba function uses a deterministic integration (https://github.com/scikit-learn/scikit-learn/blob/e5c366f/sklearn/gaussian_process/gpc.py#L290). Note that the deterministic integration is for the logistic likelihood only (reference: Williams & Barber, "Bayesian Classification with Gaussian Processes", Appendix A: the logistic sigmoid is approximated by a linear combination of 5 error functions).

In Shogun, the Monte Carlo (MC) method is used, as suggested in GPML. In general, Shogun cannot use sklearn's deterministic integration since Shogun also works with the probit and Student's t likelihoods. Of course, Shogun could use the integration for the logistic case.

One idea: for the MC method, we can use vectorization and multi-threading if the testing points are i.i.d.
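The vectorization part of that idea can be sketched in numpy (an illustrative stand-in, not Shogun's implementation): draw all MC samples for all test points in one array and average the likelihood over the sample axis, instead of looping over test points.

```python
import numpy as np

def mc_predictive_proba(mu, var, n_samples=20000, seed=0):
    """MC estimate of p(y=1 | x*) = E[sigmoid(f)], f ~ N(mu, var),
    computed for all test points at once."""
    rng = np.random.RandomState(seed)
    mu = np.asarray(mu, dtype=float)[:, None]            # (n_test, 1)
    sd = np.sqrt(np.asarray(var, dtype=float))[:, None]  # (n_test, 1)
    f = mu + sd * rng.randn(mu.shape[0], n_samples)      # (n_test, n_samples)
    p = 1.0 / (1.0 + np.exp(-f))                         # logistic likelihood
    return p.mean(axis=1)

# latent mean 0 should give ~0.5; a large positive mean ~1
probs = mc_predictive_proba([0.0, 5.0], [1.0, 1.0])
print(probs)
```

Multi-threading would be a separate concern (e.g. OpenMP on the C++ side); here numpy already does the whole computation in a handful of vectorized calls.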

karlnapf commented 8 years ago

Good points. Make sure the benchmarks give the same results; it is meaningless to compare things that produce different results. Also, if possible, the same algorithm should be used. BUT, at the end of the day, what matters is: "a user wants to do GP regression with exact inference; which toolbox is fastest?"

karlnapf commented 8 years ago

About vectorization: this is done via linalg. About multi-threading: yes, definitely this should be done, using OpenMP if possible.

yorkerlin commented 8 years ago

@karlnapf This is for classification :) If the data have a lot of outliers, a Student's t likelihood (approximate inference) should be used instead of a Gaussian likelihood (exact inference). For regression (exact inference), I leave it to others.

yorkerlin commented 8 years ago

@karlnapf Another case is the Cox likelihood for survival analysis.

yorkerlin commented 8 years ago

@karlnapf Regarding "what matters is 'a user wants to do GP regression with exact inference; which toolbox is fastest'": for simple regression, I agree with the statement. For beginners or black-box users, it is true. Advanced users, however, should also weigh speed, accuracy, and interpretability.

For other cases, it is not always true. For example, I would not use sklearn if I cared about p-values and adjusted R^2.

yorkerlin commented 8 years ago

@karlnapf check out some results about GPy at http://nbviewer.jupyter.org/gist/yorkerlin/b742cfe1669170ec0a6a

karlnapf commented 8 years ago

Cool that Shogun is faster! :) Some things to do

yorkerlin commented 8 years ago

@karlnapf take a look at this: http://nbviewer.jupyter.org/gist/yorkerlin/b742cfe1669170ec0a6a
the data: https://gist.github.com/yorkerlin/ee9d99d573dd0dc7d105

FITC and Var_DTC are included, as well as larger problems.

yorkerlin commented 8 years ago

For most methods, Shogun's GP is the fastest among these GP toolkits.

However, for Var_DTC, GPy has a parallel implementation; what is more, Var_DTC in GPy is as fast as in Shogun. We should improve the Var_DTC method in Shogun.

vigsterkr commented 8 years ago

@yorkerlin AWESOME++ we should convert this into a http://github.com/zoq/benchmarks/ task, so that we can re-test this anytime we want :)

karlnapf commented 8 years ago

Really nice!

I agree on integrating this into the benchmark repo, as well as using multicore magic. @ialong might be interested

ialong commented 8 years ago

I am definitely interested, though perhaps my priority should be on #3043 for now. I will try to get to the benchmarking as well.

karlnapf commented 8 years ago

Yeah, maybe do that one. Or both :) The benchmarking should be a simple matter of more or less copying @yorkerlin's code into the benchmark system. There were lots of patches for that recently, so it is a no-brainer.

yorkerlin commented 8 years ago

@karlnapf take a look at this; I borrowed some code from @youssef-emad:

https://github.com/yorkerlin/plots-for-GP/blob/master/Untitled1.ipynb for https://github.com/shogun-toolbox/shogun/issues/3135

karlnapf commented 8 years ago

Nice notebook. But my question again is: why is this useful for GPs in particular? I would much rather just include these plots in the existing notebook, for the interesting cases.

yorkerlin commented 8 years ago

My points are:

  1. We should have a notebook about "black box" classifiers. I think we have that now.
  2. We should have a notebook about kernels for advanced users, since kernels are one of Shogun's key features. Automatic kernel selection would be another option.
  3. We should have a notebook about inference methods for GPs, for when the default method is too slow. Users may want to know how to choose the right inference method for large-scale GPs.


karlnapf commented 8 years ago

I see.

Feel free to create issues for 2 and 3, but name them (or rename existing ones) properly, and describe exactly what you want to do and what the focus should be. This can all be done as part of GSoC, but it needs proper specifications .... Also, let's keep this thread clean: one thread per topic.