Authors: Hiroyuki Kasai
Last page update: November 20, 2020
Latest library version: 1.0.20 (see Release notes for more info)
Contributions are very welcome; please get in touch.
The SGDLibrary is a pure-MATLAB library (toolbox) of stochastic optimization algorithms. It solves unconstrained minimization problems of the form min f(x) = sum_i f_i(x). The SGDLibrary also runs on GNU Octave (free software compatible with most MATLAB scripts). Note that the SGDLibrary internally contains the GDLibrary.
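As a concrete (hypothetical) instance of this finite-sum form, consider least squares, where each f_i measures the fit on one sample. The following minimal MATLAB sketch is illustrative only; the names `cost`, `grad_i`, and `full_grad` are our own and do not use the library's API:

```matlab
% Illustrative finite-sum objective (least squares), not the library's API:
% f(x) = sum_i f_i(x) with f_i(x) = 0.5 * (a_i' * x - b_i)^2.
A = [1 0; 0 2; 1 1];            % rows a_i'
b = [1; 2; 3];
cost      = @(x) 0.5 * sum((A * x - b).^2);        % f(x)
grad_i    = @(x, i) (A(i,:) * x - b(i)) * A(i,:)'; % stochastic gradient of f_i
full_grad = @(x) A' * (A * x - b);                 % sum_i grad f_i(x)
```

A stochastic solver would repeatedly pick a random index i and step along `-grad_i(x, i)`, whereas batch gradient descent uses `full_grad(x)` every iteration.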
The documentation of the SGDLibrary can be obtained from the link below.
Quasi-Newton variants
SQN (stochastic quasi-Newton)
SVRG-SQN (denoted as "Stochastic L-BFGS" or "slbfgs" in the paper below.)
SVRG-LBFGS (denoted as "SVRG+II: LBFGS" in the paper below.)
SS-SVRG (denoted as "SVRG+I: Subsampled Hessian followed by SVT" in the paper below.)
oBFGS-Inf (Online BFGS, Infinite memory)
oLBFGS-Lim (Online L-BFGS, Limited memory)
N. N. Schraudolph, J. Yu and S. Gunter, "A stochastic quasi-Newton method for online convex optimization," International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 436-443, Journal of Machine Learning Research, 2007.
A. Mokhtari and A. Ribeiro, "Global convergence of online limited memory BFGS," Journal of Machine Learning Research, 16, pp. 3151-3181, 2015.
Reg-oBFGS-Inf (Regularized oBFGS, Infinite memory) (denoted as "RES" in the paper below.)
Damp-oBFGS-Inf (Regularized damped oBFGS, Infinite memory) (denoted as "SDBFGS" in the paper below.)
IQN (incremental Quasi-Newton method)
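The limited-memory variants above (oLBFGS-Lim, SQN, SVRG-LBFGS, etc.) all rely on the classic L-BFGS two-loop recursion to apply an approximate inverse Hessian to a (stochastic) gradient. A generic sketch of that recursion is shown below; it is not the library's internal code, and the function and variable names are our own:

```matlab
function d = lbfgs_two_loop(g, s_hist, y_hist)
% Two-loop recursion sketch for limited-memory BFGS.
% s_hist, y_hist: d-by-m matrices holding the m most recent curvature pairs
% (s = w_new - w_old, y = g_new - g_old); g: current (stochastic) gradient.
    m = size(s_hist, 2);
    q = g;
    alpha = zeros(m, 1);
    rho = zeros(m, 1);
    for i = m:-1:1                      % first loop: newest to oldest
        rho(i) = 1 / (y_hist(:,i)' * s_hist(:,i));
        alpha(i) = rho(i) * (s_hist(:,i)' * q);
        q = q - alpha(i) * y_hist(:,i);
    end
    if m > 0                            % initial Hessian scaling gamma * I
        gamma = (s_hist(:,m)' * y_hist(:,m)) / (y_hist(:,m)' * y_hist(:,m));
    else
        gamma = 1;
    end
    r = gamma * q;
    for i = 1:m                         % second loop: oldest to newest
        beta = rho(i) * (y_hist(:,i)' * r);
        r = r + (alpha(i) - beta) * s_hist(:,i);
    end
    d = -r;                             % approximate Newton descent direction
end
```

With no stored pairs the recursion reduces to plain (stochastic) gradient descent, `d = -g`.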
| Algorithm name in example codes | function | `options.sub_mode` | other options |
|---|---|---|---|
| SGD | `sgd` | | |
| SGD-CM | `sgd_cm` | `'CM'` | |
| SGD-CM-NAG | `sgd_cm` | `'CM-NAG'` | |
| AdaGrad | `adagrad` | `'AdaGrad'` | |
| RMSProp | `adagrad` | `'RMSProp'` | |
| AdaDelta | `adagrad` | `'AdaDelta'` | |
| Adam | `adam` | `'Adam'` | |
| AdaMax | `adam` | `'AdaMax'` | |
| SVRG | `svrg` | | |
| SAG | `sag` | `'SAG'` | |
| SAGA | `sag` | `'SAGA'` | |
| SARAH | `sarah` | | |
| SQN | `slbfgs` | `'SQN'` | |
| SVRG-SQN | `slbfgs` | `'SVRG-SQN'` | |
| SVRG-LBFGS | `slbfgs` | `'SVRG-LBFGS'` | |
| SS-SVRG | `subsamp_svrg` | | |
| oBFGS-Inf | `obfgs` | `'Inf-mem'` | |
| oLBFGS-Lim | `obfgs` | `'Lim-mem'` | |
| Reg-oBFGS-Inf | `obfgs` | `'Inf-mem'` | `regularized=true` |
| Damp-oBFGS-Inf | `obfgs` | `'Inf-mem'` | `regularized=true` & `damped=true` |
| IQN | `iqn` | | |
| SCR | `scr` | | `gradient_sampling=1` & `Hessian_sampling=1` |
| Subsampled-TR | `subsamp_tr` | | `gradient_sampling=1` & `Hessian_sampling=1` |
| SVRG-BB | `svrg_bb` | | |
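As the table shows, several algorithms share one solver function and are selected via `options.sub_mode`, possibly with extra option flags. A hypothetical usage sketch, assuming a `problem` struct has already been defined, with the function and option names taken from the table above:

```matlab
% Select SVRG-LBFGS from the shared slbfgs solver (per the table above).
options.sub_mode = 'SVRG-LBFGS';
[w, info] = slbfgs(problem, options);

% Select Reg-oBFGS-Inf: infinite-memory online BFGS with regularization.
options.sub_mode = 'Inf-mem';
options.regularized = true;
[w, info] = obfgs(problem, options);
```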
Additionally, the following problems are provided for gradient descent algorithms.
```
./               - Top directory.
./README.md      - This readme file.
./run_me_first.m - The script that you need to run first.
./demo.m         - Demonstration script to check and understand this package easily.
|plotter/        - Contains plotting tools to show convergence results and various plots.
|tool/           - Some auxiliary tools for this project.
|problem/        - Problem definition files to be solved.
|sgd_solver/     - Contains various stochastic optimization algorithms.
|sgd_test/       - Some helpful test scripts to use this package.
|gd_solver/      - Contains various gradient descent optimization algorithms.
|gd_test/        - Some helpful test scripts using gradient descent algorithms to use this package.
```
Run `run_me_first` for path configurations.
```matlab
%% First run the setup script
run_me_first;
```
Just execute `demo` for the simplest demonstration of this package. This is the case of a logistic regression problem.
```matlab
%% Execute the demonstration script
demo;
```
The "demo.m" file contains the following:
```matlab
%% generate synthetic data
% set number of dimensions
d = 3;
% set number of samples
n = 300;
% generate data
data = logistic_regression_data_generator(n, d);

%% define problem definitions
problem = logistic_regression(data.x_train, data.y_train, data.x_test, data.y_test);

%% perform algorithms SGD and SVRG
options.w_init = data.w_init;
options.step_init = 0.01;
[w_sgd, info_sgd] = sgd(problem, options);
[w_svrg, info_svrg] = svrg(problem, options);

%% display cost/optimality gap vs number of gradient evaluations
display_graph('grad_calc_count','cost', {'SGD', 'SVRG'}, {w_sgd, w_svrg}, {info_sgd, info_svrg});
```
Let us take a closer look at the code above, bit by bit. The procedure has only four steps!
Step 1: Generate data
First, we generate a dataset, including a train set and a test set, using the data generator function `logistic_regression_data_generator()`. The output includes the train set, the test set, and an initial value of the solution `w`.
```matlab
d = 3;
n = 300;
data = logistic_regression_data_generator(n, d);
```
Step 2: Define problem
The problem to be solved should be defined properly from the supported problems. `logistic_regression()` provides comprehensive functions for a logistic regression problem. It returns the cost value via `cost(w)`, the gradient via `grad(w)`, and the Hessian via `hess(w)` for a given `w`. These are essential for any gradient descent algorithm.
```matlab
problem = logistic_regression(data.x_train, data.y_train, data.x_test, data.y_test);
```
Step 3: Perform solvers
Now you can run the optimization solvers, i.e., SGD and SVRG, by calling the solver functions `sgd()` and `svrg()` after setting some optimization options.
```matlab
options.w_init = data.w_init;
options.step_init = 0.01;
[w_sgd, info_sgd] = sgd(problem, options);
[w_svrg, info_svrg] = svrg(problem, options);
```
They return the final solution `w` and statistics, including the histories of epoch numbers, cost values, gradient norms, the number of gradient evaluations, and so on.
Step 4: Show result
Finally, `display_graph()` plots the decreasing behavior of the cost values against the number of gradient evaluations. Note that each algorithm evaluates a different number of samples in each epoch; therefore, it is common to compare stochastic optimization algorithms by this number rather than by the number of iterations.
```matlab
display_graph('grad_calc_count','cost', {'SGD', 'SVRG'}, {w_sgd, w_svrg}, {info_sgd, info_svrg});
```
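As a rough illustration of why the gradient-evaluation count is the natural x-axis (the exact bookkeeping below is an assumption, not taken from this README): plain SGD evaluates one component gradient per inner iteration, so one pass over the data costs about n evaluations, while SVRG additionally computes a full-gradient snapshot each epoch:

```matlab
% Illustrative per-epoch gradient-evaluation counts (assumed bookkeeping).
n = 300;                                  % number of samples, as in the demo
sgd_per_epoch  = n;                       % n stochastic gradients
svrg_per_epoch = n + n;                   % full-gradient snapshot (n) + inner loop (~n)
ratio = svrg_per_epoch / sgd_per_epoch;   % SVRG costs roughly 2x per epoch here
```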
That's it!
"demo_ext.m" gives you more plots.
To calculate the "optimality gap", you first need the optimal solution `w_opt`, obtained beforehand by calling the `calc_solution()` function of the problem definition function.
```matlab
%% calculate optimal solution for optimality gap
w_opt = problem.calc_solution(1000);
options.f_opt = problem.cost(w_opt);
```
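Given `f_opt`, the optimality gap at any candidate solution `w` is simply `f(w) - f_opt`. A one-line sketch (assuming `w_sgd` from the demo above; the variable name `gap_sgd` is our own):

```matlab
% Optimality gap of the SGD solution (illustrative sketch).
gap_sgd = problem.cost(w_sgd) - options.f_opt;
```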
This case uses the full gradient descent solver `gd()` to obtain an optimal solution, with a maximum of 1000 iterations and a very tight stopping tolerance.
Then, you obtain the optimality gap result with `display_graph()`.
```matlab
display_graph('grad_calc_count','optimality_gap', {'SGD', 'SVRG'}, {w_sgd, w_svrg}, {info_sgd, info_svrg});
```
Additionally, in this logistic regression case, the classification accuracies are calculated using the corresponding `prediction()` and `accuracy()` functions of the problem definition function `logistic_regression()`. Furthermore, the classification accuracies are illustrated by the `display_classification_result()` function, which is called in "demo.m" as below:
```matlab
%% calculate classification accuracy
% for SGD
% predict
y_pred_sgd = problem.prediction(w_sgd);
% calculate accuracy
accuracy_sgd = problem.accuracy(y_pred_sgd);
fprintf('Classification accuracy: %s: %.4f\n', 'SGD', accuracy_sgd);
% convert from {1,-1} to {1,2}
y_pred_sgd(y_pred_sgd==-1) = 2;
y_pred_sgd(y_pred_sgd==1) = 1;

% for SVRG
% predict
y_pred_svrg = problem.prediction(w_svrg);
% calculate accuracy
accuracy_svrg = problem.accuracy(y_pred_svrg);
fprintf('Classification accuracy: %s: %.4f\n', 'SVRG', accuracy_svrg);
% convert from {1,-1} to {1,2}
y_pred_svrg(y_pred_svrg==-1) = 2;
y_pred_svrg(y_pred_svrg==1) = 1;

%% display classification results
% convert from {1,-1} to {1,2}
data.y_train(data.y_train==-1) = 2;
data.y_train(data.y_train==1) = 1;
data.y_test(data.y_test==-1) = 2;
data.y_test(data.y_test==1) = 1;
% display results
display_classification_result(problem, {'SGD', 'SVRG'}, {w_sgd, w_svrg}, {y_pred_sgd, y_pred_svrg}, {accuracy_sgd, accuracy_svrg}, data.x_train, data.y_train, data.x_test, data.y_test);
```
Output results:
You need to specify additional options before executing the solvers.
```matlab
%% set options for convergence animation
options.max_epoch = 100;
options.store_w = true;
```
Then, `draw_convergence_animation()` draws a convergence animation. Note that `draw_convergence_animation()` is executable only when the dimension of the parameters is 2.
```matlab
%% display convergence animation
draw_convergence_animation(problem, {'SGD', 'SVRG'}, {info_sgd.w, info_svrg.w}, options.max_epoch);
```
"test_convergence_animation_demo.m" provides an animation of the convergence behaviors of the algorithms. Please click the image below to see its animation.
If you have any problems or questions, please contact the author: Hiroyuki Kasai (email: hiroyuki dot kasai at waseda dot jp)