Open petro-rudenko opened 9 years ago
Thanks for asking. When incorporating L2 regularization into AdaGrad, there is no easy way to handle sparse features: the per-sample complexity becomes proportional to the feature dimension rather than to the number of non-zero entries. If you want a regularization term, you can encode it into the Gradient function, or stop the algorithm after a few passes over the dataset. For AdaGrad SGD, early stopping is roughly equivalent to L2 regularization.
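To make the sparsity issue concrete, here is a minimal NumPy sketch (not this library's API; `adagrad_sgd_l2` and its parameters are hypothetical) of AdaGrad SGD with the L2 term folded into the per-sample gradient, as suggested above:

```python
import numpy as np

def adagrad_sgd_l2(grad_fn, w0, samples, lr=0.5, lam=0.01, eps=1e-8, epochs=50):
    """AdaGrad SGD with an L2 term encoded into the gradient (sketch)."""
    w = w0.astype(float).copy()
    g_sq = np.zeros_like(w)  # per-coordinate sum of squared gradients
    for _ in range(epochs):
        for x, y in samples:
            # Even when grad_fn(w, x, y) is sparse, lam * w is dense:
            # every coordinate gets updated, so the per-sample cost is
            # O(dim) rather than O(nnz) -- this is the problem described above.
            g = grad_fn(w, x, y) + lam * w
            g_sq += g * g
            w -= lr * g / (np.sqrt(g_sq) + eps)
    return w

# Usage: least-squares gradient for y ≈ w·x on two toy samples
samples = [(np.array([1.0, 0.0]), 2.0), (np.array([0.0, 1.0]), -1.0)]
w = adagrad_sgd_l2(lambda w, x, y: (w @ x - y) * x, np.zeros(2), samples)
```

Without the `lam * w` term, each update would touch only the non-zero coordinates of `x`, which is why dropping the regularizer (and relying on early stopping instead) keeps AdaGrad fast on sparse data.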
On the Examples page, there is a traditional SGD implementation that supports regParam. However, traditional SGD is not as fast as AdaGrad in practice.
We plan to implement more stochastic algorithms in the future, such as SVRG, which is fast and handles regularization more easily.
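For intuition on why SVRG handles regularization more easily, here is a hedged sketch (the `svrg_l2` helper and its parameters are hypothetical, not this library's API): the dense L2 term can be absorbed into the full-gradient pass, which is O(n · dim) anyway.

```python
import numpy as np

def svrg_l2(grad_i, w0, n, lr=0.3, lam=0.01, epochs=20, seed=0):
    """SVRG with L2 regularization (sketch). grad_i(w, i) returns the
    gradient of the i-th sample's loss at w."""
    rng = np.random.default_rng(seed)
    w = w0.astype(float).copy()
    for _ in range(epochs):
        w_snap = w.copy()
        # Full-gradient pass at the snapshot: already O(n * dim), so
        # folding in the dense L2 term lam * w_snap adds no asymptotic cost.
        mu = sum(grad_i(w_snap, i) for i in range(n)) / n + lam * w_snap
        for _ in range(n):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient of the regularized loss.
            g = (grad_i(w, i) + lam * w) - (grad_i(w_snap, i) + lam * w_snap) + mu
            w -= lr * g
    return w

# Usage: ridge-regularized least squares on two toy samples
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([2.0, -1.0])
w = svrg_l2(lambda w, i: (w @ X[i] - y[i]) * X[i], np.zeros(2), n=2)
```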
Hi, thanks for the library. In initial benchmarks on my ML pipelines it seems to be faster than LBFGS, but the accuracy for logistic regression is worse. It would be cool to handle the reg parameter in StochasticGradientDescent the same way Spark's minibatch SGD does.