joshday / OnlineStats.jl

⚡ Single-pass algorithms for statistics
https://joshday.github.io/OnlineStats.jl/latest/
MIT License
836 stars · 64 forks

Need help understanding LinReg #7

Closed · tbreloff closed this 9 years ago

tbreloff commented 9 years ago

I expect my FLS (flexible least squares) implementation will be similar to your LinReg. Do you have a paper or documentation explaining the method you're using? I don't quite follow what the sweep method is doing, and how that fits in.

joshday commented 9 years ago

The chapter linked below looks like an excerpt from Kenneth Lange's Numerical Analysis for Statisticians. Page 99 shows what the sweep operator does to the matrix [X y]' * [X y], which is what CovarianceMatrix stores. I was reimplementing LinReg last week to be able to get Ridge and Lasso coefficients from the same object, so I apologize for any broken/unfinished code.

http://www.stat.ubc.ca/~ruben/data/Stat560_2010/KennethLange/lange10_ch07.pdf
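
To make the sweep step concrete, here is a minimal sketch on toy data. The `sweep!` helper is illustrative (following Lange's description), not the package's internal code: after sweeping [X y]' * [X y] on the predictor columns, the regression coefficients sit in the last column and the residual sum of squares in the bottom-right entry.

```julia
using LinearAlgebra

# Illustrative in-place sweep operator (Lange, ch. 7); not OnlineStats' internal code.
function sweep!(A::AbstractMatrix, k::Int)
    d = A[k, k]
    row = A[k, :] ./ d            # kth row divided by the pivot
    col = A[:, k] ./ d            # kth column divided by the pivot
    for j in axes(A, 2), i in axes(A, 1)
        if i != k && j != k
            A[i, j] -= A[i, k] * row[j]
        end
    end
    A[k, :] = row
    A[:, k] = col
    A[k, k] = -1 / d
    return A
end

# Sweeping A = [X y]' * [X y] on columns 1:p leaves betahat in A[1:p, end]
# and the residual sum of squares in A[end, end].
X = randn(100, 3)
y = X * [1.0, 2.0, 3.0] + randn(100)
A = [X y]' * [X y]
for k in 1:3
    sweep!(A, k)
end
betahat = A[1:3, end]    # ≈ X \ y
rss     = A[end, end]
```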

joshday commented 9 years ago

You'll also see I was using a centered-and-scaled version of least squares, so essentially betahat = inv(cor(x)) * cor(x, y).

I'm doing the sweep operator on cor([x y]) and storing it as a lower triangular matrix.
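
A small sketch of that relationship on made-up data (the names `Rxx`/`rxy` and the toy data are just for illustration): the standardized coefficients from the correlation matrix can be mapped back to the original scale and match ordinary least squares.

```julia
using Statistics, LinearAlgebra

# Toy data, just to illustrate the relationship (not code from the package):
x = randn(200, 3)
y = x * [0.5, -1.0, 2.0] + randn(200)

C   = cor([x y])                 # correlation matrix of predictors and response
Rxx = C[1:end-1, 1:end-1]        # cor(x)
rxy = C[1:end-1, end]            # cor(x, y)

b_std = Rxx \ rxy                                 # betahat on the standardized scale
b     = b_std .* std(y) ./ vec(std(x, dims=1))    # back to the original scale
b0    = mean(y) - dot(vec(mean(x, dims=1)), b)
# [b0; b] ≈ [ones(200) x] \ y
```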

tbreloff commented 9 years ago

Thanks, this was very helpful. It certainly seems like a good technique for classic linear regression. Do you think it has a place when we're not minimizing MSE? FLS tries to minimize: sumabs2(y_t - yhat_t) + some_scalar * sumabs2(diff(beta_t)). In other words, we're trying to minimize a linear combination of the residual error and the parameter drift. A goal would likely be to find a good value for some_scalar so that out-of-sample fit is best (an extension to this problem). A low some_scalar would lead to overfitting, and if it's too high the problem just reverts to classic least squares.
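
For concreteness, here's a hedged sketch of that objective as a plain function; the data layout and names (`betas`, `some_scalar`) are assumptions for illustration, not code from the FLS paper or the package.

```julia
using LinearAlgebra

# Sketch of the FLS objective described above. `betas` holds one coefficient
# vector per time step (β_1, …, β_T); names and data layout are made up.
function fls_objective(x::AbstractMatrix, y::AbstractVector, betas, some_scalar)
    T = length(y)
    resid = sum(abs2(y[t] - dot(x[t, :], betas[t])) for t in 1:T)
    drift = sum(sum(abs2, betas[t] - betas[t - 1]) for t in 2:T)
    return resid + some_scalar * drift
end
```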

joshday commented 9 years ago

I don't know whether other people already have an online algorithm for this or not. It seems like there, too, you'd want centered and scaled predictors, so we can rewrite the objective as

β' * cor(x) * β - 2 * β' * cor(x, y) + λ * J(β)

Thus, at any point in time it's a quadratic programming problem, and the size of the problem depends only on the number of predictors, not on how many observations have been seen.

An easy but slow way would be to just update cor([x y]) and plug it into a convex solver any time you want an estimate. The advantage is that at any point in time you can solve for any number of points along the solution path. I wanted to implement something like this where the user could specify an arbitrary penalty term J() (like the L1 or L2 norm) and the estimate would be computed with a solver from Convex.jl.
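
A rough sketch of that idea using today's Convex.jl/SCS interface (the solver API looked different in 2015; the toy data, the L1 penalty, and the λ value are all made up for illustration):

```julia
using Convex, SCS, Statistics

# Sketch of the "easy but slow" route: maintain cor([x y]) (here just computed
# from a toy batch) and hand the penalized problem to Convex.jl when an
# estimate is requested.
x = randn(200, 3)
y = x * [0.5, -1.0, 2.0] + randn(200)

C   = cor([x y])                 # online, this would be updated incrementally
Rxx = C[1:end-1, 1:end-1]
rxy = C[1:end-1, end]

λ = 0.1                          # made-up penalty weight
β = Variable(3)
J = norm(β, 1)                   # user-supplied penalty, e.g. L1 or L2 norm
problem = minimize(quadform(β, Rxx) - 2 * dot(rxy, β) + λ * J)
solve!(problem, SCS.Optimizer)
betahat_std = evaluate(β)        # coefficients on the standardized scale
```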

I have another idea for penalized methods I haven't tried out, where you hold two batches in memory simultaneously to act as test and training sets to help train your λ.
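
A hedged sketch of what that could look like; `fit_penalized` is a hypothetical helper (e.g. the Convex.jl snippet above wrapped in a function that returns a coefficient vector for a given λ).

```julia
# Sketch of the two-batch idea: fit for each λ on one in-memory batch, score on
# the other, keep the winner. `fit_penalized` is a hypothetical helper.
function choose_lambda(fit_penalized, xtrain, ytrain, xtest, ytest, λgrid)
    best_λ, best_err = first(λgrid), Inf
    for λ in λgrid
        β = fit_penalized(xtrain, ytrain, λ)
        err = sum(abs2, ytest - xtest * β)   # out-of-sample squared error
        if err < best_err
            best_λ, best_err = λ, err
        end
    end
    return best_λ
end
```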

This is getting into the area where we may want a "fast" version and "slow" version.

tbreloff commented 9 years ago

These are cool ideas that I'd definitely be interested in exploring. In the near term, I'm going to use the techniques in http://www2.econ.iastate.edu/tesfatsi/FLSTemporalDataMining.GMontana2009.pdf as the basis for online-pca and online-fls. I'm thinking the sweep method will not apply in this case, which is no big deal.

It would certainly be great to have a framework to do online updates of more generic optimization problems. I have a continuous-valued genetic algorithm implementation, for example, that I would consider redesigning to work online.
