kastnerkyle / todo

My todo list on various things - no code

Implement incremental PCA #7

Open kastnerkyle opened 10 years ago

kastnerkyle commented 10 years ago

Note: this is different from NIPALS.

gensim has an implementation to look at

Abbreviated version:

1. Take a minibatch of the data (maybe 2x or 3x the number of features or something... TBD)
2. SVD that
3. Get a new minibatch
4. vstack the decomposition with the minibatch
5. SVD that
6. Repeat ad infinitum
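A minimal sketch of the abbreviated version above, assuming the data is already centered (mean handling is a separate issue). The name `incremental_svd` and its arguments are made up for illustration; the "decomposition" that gets stacked is taken to be the singular values times the right singular vectors, which is my assumption about what to carry forward.

```python
import numpy as np


def incremental_svd(batches, n_components):
    """Hypothetical helper: fold minibatches into a truncated SVD.

    Assumes each batch is (n_samples, n_features) and already centered.
    """
    S, Vt = None, None
    for X in batches:
        if S is None:
            # First minibatch: nothing to stack yet
            stacked = X
        else:
            # S[:, None] * Vt is a small matrix whose Gram matrix
            # approximates that of all the rows seen so far
            stacked = np.vstack((S[:, None] * Vt, X))
        _, S, Vt = np.linalg.svd(stacked, full_matrices=False)
        # Keep only the leading components before the next minibatch
        S, Vt = S[:n_components], Vt[:n_components]
    return S, Vt
```

If `n_components` equals the number of features, nothing is truncated and the singular values should match a single SVD of the full data; with fewer components the result is only an approximation.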

Issues: How should mean centering be handled when the global mean is unknown? Is the minibatch mean sufficient, or is some kind of online mean estimation needed? Will an online mean estimate be "good enough"?
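For the online mean option, a running per-feature mean can be updated one minibatch at a time. This is only a sketch; `update_mean` is a hypothetical name, and it does not answer whether the estimate is good enough for centering early batches.

```python
import numpy as np


def update_mean(mean, n_seen, X):
    """Hypothetical helper: merge a minibatch into a running column mean.

    mean   : (n_features,) mean of the n_seen rows seen so far
    n_seen : number of rows already folded in (0 at the start)
    X      : (n_batch, n_features) new minibatch
    """
    n_batch = X.shape[0]
    n_total = n_seen + n_batch
    # Shift the old mean toward the batch by the batch's share of the rows
    new_mean = mean + (X.sum(axis=0) - n_batch * mean) / n_total
    return new_mean, n_total
```

Usage would be to start from `mean = np.zeros(n_features)` and `n_seen = 0`, then call it once per minibatch. After all batches the running mean equals the global mean, but the mean used for early batches differs from the final one.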

kastnerkyle commented 10 years ago

Mean centering:

http://www.cs.toronto.edu/~dross/ivt/LimRossLinYang_nips04.pdf

http://www.cs.toronto.edu/~dross/ivt/Ross_Tracking_Jan19_2004.pdf

Code: http://www.cs.toronto.edu/~dross/ivt/
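As I read the Ross et al. material linked above, the mean can be handled by centering each minibatch with its own mean and stacking one extra correction row that accounts for the shift between the old mean and the batch mean. A rough sketch under that reading; `centered_update` and its argument names are hypothetical and not taken from their code.

```python
import numpy as np


def centered_update(S, Vt, mean, n_seen, X, n_components):
    """Hypothetical helper: one mean-corrected incremental SVD step.

    S, Vt, mean describe the n_seen rows seen so far (ignored when n_seen == 0).
    X is the new (n_batch, n_features) minibatch, not centered.
    """
    n_batch = X.shape[0]
    n_total = n_seen + n_batch
    batch_mean = X.mean(axis=0)
    Xc = X - batch_mean  # center with the minibatch mean only
    if n_seen == 0:
        stacked = Xc
        new_mean = batch_mean
    else:
        new_mean = (n_seen * mean + n_batch * batch_mean) / n_total
        # Extra row accounting for the difference between the two means
        correction = np.sqrt(n_seen * n_batch / n_total) * (mean - batch_mean)
        stacked = np.vstack((S[:, None] * Vt, Xc, correction))
    _, S, Vt = np.linalg.svd(stacked, full_matrices=False)
    return S[:n_components], Vt[:n_components], new_mean, n_total
```

Without truncation (`n_components` equal to the number of features) this should reproduce the SVD of the globally centered data, which would be a useful sanity check before benchmarking.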

TODO:

Make a benchmark showing the differences between PCA and Incremental PCA w.r.t. n_features reduction and batch_size