Open kastnerkyle opened 10 years ago
Mean centering:
http://www.cs.toronto.edu/~dross/ivt/LimRossLinYang_nips04.pdf
http://www.cs.toronto.edu/~dross/ivt/Ross_Tracking_Jan19_2004.pdf
Code: http://www.cs.toronto.edu/~dross/ivt/
TODO:
Make a benchmark showing the differences between PCA and Incremental PCA w.r.t. n_features reduction and batch_size
Note: this is different from NIPALS
gensim has an implementation to look at
Abbreviated version: Take a minibatch of the data (maybe 2x or 3x the number of features or something... TBD) and SVD that. Get a new minibatch, vstack the previous decomposition with the new minibatch, and SVD that. Repeat ad infinitum.
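A rough sketch of that loop, ignoring mean centering for now. The function name `incremental_svd` and its arguments are made up for illustration, and "the decomposition" is assumed to mean the components scaled by their singular values (`S[:, None] * Vt`). If nothing is truncated, this reproduces the singular values of the full stacked data; with truncation it is only an approximation.

```python
import numpy as np


def incremental_svd(batches, n_components):
    """Hypothetical helper: update a truncated SVD one minibatch at a time."""
    S, Vt = None, None
    for X in batches:
        if Vt is None:
            # First minibatch: nothing to stack yet.
            stacked = X
        else:
            # Stack the previous decomposition (scaled components) on the new batch.
            stacked = np.vstack([S[:, None] * Vt, X])
        _, S, Vt = np.linalg.svd(stacked, full_matrices=False)
        # Keep only the leading components before the next update.
        S, Vt = S[:n_components], Vt[:n_components]
    return S, Vt
```

Something like `incremental_svd((X[i:i + 50] for i in range(0, len(X), 50)), n_components=10)` would be the intended usage; the batch size of 50 is arbitrary.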
Issues: How to handle mean centering when you don't know the global mean? Is the minibatch mean sufficient, or do we need some kind of online mean estimation? Will an online mean estimate be "good enough"?
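One possible answer, as I read the Ross et al. papers linked above: keep a running mean, center each minibatch on its own mean, and append one extra correction row that accounts for the shift between the old mean and the minibatch mean. A sketch under those assumptions follows; `update_centered` and its signature are hypothetical, not an existing API. Without truncation this matches the SVD of the globally centered data; with truncation it is approximate.

```python
import numpy as np


def update_centered(mean, n_seen, S, Vt, X, n_components):
    """Hypothetical helper: fold one minibatch into a mean-centered truncated SVD."""
    m = X.shape[0]
    batch_mean = X.mean(axis=0)
    if n_seen == 0:
        # First minibatch: its own mean is the best estimate available.
        new_mean = batch_mean
        stacked = X - batch_mean
    else:
        total = n_seen + m
        # Online mean update, weighted by sample counts.
        new_mean = (n_seen * mean + m * batch_mean) / total
        # Extra row that accounts for the shift between old and minibatch means.
        correction = np.sqrt(n_seen * m / total) * (mean - batch_mean)
        stacked = np.vstack([S[:, None] * Vt, X - batch_mean, correction])
    _, S_new, Vt_new = np.linalg.svd(stacked, full_matrices=False)
    return new_mean, n_seen + m, S_new[:n_components], Vt_new[:n_components]
```

Start with `mean, n_seen, S, Vt = None, 0, None, None` and call `update_centered` once per minibatch, feeding the returned state back in.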