karlnapf closed this issue 6 years ago
I think I'll take this. If there are any more ideas or nuances you want to add related to the above, please tell. :)
There is so much literature on this topic, in particular on PCA, so this notebook should be nice and clear and full of cool plots :)
@kislayabhi Looks like your pca repository is well suited to be used as a resource :)
See also #1915 and all its additions to the PCA class. Those should be documented with examples where the pros and cons of the methods are discussed
Ahh! I am stuck using PCA for eigenfaces. The method preprocessor.init(datamatrix) can't handle even 100x100 pixel images.
It stalls for quite a while and then the system hangs.
But when I resize the images to 50x50 pixels, it gives me the output after a while.
@kislayabhi ok! It's taking huge memory, I presume.
Yup! That's for sure!!
@kislayabhi We recently added SVD-based PCA as well. Can you check with that once, please?
@mazumdarparijat Was this added in issue #1915? I am afraid I didn't update my local repo!
Yes! Will it be possible for you to share the link to your data? I am eager to try it myself to see what's happening.
I am using the face image data that comes with Shogun: shogun/data/faces/oksanafaces*.pgm
@kislayabhi Please update your local repo and then try; it might work, because we did a major overhaul of the class in #1915. If it still doesn't work, then it's time to get our hands dirty again!! :)
The old PCA code decomposed the covariance matrix of the data. For 100x100 images, this is a 10000-dimensional square matrix, which causes problems when decomposing. @mazumdarparijat 's new SVD based PCA should solve this. As he said, computing the SVD of the data matrix should be cubic in the number of points, so that should work.
If it doesn't, please post some simple Python code (possibly on toy data) that illustrates the problem in an isolated way.
Thanks!
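To make the discussion above concrete: here is a minimal numpy sketch (not the Shogun API) of why decomposing the covariance matrix fails for 100x100 images while the SVD of the data matrix works. All variable names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 10000          # few points, high-dimensional (e.g. 100x100 images)
X = rng.standard_normal((n, d))
Xc = X - X.mean(axis=0)   # center the data

# Old approach: eigendecompose the d x d covariance matrix.
# For d = 10000 that matrix alone is ~800 MB in doubles, and its
# eigendecomposition is O(d^3) -- this is what hangs the system.

# SVD approach: decompose the n x d data matrix directly.
# The cost scales with the smaller dimension (here n = 50), so it is cheap.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 5
components = Vt[:k]              # top-k principal directions (orthonormal)
projected = Xc @ components.T    # data in the k-dimensional PCA subspace

# The singular values relate to the covariance eigenvalues via S**2 / (n - 1)
print(projected.shape)  # (50, 5)
```

The right singular vectors of the centered data matrix are exactly the eigenvectors of the covariance matrix, so both routes give the same subspace; only the cost differs.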
I just tried the new PCA implementation on two datasets, and both work quite fast plain vanilla (using the std constructor p=PCA()). So that works - and is a major improvement to what we had before :)
@mazumdarparijat yeah, thanks! Your update has made my life easy. Things are working as robustly as they should. :) :) @karlnapf: yes, everything is working fast. Thanks for making it clearer. :)
@karlnapf @kislayabhi That's great news then! :)
@karlnapf I have sent a PR! Just one thing: I have applied eigenfaces on the att_dataset, which the user may have to download on their side!
@karlnapf are we done with this?
Kernel PCA is still missing, but it is just a small addition to the existing PCA notebook. @kislayabhi, interested in adding some things? We have a graphical example that we could put in.
@karlnapf yeah sure man. Details about the example to be added?
Yeah sure, basically we want this here: http://scikit-learn.org/stable/auto_examples/decomposition/plot_kernel_pca.html
It might also be worth re-organising the KPCA code to give it the same nice properties as your kick-ass PCA implementation, and to phrase it in terms of the linalg framework!
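For reference, here is a minimal numpy sketch of what kernel PCA computes, in the spirit of the linked scikit-learn concentric-circles example. This is not Shogun's KPCA API; the function name and parameters are illustrative only.

```python
import numpy as np

def rbf_kernel_pca(X, gamma=2.0, k=2):
    """Illustrative sketch: project X onto the top-k RBF kernel PCA components."""
    n = X.shape[0]
    # Pairwise squared distances and RBF kernel matrix
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    K = np.exp(-gamma * D2)
    # Center the kernel matrix in feature space
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one
    # Eigendecompose; eigh returns eigenvalues in ascending order, so reverse
    vals, vecs = np.linalg.eigh(Kc)
    vals, vecs = vals[::-1], vecs[:, ::-1]
    # Scale eigenvectors so the projected coordinates are properly normalised
    alphas = vecs[:, :k] / np.sqrt(vals[:k])
    return Kc @ alphas  # projections of the training points

# Two concentric circles: linearly inseparable in input space,
# which is exactly the setting of the scikit-learn plot.
rng = np.random.default_rng(1)
t = rng.uniform(0, 2 * np.pi, 200)
r = np.repeat([1.0, 3.0], 100)
X = np.c_[r * np.cos(t), r * np.sin(t)] + 0.05 * rng.standard_normal((200, 2))
Z = rbf_kernel_pca(X, gamma=2.0, k=2)
print(Z.shape)  # (200, 2)
```

A notebook version would scatter-plot X and Z side by side, which reproduces the "eye candy" effect of the scikit-learn example: the two rings typically become separable in the kernel PCA coordinates.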
We have long wanted a PCA notebook illustrating what PCA does, with proper eye candy.
Some ideas:
This can be combined with many different dimension reduction methods such as factor analysis, PPCA, etc. Creating cool-looking examples is always fun!
Many examples and explanations (including the pancake idea) can be found in http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=Brml.HomePage
Or here http://www.gatsby.ucl.ac.uk/teaching/courses/ml1-2013/lect2-handout.pdf