shogun-toolbox / shogun

Shōgun
http://shogun-toolbox.org
BSD 3-Clause "New" or "Revised" License

Write a proper (K)PCA notebook #1878

Closed: karlnapf closed this 6 years ago

karlnapf commented 10 years ago

Illustrating what PCA does, with proper eye candy.

Some ideas:

This can be combined with many different dimension reduction methods, such as factor analysis, PPCA, etc. Creating cool-looking examples is always fun!

Many examples and explanations (including the pancake idea) can be found in http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=Brml.HomePage
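To make the pancake idea concrete, here is a minimal NumPy sketch. It is not Shogun code, and the data, sizes and rotation angle are arbitrary assumptions. It builds a 3-D Gaussian cloud that is flat along one axis, rotates it, and then runs PCA by eigendecomposing the sample covariance. The two leading principal directions span the pancake, and the last one points along its thin axis.

```python
import numpy as np

# Toy "pancake": wide in two directions, very thin in the third.
# The standard deviations (5, 2, 0.1) are arbitrary illustration choices.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([5.0, 2.0, 0.1])

# Rotate about the z-axis so the wide axes are not the coordinate axes.
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
X = X @ R.T

# PCA: centre the data, form the sample covariance, eigendecompose it.
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(C)          # eigh returns ascending order
order = np.argsort(eigvals)[::-1]             # re-sort to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the two leading principal directions: a 2-D view of the pancake.
Z = Xc @ eigvecs[:, :2]

print("fraction of variance per component:", eigvals / eigvals.sum())
```

Scattering the two columns of Z against each other (for example with matplotlib) gives the flattened view a notebook plot would show.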

Or here http://www.gatsby.ucl.ac.uk/teaching/courses/ml1-2013/lect2-handout.pdf

kislayabhi commented 10 years ago

I think I'll take this. If there are any more ideas or nuances you want to add related to the above, please tell me. :)

karlnapf commented 10 years ago

There is so much literature on this topic, in particular on PCA, so this notebook should be nice and clear and full of cool plots :)

karlnapf commented 10 years ago

@kislayabhi Looks like your PCA repository is well suited to be used as a resource :)

karlnapf commented 10 years ago

See also #1915 and all its additions to the PCA class. Those should be documented with examples that discuss the pros and cons of each method.

kislayabhi commented 10 years ago

Ahh! I am stuck using PCA for eigenfaces. The method preprocessor.init(datamatrix) couldn't handle even 100x100-pixel images.

kislayabhi commented 10 years ago

It gets stuck for quite a while and then the system hangs.

kislayabhi commented 10 years ago

But when I resize the images to 50x50 pixels, it gives me the output after a while.

mazumdarparijat commented 10 years ago

@kislayabhi OK! It's taking a huge amount of memory, I presume.
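A rough size estimate backs up the memory guess. The sketch below assumes dense double-precision (8-byte) storage of the D x D covariance matrix for flattened D-pixel images. It counts only the covariance matrix itself. The eigenvector matrix and the decomposition's workspace typically add a comparable amount on top.

```python
# Back-of-the-envelope size of a dense D x D covariance matrix for
# flattened grey-scale images, assuming 8-byte (double precision) entries.
def covariance_bytes(height, width, bytes_per_entry=8):
    d = height * width              # one dimension per pixel
    return d * d * bytes_per_entry

for h, w in [(100, 100), (50, 50)]:
    mb = covariance_bytes(h, w) / 1e6
    print(f"{h}x{w} images: D = {h * w}, covariance matrix ~ {mb:.0f} MB")
# prints:
# 100x100 images: D = 10000, covariance matrix ~ 800 MB
# 50x50 images: D = 2500, covariance matrix ~ 50 MB
```

Halving the image side divides D by 4 and the covariance size by 16. That is consistent with the 50x50 case finishing while the 100x100 case hangs.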

kislayabhi commented 10 years ago

Yup! That's for sure!!

mazumdarparijat commented 10 years ago

@kislayabhi We recently added SVD-based PCA as well. Can you try it with that, please?

kislayabhi commented 10 years ago

@mazumdarparijat Was this added in #1915? I'm afraid I haven't updated my local repo!

mazumdarparijat commented 10 years ago

Yes! Would it be possible for you to share a link to your data? I'm eager to try it myself to see what's happening.

kislayabhi commented 10 years ago

I am using the face image data that comes with Shogun: shogun/data/faces/oksanafaces*.pgm

mazumdarparijat commented 10 years ago

@kislayabhi Please update your local repo and then try again; it might work, because we did a major overhaul of the class in #1915. If it still doesn't work, then it's time to get our hands dirty again!! :)

karlnapf commented 10 years ago

The old PCA code decomposed the covariance matrix of the data. For 100x100 images, this is a 10000 x 10000 matrix, which is expensive to store and decompose. @mazumdarparijat's new SVD-based PCA should solve this. As he said, computing the thin SVD of the data matrix scales with the number of points rather than with the dimension (roughly O(D N^2) for N points in D dimensions, instead of O(D^3) for decomposing the covariance), so that should work.

If it doesn't, please post some simple Python code (possibly on toy data) that illustrates the problem in isolation.

Thanks!
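The covariance-versus-SVD argument in the comment above can be checked on toy data with a NumPy sketch. This is not Shogun's implementation. The sizes and the low-rank-plus-noise toy data are assumptions, chosen so the covariance route still fits comfortably in memory. Both routes recover the same top-k principal subspace and eigenvalues, but the SVD route only ever works with the N x D data matrix.

```python
import numpy as np

# Toy data with N samples in D dimensions (N << D, as with flattened images):
# a strong rank-k signal plus small noise, so the top-k subspace is well defined.
rng = np.random.default_rng(1)
N, D, k = 40, 1000, 5
X = 3.0 * rng.normal(size=(N, k)) @ rng.normal(size=(k, D))
X += 0.1 * rng.normal(size=(N, D))
Xc = X - X.mean(axis=0)                      # centre each feature

# Route 1: eigendecompose the D x D covariance matrix.
C = Xc.T @ Xc / (N - 1)                      # D x D: already a million entries
evals, evecs = np.linalg.eigh(C)             # ascending order
order = np.argsort(evals)[::-1][:k]
top_evals, top_cov = evals[order], evecs[:, order]

# Route 2: thin SVD of the centred N x D data matrix; C is never formed.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)   # Vt has shape (N, D)
svd_evals = s[:k] ** 2 / (N - 1)             # eigenvalues of C from singular values
top_svd = Vt[:k].T

# Eigenvectors are only defined up to sign, so compare the projection
# matrices onto the two k-dimensional subspaces instead.
same_subspace = np.allclose(top_cov @ top_cov.T, top_svd @ top_svd.T, atol=1e-6)
same_values = np.allclose(top_evals, svd_evals)
print(same_subspace, same_values)
```

The relation used here is that the covariance eigenvalues equal s**2 / (N - 1) and its eigenvectors are the right singular vectors of the centred data. That is why the SVD route can skip the D x D matrix entirely.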

karlnapf commented 10 years ago

I just tried the new PCA implementation on datasets with

and both run quite fast out of the box (using the default constructor p=PCA()). So that works, and it's a major improvement over what we had before :)

kislayabhi commented 10 years ago

@mazumdarparijat Yeah, thanks! Your update has made my life easy. Things are working as robustly as they should. :) :) @karlnapf Yes, everything is working fast. Thanks for making it clearer. :)

mazumdarparijat commented 10 years ago

@karlnapf @kislayabhi That's great news, then! :)

kislayabhi commented 10 years ago

@karlnapf I have sent a PR! Just one thing: I applied eigenfaces to att_dataset, which users may have to download themselves!
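For reference, the eigenfaces pipeline is PCA on flattened images. You subtract the mean face, take the leading right singular vectors as eigenfaces, then project and reconstruct. The NumPy sketch below illustrates the idea and is not the code from the PR. Random 50x50 arrays stand in for real faces such as the oksanafaces*.pgm files or the AT&T images, and image loading is left out. The helper names fit_eigenfaces and reconstruct are hypothetical.

```python
import numpy as np

def fit_eigenfaces(images, k):
    """Hypothetical helper. images: array of shape (N, H, W).
    Returns the mean face and the first k eigenfaces (as rows)."""
    n = images.shape[0]
    X = images.reshape(n, -1).astype(float)    # one flattened image per row
    mean_face = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean_face, full_matrices=False)
    return mean_face, Vt[:k]

def reconstruct(image, mean_face, eigenfaces):
    """Hypothetical helper: project one image onto face space and map it back."""
    x = image.ravel().astype(float) - mean_face
    weights = eigenfaces @ x                   # coordinates in face space
    return (mean_face + weights @ eigenfaces).reshape(image.shape)

# Stand-in data: 30 random 50x50 "images" (replace with real face images).
rng = np.random.default_rng(2)
faces = rng.random(size=(30, 50, 50))

errors = []
for k in (5, 15, 29):
    mean_face, eig = fit_eigenfaces(faces, k)
    err = np.linalg.norm(faces[0] - reconstruct(faces[0], mean_face, eig))
    errors.append(err)
    print(k, err)
```

Reconstruction error shrinks as k grows. With 30 images the centred data has rank at most 29, so 29 components reproduce each training image up to rounding error.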

vigsterkr commented 10 years ago

@karlnapf are we done with this?

karlnapf commented 10 years ago

Kernel PCA is still missing, but it is just a small addition to the existing PCA notebook. @kislayabhi, interested in adding some things? We have a graphical example that we could put in.

kislayabhi commented 10 years ago

@karlnapf Yeah, sure man. What are the details of the example to be added?

karlnapf commented 10 years ago

Yeah sure, basically we want this here: http://scikit-learn.org/stable/auto_examples/decomposition/plot_kernel_pca.html

It might also be worth reorganising the KPCA code to give it the same nice properties as your kick-ass PCA implementation, and also rephrasing it against the linalg framework!
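Kernel PCA replaces the covariance eigenproblem with an eigenproblem on the centred kernel (Gram) matrix. Below is a minimal NumPy sketch for the concentric-circles setting of the linked scikit-learn example. It is not Shogun's KernelPCA API. The names make_circles and kernel_pca and the value gamma=10.0 are hypothetical choices for illustration. With unit-norm eigenvectors v and eigenvalues lambda of the centred kernel matrix, the projections of the training points onto each component are sqrt(lambda) * v.

```python
import numpy as np

def make_circles(n_per_circle=100, radii=(0.3, 1.0), noise=0.05, seed=0):
    """Hypothetical helper: two noisy concentric circles with labels 0 and 1."""
    rng = np.random.default_rng(seed)
    angles = rng.uniform(0, 2 * np.pi, size=(2, n_per_circle))
    pts = [np.c_[r * np.cos(a), r * np.sin(a)] for r, a in zip(radii, angles)]
    X = np.vstack(pts) + noise * rng.normal(size=(2 * n_per_circle, 2))
    y = np.repeat([0, 1], n_per_circle)
    return X, y

def kernel_pca(X, n_components=2, gamma=10.0):
    """Hypothetical helper: RBF kernel PCA, returns training projections."""
    # Gaussian (RBF) kernel: k(x, x') = exp(-gamma * ||x - x'||^2)
    sq = np.sum(X ** 2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2 * X @ X.T
    K = np.exp(-gamma * sq_dists)

    # Centre the data in feature space: Kc = H K H with H = I - (1/n) 11^T.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H

    # Leading eigenpairs of the centred kernel matrix.
    evals, evecs = np.linalg.eigh(Kc)
    idx = np.argsort(evals)[::-1][:n_components]
    evals, evecs = evals[idx], evecs[:, idx]

    # Projection = Kc @ alpha with alpha = v / sqrt(lambda), i.e. v * sqrt(lambda).
    return evecs * np.sqrt(evals), Kc

X, y = make_circles()
Z, Kc = kernel_pca(X)
```

Plotting Z[:, 0] against Z[:, 1], coloured by y, gives the kind of figure the linked example shows. How cleanly the two circles separate depends on gamma.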

karlnapf commented 6 years ago

We have had a PCA notebook for a long time.