rasbt / python-machine-learning-book-3rd-edition

The "Python Machine Learning (3rd edition)" book code repository
https://www.amazon.com/Python-Machine-Learning-scikit-learn-TensorFlow/dp/1789955750/
MIT License

Question regarding kernel PCA in Section 5.3 of Python Machine Learning, 3rd ed. #172

Open dw-hahn opened 1 year ago

dw-hahn commented 1 year ago

Dear Dr. Raschka,

I am a materials scientist who started learning Python a few months ago. Now I am studying machine learning thanks to your excellent textbook!

I need some help understanding Section 5.3 of Python Machine Learning, 3rd ed.

When we project a new data point onto a principal component axis in kernel PCA, the dot product of the kernel vector k (lowercase k, between the new point and all training samples) and the scaled eigenvector α/λ seems to take the place of the projection-matrix multiplication in standard PCA. I see that it gives the correct result, but it is unclear to me why we have to normalize α by its eigenvalue λ.
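For concreteness, this is the projection step as I understand it, paraphrased from the chapter's `project_x` function (the RBF kernel and variable names are as I remember them from the book, so please correct me if I misquote):

```python
import numpy as np

def project_x(x_new, X, gamma, alphas, lambdas):
    # squared Euclidean distances between x_new and every training sample
    pair_dist = np.array([np.sum((x_new - row) ** 2) for row in X])
    # RBF kernel vector k between x_new and all training samples
    k = np.exp(-gamma * pair_dist)
    # the step my question is about: dot product with alpha / lambda
    return k.dot(alphas / lambdas)
```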

In Section 5.3 of Python Machine Learning, 3rd ed., you explain (paraphrasing):

Since there is no projection matrix, and all samples used to construct the kernel matrix are already projected onto the principal component axes in kernel PCA, we have to compute φ(x')ᵀv to project a new sample x'.

In the book, you used X[25] as the "new" data point, whose projection is alphas[25]. Since X[25] belongs to the original dataset, its kernel vector k is simply the corresponding row of the kernel matrix K, so k·α equals the matching entry of Kα = λα. Thus, if we normalize k·α by λ, we get back α[25].

So the result is identical to alphas[25]. But does the same procedure also work for a data point that does not belong to the original dataset?
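Here is my attempt to verify that reasoning numerically, a minimal sketch assuming the half-moons data and gamma=15 as I recall them from the chapter (the centering step follows the chapter's `rbf_kernel_pca`):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.linalg import eigh
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=100, random_state=123)
gamma = 15.0

# RBF kernel matrix, centered as in the chapter
sq_dists = pdist(X, 'sqeuclidean')
K = np.exp(-gamma * squareform(sq_dists))
N = K.shape[0]
one_n = np.ones((N, N)) / N
K = K - one_n.dot(K) - K.dot(one_n) + one_n.dot(K).dot(one_n)

# leading eigenpair of the centered kernel matrix
eigvals, eigvecs = eigh(K)
alpha, lam = eigvecs[:, -1], eigvals[-1]

# for an in-sample point, the kernel vector is just row 25 of K,
# so k.dot(alpha) = (K @ alpha)[25] = lam * alpha[25]
k = K[25]
print(k.dot(alpha) / lam)  # should equal alpha[25]
print(alpha[25])
```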

In summary,

(1) I have trouble understanding why we have to normalize the eigenvectors by their eigenvalues when we project a new data point onto a principal component axis in kernel PCA.

(2) I am not sure whether we can project new data that is not included in the original dataset in the same way; the sketch below is the experiment I would run to check this.
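For (2), I would compare the projection of a training point with that of a nearby point that is not in the training set, reusing `X`, `gamma`, `alpha`, and `lam` from the snippet above and `project_x` from my first snippet (the perturbed point is just a made-up test case, not from the book):

```python
x_known = X[25]
x_unseen = x_known + 0.01  # small made-up perturbation, not in the dataset

proj_known = project_x(x_known, X, gamma, alphas=alpha, lambdas=lam)
proj_unseen = project_x(x_unseen, X, gamma, alphas=alpha, lambdas=lam)

print(proj_known)   # should reproduce alphas[25], as in the book's demonstration
print(proj_unseen)  # if (2) holds, this should be close to proj_known
```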


Best regards, Dongwoo Hahn