joelgrus / data-science-from-scratch

code for Data Science From Scratch book
MIT License

PCA example #61

Open · paulaceccon opened this issue 5 years ago

paulaceccon commented 5 years ago

Why does the PCA example return components with the opposite sign from those returned by sklearn's PCA? Also, when I standardize the data and then run the code, the two principal components it returns are identical, which doesn't make sense. A notebook with examples is attached: PrincipalComponentAnalysis.ipynb.zip
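On the sign question: an eigenvector is only defined up to sign, so if w is a principal component then so is -w, and both capture exactly the same variance. scikit-learn picks one sign by its own convention, so a flipped sign on its own is not an error. The minimal pure-Python check below uses illustrative helper names, and r = 0.8 is an arbitrary example value, not taken from the notebook. It also shows why both outputs consist only of ±0.7071 entries. The covariance of standardized 2-D data is proportional to a matrix of the form [[1, r], [r, 1]], and [1, 1]/√2 and [1, -1]/√2 are always eigenvectors of such a matrix.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Return the matrix-vector product m @ v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def is_eigenpair(m: Matrix, v: Vector, lam: float, tol: float = 1e-9) -> bool:
    """True if m @ v equals lam * v element-wise, within an absolute tolerance."""
    return all(math.isclose(a, lam * b, abs_tol=tol)
               for a, b in zip(matvec(m, v), v))


# For standardized 2-D data the covariance is proportional to [[1, r], [r, 1]].
# r = 0.8 is an illustrative value only.
r = 0.8
corr = [[1.0, r], [r, 1.0]]

s = 1 / math.sqrt(2)
pc1 = [s, s]   # eigenvector with eigenvalue 1 + r
pc2 = [s, -s]  # eigenvector with eigenvalue 1 - r

for v, lam in [(pc1, 1 + r), (pc2, 1 - r)]:
    flipped = [-x for x in v]
    # v and -v both satisfy the eigen-equation with the same eigenvalue.
    print(v, is_eigenpair(corr, v, lam), is_eigenpair(corr, flipped, lam))
```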

paulaceccon commented 5 years ago

With the updated code, I still get the same incorrect result.

Using the book's code after scaling the data:

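from sklearn.preprocessing import StandardScaler

# pca_data: the notebook's 2-column dataset; pca(): the book's gradient-ascent PCA
scaler = StandardScaler()
data_scaled_test = scaler.fit_transform(pca_data)  # zero mean, unit variance per column
fpc = pca(data_scaled_test, 2)  # first two principal components
fpc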

[[0.7071067811865476, 0.7071067811865476], [0.7071067811865475, 0.7071067811865475]]
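The identical components can be reproduced with a simplified gradient-ascent PCA in the style of the book. This is a hypothetical re-implementation, not the repository's code. It assumes that the book's first_principal_component starts its search from an all-ones guess. For positively correlated standardized 2-D data, the first component is exactly [1, 1]/√2. Once its projection is removed, the remaining data is orthogonal to the all-ones starting guess, so the gradient there is essentially zero and the search never moves. The second component therefore comes back equal to the first. Starting from a random guess, one possible workaround, lets the second search find the orthogonal direction.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def dot(v: Vector, w: Vector) -> float:
    return sum(a * b for a, b in zip(v, w))


def direction(w: Vector) -> Vector:
    """Rescale w to unit length."""
    mag = math.sqrt(dot(w, w))
    return [x / mag for x in w]


def variance_gradient(data: List[Vector], w: Vector) -> Vector:
    """Gradient of the variance of the data projected onto direction(w)."""
    w_hat = direction(w)
    return [sum(2 * dot(v, w_hat) * v[i] for v in data) for i in range(len(w))]


def first_component(data: List[Vector], guess: Vector,
                    n: int = 100, step_size: float = 0.1) -> Vector:
    """Plain gradient ascent on directional variance from the given guess."""
    for _ in range(n):
        grad = variance_gradient(data, guess)
        guess = [g + step_size * d for g, d in zip(guess, grad)]
    return direction(guess)


def remove_projection(data: List[Vector], w: Vector) -> List[Vector]:
    """Subtract from every row its projection onto the unit vector w."""
    return [[x - dot(v, w) * w_i for x, w_i in zip(v, w)] for v in data]


def pca(data: List[Vector], num_components: int,
        make_guess: Callable[[int], Vector]) -> List[Vector]:
    """Extract components one at a time, deflating the data after each."""
    components = []
    for _ in range(num_components):
        component = first_component(data, make_guess(len(data[0])))
        components.append(component)
        data = remove_projection(data, component)
    return components


# Toy zero-mean data with equal column variances, like StandardScaler output.
# Illustrative values, not the notebook's dataset.
data = [[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]]

# All-ones start (assumed to match the book): both components end up parallel.
ones = pca(data, 2, lambda dim: [1.0] * dim)

# Seeded random start: the second component becomes (nearly) orthogonal.
rng = random.Random(0)
rand = pca(data, 2, lambda dim: [rng.uniform(-1, 1) for _ in range(dim)])

print(ones)
print(rand)
```

With only 100 plain gradient steps the results approximate the eigenvectors, so small deviations from ±0.7071 in the random-start run are expected.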

Using the sklearn API:

from sklearn.decomposition import PCA

sk_pca = PCA(n_components=2)  # a distinct name avoids shadowing the book's pca()
sk_pca.fit(data_scaled_test)
sk_pca.components_

array([[-0.70710678, -0.70710678], [ 0.70710678, -0.70710678]])
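Since the sign of each component is arbitrary, a fair comparison with sklearn should ignore it. The helper same_up_to_sign below is hypothetical and not part of either codebase. When it is applied to the two outputs quoted above, the first components agree up to sign. The second components genuinely differ: the book's code repeated the first direction instead of returning the orthogonal one.

```python
from typing import List

Vector = List[float]


def same_up_to_sign(u: Vector, v: Vector, tol: float = 1e-6) -> bool:
    """True if u equals v or -v element-wise, within tol."""
    plus = all(abs(a - b) <= tol for a, b in zip(u, v))
    minus = all(abs(a + b) <= tol for a, b in zip(u, v))
    return plus or minus


# Components copied from the two outputs quoted in this issue.
book = [[0.7071067811865476, 0.7071067811865476],
        [0.7071067811865475, 0.7071067811865475]]
sklearn_components = [[-0.70710678, -0.70710678],
                      [0.70710678, -0.70710678]]

for i, (b, s) in enumerate(zip(book, sklearn_components), start=1):
    print(f"component {i}: same up to sign? {same_up_to_sign(b, s)}")
# component 1: same up to sign? True
# component 2: same up to sign? False
```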

Notebook attached: PCA_tests.ipynb.zip