
Misleading kernel PCA example #10530

Open dkirkby opened 6 years ago

dkirkby commented 6 years ago

The only text accompanying this kernel PCA example is:

This example shows that Kernel PCA is able to find a projection of the data that makes data linearly separable.

However, this example is very finely tuned and probably gives most readers a misleading impression. To demonstrate this, the following function reproduces the example's projection into the 2D latent space (lower-left plot):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

def kpca_demo(factor=.3, gamma=10, seed=0):
    np.random.seed(seed)
    X, y = make_circles(n_samples=400, factor=factor, noise=.05)
    kpca = KernelPCA(kernel="rbf", gamma=gamma)
    X_kpca = kpca.fit_transform(X)
    # Plot the projection onto the first two kernel principal components.
    reds = y == 0
    blues = y == 1
    plt.scatter(X_kpca[reds, 0], X_kpca[reds, 1], c="red",
                s=20, edgecolor='k')
    plt.scatter(X_kpca[blues, 0], X_kpca[blues, 1], c="blue",
                s=20, edgecolor='k')

With the default args, the example is reproduced exactly and the separation is very clear:

kpca_demo()

[figure: projection with the default arguments]

However, small changes to any of these args reveal that this nice result is not at all typical; you are more likely to find a latent space with a nonlinearity similar to that of the original data:

kpca_demo(seed=2) # was seed=0

[figure: projection with seed=2]

kpca_demo(factor=.31) # was factor=.3

[figure: projection with factor=.31]

kpca_demo(gamma=10.2) # was gamma=10

[figure: projection with gamma=10.2]

To improve the pedagogical value of this example, I suggest either finding a more robust demonstration of kernel PCA or else commenting that some fine tuning is generally required to achieve linear separation.
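As a rough sketch of what such a note could point to (not part of the example itself, and the linear-classifier training accuracy is only a crude proxy for "linearly separable"), one could quantify how often the projection is separable as gamma and the seed vary:

import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA
from sklearn.svm import LinearSVC

def separability(factor=.3, gamma=10, seed=0):
    # Same pipeline as kpca_demo above, but score a linear classifier
    # in the 2D latent space instead of plotting it.
    np.random.seed(seed)
    X, y = make_circles(n_samples=400, factor=factor, noise=.05)
    X_kpca = KernelPCA(kernel="rbf", gamma=gamma).fit_transform(X)
    clf = LinearSVC(max_iter=10000).fit(X_kpca[:, :2], y)
    return clf.score(X_kpca[:, :2], y)

for gamma in (5, 10, 10.2, 20):
    scores = [separability(gamma=gamma, seed=s) for s in range(5)]
    print(f"gamma={gamma}: mean separability proxy {np.mean(scores):.2f}")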

jnothman commented 6 years ago

In the first instance, a PR with that comment, and perhaps an illustration of the brittleness to the parameters, would be welcome.
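
Something like the following could be a starting point for that illustration (just a sketch reusing the snippet above, not a final design): plot the 2D projection for a few seeds side by side with the example's factor and gamma held fixed.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, seed in zip(axes, range(4)):
    np.random.seed(seed)
    X, y = make_circles(n_samples=400, factor=.3, noise=.05)
    X_kpca = KernelPCA(kernel="rbf", gamma=10).fit_transform(X)
    ax.scatter(X_kpca[y == 0, 0], X_kpca[y == 0, 1], c="red",
               s=20, edgecolor='k')
    ax.scatter(X_kpca[y == 1, 0], X_kpca[y == 1, 1], c="blue",
               s=20, edgecolor='k')
    ax.set_title(f"seed={seed}")
plt.show()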

dkirkby commented 6 years ago

I can submit a PR to update the existing example, documenting its inherent sensitivity to small changes in the data and hyperparameters. A more robust example would be even better, though, if anyone has ideas.

lesteve commented 6 years ago

Quickly playing with the example and tweaking the parameters, it looks like a smaller factor (0.25 for example) makes the example more robust.
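
One quick way to check this across seeds (a sketch along the lines of the snippets above; again the linear-classifier accuracy is only a crude proxy for separability):

import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA
from sklearn.svm import LinearSVC

for factor in (0.3, 0.25):
    scores = []
    for seed in range(10):
        np.random.seed(seed)
        X, y = make_circles(n_samples=400, factor=factor, noise=.05)
        X_kpca = KernelPCA(kernel="rbf", gamma=10).fit_transform(X)
        scores.append(LinearSVC(max_iter=10000).fit(X_kpca[:, :2], y)
                      .score(X_kpca[:, :2], y))
    print(f"factor={factor}: mean separability proxy {np.mean(scores):.2f}")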