lisitsyn / tapkee

A flexible and efficient С++ template library for dimension reduction
http://tapkee.lisitsyn.me
BSD 3-Clause "New" or "Revised" License
231 stars 58 forks source link

corrected diffusion maps implementation #39

Closed jgarcke closed 8 years ago

jgarcke commented 8 years ago

The existing dm implementation is wrong in several places. This fixes the implementation following the original paper by Coifman and Lafon. Currently only alpha=1 (in the notation from that paper) is supported in the corrected code.

lisitsyn commented 8 years ago

Thanks for the fixes. I'd need some explanation what exactly was wrong though.

lisitsyn commented 8 years ago

Ok let me glance over the paper once again and then let's merge it.

jgarcke commented 8 years ago

I'll attach the relevant page from Lafon's Dissertation, there the algorithm is given more explicit than in the diffusion maps article, although missing the alpha (or using alpha=1, which is the preferred choice anyway) dm_alg.pdf

The alpha could come in step 3, where currently it is (incorrectly) timesteps, by adding another parameter to the interface.

What was wrong (besides timesteps/alpha) is using the first d eigenvectors, instead of the eigenvectors 2 to d+1, which are weighted by the first eigenvector, as seen in the attached pdf. And using the squared matrix in the eigenvalue computation.

Furthermore, the normalization by the lambda_i^t gives the actual diffusion coordinates, as presented in the Coifman/Lafon paper. Note that there are some slight errors in that paper, the continuous formulations/derivations are correct, but on the way to the discrete setup there are some inconsistencies between the given discrete computation and the continuous formulation.

lisitsyn commented 8 years ago

According to the paper you're quite right.

Here is how the code that my code was ported from (Matlab Toolbox for Dimensionality Reduction by Laurens van der Maaten) looks like:

    p = sum(K, 1)';
    K = K ./ ((p * p') .^ alpha);
    p = sqrt(sum(K, 1))';
    K = K ./ (p * p');

I'm curious if there is some idea behind that but a bit lazy to do the math :)

lisitsyn commented 8 years ago

It seems that it should be merged, thanks a lot @jgarcke

lisitsyn commented 8 years ago

Here is how swissroll looks like now:

pr39

jgarcke commented 8 years ago

Thanks.

As for K = K ./ ((p * p') .^ alpha);, that is correct, alpha is an additional parameter for Diffusion Maps, in addition to timesteps (which was there before in the code, but was confused with alpha).

One could add alpha as an additional parameter (with the default=1), but then it should also be added in the other places / examples / scripts, which I didn't want to tackle at this time.