rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0
4.23k stars 532 forks source link

[FEA] TSNE - Kullback Leiback Divergence for early stopping #863

Open danielhanchen opened 5 years ago

danielhanchen commented 5 years ago

Currently only the gradient norm is used for early stopping. However, one can calculate the actual Kullback Leiback divergence during the gradient updates for further diagnosis of whether TSNE has reached a stable configuration.

cjnolet commented 5 years ago

Will the new KLDivergence prim help you here?

danielhanchen commented 5 years ago

Oh so in the naive kernel I already incrementally updated the KLD loss. I just didn't put it in the Barnes Hut version yet. But I'll check the new prim out.

Most likely I'll continue using the naive TSNE version since no repeat calculations are made, and just P*log(P/Q) is used. Ie I have P, Q but they're not in a vector, but discarded during the algorithm. So I need an incremental version.

teju85 commented 5 years ago

What do you mean by incremental KLD version?

danielhanchen commented 5 years ago

@teju85 So the KL prim is KL_Prim(P, Q) and KL = sum(P * log(P / Q)) is done.

In TSNE, P and Q are not formed explicity as a vector, they're formed on the go. Hence the KL divergence im using will incrementally add to KL.

teju85 commented 5 years ago

got it. Makes sense now. Thanks @danielhanchen for explaining.

minor point: you can reduce a couple of lines of code in yours if you decide to use the KLDOp defined here

danielhanchen commented 5 years ago

Yep soz for the delay! I'll defs use that :)

drobison00 commented 4 years ago

@danielhanchen Do you know the current state of this work?

danielhanchen commented 4 years ago

@drobison00 I haven't been able to get around to this sadly. I did however in https://github.com/danielhanchen/tsne/blob/master/TSNE%20Extended%20Notebook.ipynb, specify some equations which might be helpful.

The main aim is to find the sum for KL for each point, which can be done inside the attractive forces kernel. So, some manipulation of formulas will have to be done in order to get sum KL_ij.