danielhanchen opened this issue 5 years ago
Will the new KLDivergence prim help you here?
Oh so in the naive kernel I already incrementally updated the KLD loss. I just didn't put it in the Barnes Hut version yet. But I'll check the new prim out.
Most likely I'll continue using the naive TSNE version since no repeat calculations are made, and just P*log(P/Q) is used. I.e. I have P and Q, but they're not stored in a vector; they're discarded during the algorithm. So I need an incremental version.
What do you mean by incremental KLD version?
@teju85 So the KL prim is KL_Prim(P, Q), which computes KL = sum(P * log(P / Q)).
In TSNE, P and Q are not formed explicitly as vectors; they're formed on the go. Hence the KL divergence I'm using will incrementally add to KL.
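For illustration only, a minimal sketch of what such an incremental accumulation could look like (this is not the cuML prim; `kl_incremental`, `P_vals`, and `Q_vals` are hypothetical names, standing in for values a real TSNE kernel would compute on the fly for each pair):

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Accumulate KL = sum_ij P_ij * log(P_ij / Q_ij) incrementally,
// without ever materialising P or Q as full vectors.
__global__ void kl_incremental(const float* P_vals, const float* Q_vals,
                               int n, float* kl_sum) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;
    float p = P_vals[idx];
    float q = Q_vals[idx];
    if (p > 0.0f && q > 0.0f) {
        // Each thread contributes its own P*log(P/Q) term; the terms are
        // never stored, only summed into one running total.
        atomicAdd(kl_sum, p * logf(p / q));
    }
}
```

The same `atomicAdd` into a single running total is what "incrementally add to KL" means above: the per-pair terms are consumed as they are produced.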
got it. Makes sense now. Thanks @danielhanchen for explaining.
minor point: you can reduce a couple of lines of code in yours if you decide to use the KLDOp defined here
Yep soz for the delay! I'll defs use that :)
@danielhanchen Do you know the current state of this work?
@drobison00 I haven't been able to get around to this, sadly. I did, however, specify some equations in https://github.com/danielhanchen/tsne/blob/master/TSNE%20Extended%20Notebook.ipynb which might be helpful.
The main aim is to find the KL sum for each point, which can be done inside the attractive forces kernel. So some manipulation of the formulas will be needed in order to get sum KL_ij.
Currently only the gradient norm is used for early stopping. However, one could calculate the actual Kullback-Leibler divergence during the gradient updates for further diagnosis of whether TSNE has reached a stable configuration.
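As a rough sketch only (not the cuML kernel; `row_ptr`, `cols`, `P_nnz`, `Z_inv`, `attr`, and `kl_total` are all illustrative names), the per-point KL sum could be folded into an attractive-forces-style pass over the non-zero P_ij entries like this:

```cuda
#include <cuda_runtime.h>
#include <math.h>

// One thread per point i; P is stored in a CSR-like layout (row_ptr, cols, P_nnz).
// attr is assumed zeroed before launch; Z_inv = 1/Z is assumed known,
// e.g. carried over from the repulsive pass of the previous iteration.
__global__ void attractive_with_kl(const int* row_ptr, const int* cols,
                                   const float* P_nnz, const float* Y,
                                   int n, int dim, float Z_inv,
                                   float* attr, float* kl_total) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float kl_i = 0.0f;  // running sum_j P_ij * log(P_ij / Q_ij) for point i
    for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
        int j = cols[k];
        // squared distance between embedded points y_i and y_j
        float d2 = 0.0f;
        for (int d = 0; d < dim; ++d) {
            float diff = Y[i * dim + d] - Y[j * dim + d];
            d2 += diff * diff;
        }
        float pq = 1.0f / (1.0f + d2);   // unnormalised q_ij, i.e. q_ij * Z
        float p  = P_nnz[k];
        float q  = pq * Z_inv;           // normalised q_ij
        if (p > 0.0f && q > 0.0f) kl_i += p * logf(p / q);
        // usual attractive term p_ij * q_ij * Z * (y_i - y_j)
        for (int d = 0; d < dim; ++d) {
            attr[i * dim + d] += p * pq * (Y[i * dim + d] - Y[j * dim + d]);
        }
    }
    atomicAdd(kl_total, kl_i);           // fold point i's contribution into the global KL sum
}
```

On the host, `kl_total` could then be read back each iteration and used alongside the gradient norm as an extra stopping signal, e.g. stop once the KL divergence no longer decreases by more than some tolerance.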