pavlin-policar / openTSNE

Extensible, parallel implementations of t-SNE
https://opentsne.rtfd.io
BSD 3-Clause "New" or "Revised" License

Question about SGD method used #241

Closed rpadmanabhan closed 1 year ago

rpadmanabhan commented 1 year ago

Firstly, thanks for creating and maintaining such a high quality open source implementation of t-SNE. It is much appreciated.

I just had a clarifying question. The docstring here says "batch gradient descent", but reading through the code it appears to be an "iterative" (one-at-a-time) gradient update. Is that correct?

Thank you

pavlin-policar commented 1 year ago

Thanks! I'm glad you find openTSNE useful.

Batch gradient descent here means that we calculate the gradient (the update) using all of the data points, as opposed to stochastic gradient descent (SGD), where we estimate the gradient from a single data point or a small batch of data points. We still use gradient descent, so we repeatedly calculate the gradients and update the embedding until convergence, hence the iteration.
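For illustration, here is a minimal sketch of what batch gradient descent on the t-SNE objective looks like. This is not openTSNE's actual implementation; it assumes a precomputed, symmetrized affinity matrix `P`, uses exact pairwise distances in plain NumPy, and the function name and parameters are made up for the example:

```python
# Minimal, illustrative sketch of batch gradient descent on the t-SNE objective.
# Not openTSNE's code; P is assumed to be a precomputed, symmetrized affinity matrix.
import numpy as np

def tsne_batch_gd(P, n_iter=250, learning_rate=200.0, momentum=0.8, seed=0):
    n = P.shape[0]
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-4, size=(n, 2))  # the embedding itself is the parameter set
    update = np.zeros_like(Y)

    for _ in range(n_iter):
        # Student-t affinities in the embedding space
        dist = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)
        W = 1.0 / (1.0 + dist)
        np.fill_diagonal(W, 0.0)
        Q = W / W.sum()

        # Gradient of KL(P || Q) w.r.t. every embedding point at once:
        # dC/dy_i = 4 * sum_j (p_ij - q_ij) * w_ij * (y_i - y_j)
        PQ = (P - Q) * W
        grad = 4.0 * ((np.diag(PQ.sum(axis=1)) - PQ) @ Y)

        # One "batch" step: all points move simultaneously
        update = momentum * update - learning_rate * grad
        Y += update

    return Y
```

Note that every iteration uses all of the pairwise terms in `P` and `Q`, so each step moves every point in the embedding at once; there is no minibatching and therefore no notion of an epoch.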

rpadmanabhan commented 1 year ago

Okay, thank you. I just wanted to clarify that there is no notion of an "epoch" or of passing through batches of data points. I suppose that, unlike optimization in classical machine learning, where the weights/parameters being optimized are shared across data points (so you can process the data in batches), here the low-dimensional coordinates of the entire dataset are themselves the parameters being optimized, so each gradient step must use all of the data.
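To spell that distinction out (purely illustrative shapes, not openTSNE code):

```python
import numpy as np

n_samples, n_features = 1000, 50
X = np.random.randn(n_samples, n_features)

# Classical supervised model: a fixed-size weight vector shared by all samples,
# so the gradient can be estimated from a minibatch of rows of X.
w = np.zeros(n_features)                    # shape independent of n_samples

# t-SNE: the low-dimensional coordinates themselves are the parameters,
# one row per data point, so a gradient step updates all of them together.
Y = np.random.randn(n_samples, 2) * 1e-4    # shape grows with n_samples
```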