Closed luochuwei closed 8 years ago
Hi @luochuwei,
We carry the gradients into shared variables just for debugging purposes (you can easily get their values, plot the norm, etc.).
The downside is increased memory consumption (note that this is a tutorial :wink: ). For large-scale experiments you may want to skip this step to save some memory; in that case you can check here for a reference implementation.
@orhanf Oh, I see. Thank you very much!
We noticed that in the Theano tutorial, the updates in the theano.function use the grads directly. However, in the sgd function in your code, the grads are first stored in gshared. Could you tell us the reason? Thank you!
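For readers following along without Theano installed, the two-step pattern being discussed can be sketched in plain NumPy. This is a hedged illustration, not the tutorial's actual code: `make_sgd`, `params`, and `grad_fn` are hypothetical names, and the `gshared` dict of persistent arrays plays the role of Theano's shared gradient variables, so you can inspect the gradients between the "compute" step and the "update" step.

```python
import numpy as np

def make_sgd(params, grad_fn):
    """Two-step SGD mimicking the tutorial's gshared pattern (sketch).

    `params` is a dict of NumPy arrays; `grad_fn(params)` returns a
    matching dict of gradient arrays. Both names are hypothetical.
    """
    # Persistent buffers playing the role of the Theano gshared variables.
    gshared = {k: np.zeros_like(p) for k, p in params.items()}

    def f_grad_shared():
        # Step 1: compute gradients and copy them into the shared buffers.
        for k, g in grad_fn(params).items():
            gshared[k][...] = g
        # Because the gradients persist in gshared, they can be read out
        # here for debugging (e.g. to plot their norms), which is the
        # point @orhanf makes above.

    def f_update(lr):
        # Step 2: apply the stored gradients to the parameters.
        for k in params:
            params[k] -= lr * gshared[k]

    return f_grad_shared, f_update, gshared
```

A tiny usage example, minimizing f(w) = w² (so the gradient is 2w): after one `f_grad_shared()` call you can read the gradient out of `gshared` before `f_update(lr)` touches the parameters. Skipping the intermediate buffers and updating directly from the gradients, as in the plain Theano tutorial, saves that extra memory but loses the inspection point.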