nyu-dl / dl4mt-tutorial

BSD 3-Clause "New" or "Revised" License

Why do the grads have to be shared? #58

luochuwei closed this issue 8 years ago

luochuwei commented 8 years ago

In the Theano tutorial, the updates passed to the Theano function use the grads directly. However, in the "sgd" function in your code, the grads are first put into gshared. Could you tell me the reason? Thank you!

orhanf commented 8 years ago

Hi @luochuwei,

We copy the gradients into shared variables purely for debugging purposes (you can easily read their values, plot their norms, etc.).

The downside is increased memory consumption (note that this is a tutorial :wink:). For large-scale experiments, you may want to skip this step to save some memory; in that case you can check here for a reference implementation.
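
To make the trade-off concrete, here is a minimal sketch in plain NumPy rather than Theano. It is an analogy, not the repository's code: `grad_fn`, `make_two_step_sgd`, and `direct_sgd_step` are hypothetical names, and the linear model with a squared-error loss is only an assumption chosen to keep the example small. The two-step variant keeps one persistent gradient buffer per parameter (playing the role of gshared), so the gradients can be inspected between the two calls; the direct variant applies the update immediately and keeps no such buffer.

```python
import numpy as np

def grad_fn(params, x, y):
    """Gradient of the mean squared error for a linear model y ~ x @ w."""
    err = x @ params["w"] - y
    return {"w": 2.0 * x.T @ err / len(y)}

def make_two_step_sgd(params):
    """Two-step SGD: store the gradients first, apply them in a second call."""
    # One persistent buffer per parameter (the analogue of gshared).
    # This is the extra memory mentioned above.
    gshared = {k: np.zeros_like(v) for k, v in params.items()}

    def f_grad_shared(x, y):
        # Step 1: compute the gradients and copy them into the buffers.
        for k, g in grad_fn(params, x, y).items():
            gshared[k][...] = g

    def f_update(lr):
        # Step 2: update the parameters from the stored gradients.
        for k in params:
            params[k] -= lr * gshared[k]

    return gshared, f_grad_shared, f_update

def direct_sgd_step(params, x, y, lr):
    """Direct SGD: use the gradients once and do not keep them around."""
    for k, g in grad_fn(params, x, y).items():
        params[k] -= lr * g

# Toy data (hypothetical) and two identical parameter sets.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
y = rng.normal(size=8)
p_two_step = {"w": np.zeros(3)}
p_direct = {"w": np.zeros(3)}

gshared, f_grad_shared, f_update = make_two_step_sgd(p_two_step)
f_grad_shared(x, y)
grad_norm = np.linalg.norm(gshared["w"])  # inspectable before the update
f_update(0.1)

direct_sgd_step(p_direct, x, y, 0.1)
```

In this sketch both variants produce the same parameters after one step; the only difference is whether the gradients remain available for inspection afterwards, at the cost of one extra array per parameter.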

luochuwei commented 8 years ago

@orhanf Oh, I see. Thank you very much!