lisa-groundhog / GroundHog

Library for implementing RNNs with Theano
BSD 3-Clause "New" or "Revised" License

Remove Explicit Parameter Tracking #11

Open rizar opened 9 years ago

rizar commented 9 years ago

In GroundHog, every layer currently has `self.params`: a list of the parameters its output depends on. As Jan thoughtfully pointed out about a month ago, this is not necessary, since they can all be retrieved by traversing the computation graph. The `self.params_grad_scale` elements should then be attached to the parameters themselves, which could probably be done by subclassing the shared variable class (the problem is that it is not quite clear what to subclass...).
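For concreteness, a minimal sketch of such a traversal in Theano (the helper name `collect_params` is mine, not GroundHog API):

```python
import theano

# Sketch: gather every shared variable (i.e. parameter) that an output
# expression depends on, by walking the graph back to its inputs.
def collect_params(output):
    return [v for v in theano.gof.graph.inputs([output])
            if isinstance(v, theano.compile.SharedVariable)]
```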

janchorowski commented 9 years ago

All Theano expressions (shared variables and ordinary symbolic expressions) have a `tag` attribute to which we can attach such information.
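For example, a quick sketch (the attribute names `grad_scale` and `weight_decay` are placeholders; `tag` is a free-form scratchpad that accepts arbitrary attributes):

```python
import numpy
import theano

# Sketch: stash per-parameter metadata on the tag scratchpad.
W = theano.shared(numpy.zeros((784, 500), dtype='float32'), name='layer1_W')
W.tag.grad_scale = 0.1        # placeholder attribute names; tag imposes
W.tag.weight_decay = 0.0001   # no schema of its own
```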

Another option I am testing right now is to decouple the computation from optimization tricks (gradient scaling) and regularization (weight decay, column norms). Since parameters have unique and often meaningful names, it is easy to write regexps or something similar to set rules such as: all weights of layer X are decayed by...
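Roughly, something like this (a hypothetical sketch; the rule format is invented for illustration):

```python
import re

# Sketch: map name patterns to per-parameter settings, then apply them
# to whichever parameters match.
rules = {
    r'layer1_.*W': {'weight_decay': 1e-4},
    r'.*_b':       {'grad_scale': 2.0},
}

def apply_rules(params, rules):
    for pattern, settings in rules.items():
        for p in params:
            if p.name and re.match(pattern, p.name):
                for key, value in settings.items():
                    setattr(p.tag, key, value)
```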

What do you think?

rizar commented 9 years ago

In general I like the idea. However, the question arises of where all this information (gradient scaling constants, weight decay constants, etc.) should be stored. I still think that layers are good candidates for that.

We could do it like this (GH stands for GroundHog):

```python
x = TT.matrix('x')
h1 = GH.FeedForwardLayer(nin=784, nout=500, ..., name="layer1")(x)
h2 = GH.FeedForwardLayer(nin=500, nout=10, ..., name="layer2")(h1)
probs = GH.SoftmaxLayer(..., name="softmax")(h2)
...
softmax, = filter(lambda l: l.name == "softmax", GH.get_layers(probs))
softmax.weight_decay_coef = 0.001
```