lisa-groundhog / GroundHog

Library for implementing RNNs with Theano
BSD 3-Clause "New" or "Revised" License

Remove Explicit Parameter Tracking #11

Open rizar opened 9 years ago

rizar commented 9 years ago

In GroundHog, every layer currently has `self.params`: a list of the parameters its output depends on. As Jan thoughtfully pointed out about a month ago, this is not necessary, since they can all be retrieved by traversing the computation graph. The `self.params_grad_scale` elements should then be attached to the parameters themselves, which could probably be done by subclassing the shared variable class (the problem is that it is not quite clear what to subclass...).
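For concreteness, a minimal sketch of such a traversal in Theano (the helper name `collect_params` is mine, not GroundHog API):

```python
import theano

# Sketch: gather every shared variable (i.e. parameter) that an output
# expression depends on, by walking the graph back to its inputs.
def collect_params(output):
    return [v for v in theano.gof.graph.inputs([output])
            if isinstance(v, theano.compile.SharedVariable)]
```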

janchorowski commented 9 years ago

All Theano expressions (shared variables and ordinary symbolic expressions) have a `tag` attribute to which we can attach such information.
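For example, a quick sketch (the attribute names `grad_scale` and `weight_decay` are placeholders; `tag` is a free-form scratchpad that accepts arbitrary attributes):

```python
import numpy
import theano

# Sketch: stash per-parameter metadata on the tag scratchpad.
W = theano.shared(numpy.zeros((784, 500), dtype='float32'), name='layer1_W')
W.tag.grad_scale = 0.1        # placeholder attribute names; tag imposes
W.tag.weight_decay = 0.0001   # no schema of its own
```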

Another option I am testing right now is to decouple the computation from optimization tricks (gradient scaling) and regularization (weight decay, column norms). Since parameters have unique and often meaningful names, it is easy to write regexps or something similar to set rules such as: all weights of layer X are decayed by...
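Roughly, something like this (a hypothetical sketch; the rule format is invented for illustration):

```python
import re

# Sketch: map name patterns to per-parameter settings, then apply them
# to whichever parameters match.
rules = {
    r'layer1_.*W': {'weight_decay': 1e-4},
    r'.*_b':       {'grad_scale': 2.0},
}

def apply_rules(params, rules):
    for pattern, settings in rules.items():
        for p in params:
            if p.name and re.match(pattern, p.name):
                for key, value in settings.items():
                    setattr(p.tag, key, value)
```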

What do you think?

rizar commented 9 years ago

In general I like the idea. However, the question arises of where all this information (gradient scaling constants, weight decay constants, etc.) should be stored. I still think that layers are good candidates for that.

We could do it like this (GH stands for GroundHog):

```python
x = TT.matrix('x')
h1 = GH.FeedForwardLayer(nin=784, nout=500, ..., name="layer1")(x)
h2 = GH.FeedForwardLayer(nin=500, nout=10, ..., name="layer2")(h1)
probs = GH.SoftmaxLayer(..., name="softmax")(h2)
...
softmax, = filter(lambda l: l.name == "softmax", GH.get_layers(probs))
softmax.weight_decay_coef = 0.001
```