joschu / cgt

Computation Graph Toolkit

This pull request adds dim_shuffle to cgt.api #3

Closed bstadie closed 9 years ago

bstadie commented 9 years ago

This pull request adds dim_shuffle to cgt.api and the corresponding shortcut to cgt.core

benanne commented 9 years ago

In Theano this operation is referred to as dimshuffle, without an underscore, so it would be nice if CGT followed that convention :)
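For reference, a small sketch of the Theano convention in question: dimshuffle permutes axes and can insert broadcastable ones with 'x'.

import theano.tensor as T

x = T.matrix("x")            # shape (a, b)
y = x.dimshuffle(1, 0)       # transpose -> (b, a)
z = x.dimshuffle(0, 'x', 1)  # insert a broadcastable axis -> (a, 1, b)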

bstadie commented 9 years ago

Got it.

joschu commented 9 years ago

Looks great! Could you please:

(1) add docstrings to the optimization functions -- paper reference + equation
(2) add docstrings to the initializers, via He.__doc__ = "blah", with paper reference + equation (sketch below)
(3) replace Zeros() -> Constant(0) in cgt_theano_feedforward_comparison.py
(4) add a unit test for the optimizers, to make sure we don't break them (especially as we're refactoring shared(...) and so forth). I'm not sure what the best test is -- one possibility is to run a few steps of gradient descent on
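For item (2), one possible shape for such a docstring, assuming He is the Gaussian-variance initializer (the exact formula to document depends on the implementation):

# Sketch only: attach the docstring after the class definition in cgt/nn.py.
He.__doc__ = """Weight initialization from He et al. (2015).

Draws W ~ N(0, sigma^2) with sigma = sqrt(2 / fan_in), which preserves activation
variance under ReLU nonlinearities.
Reference: "Delving Deep into Rectifiers", http://arxiv.org/abs/1502.01852
"""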

joschu commented 9 years ago

Regarding the unit test, please create a script cgt/tests/test_optimizers.py. Then when you run nosetests -v you can verify that your tests were run. (Note that the test functions need to be named test_<something> so nose can find them.)
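A minimal sketch of what such a test could look like, assuming cgt.shared and cgt.function behave like their Theano counterparts and that there is an update helper with a signature along the lines of nn.sgd(cost, params, learning_rate) (that name and signature are guesses):

# cgt/tests/test_optimizers.py -- rough sketch, not the final test.
import numpy as np
import cgt
from cgt import nn

def test_sgd_decreases_quadratic_loss():
    w = cgt.shared(np.array([3.0, -2.0]))
    cost = cgt.sum(w ** 2)                          # convex objective, minimum at w = 0
    updates = nn.sgd(cost, [w], learning_rate=0.1)  # assumed signature
    step = cgt.function([], cost, updates=updates)
    losses = [step() for _ in range(20)]
    assert losses[-1] < losses[0]                   # a few descent steps should lower the loss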

hojonathanho commented 9 years ago

Schaul, Antonoglou, and Silver put together a suite of tests for stochastic optimization (see http://arxiv.org/abs/1312.6055 and https://github.com/IoannisAntonoglou/optimBench). The functions they test on could serve as inspiration for CGT's unit tests, and maybe in the future the whole benchmark could be used.

zxie commented 9 years ago

Nice! Adam also comes to mind as an optimizer that has recently become popular. We should refactor the code in the examples to use these.

Maybe too early to worry about this, but I think it's worth laying out at some point how much neural net functionality to include in the main CGT library (as nn.py and nn_ops grow) versus eventually breaking it into separate libraries, as with Pylearn2 or Blocks/Fuel (it seems like nnbuilder already does this to some extent?).

bstadie commented 9 years ago

@zxie I tried to code Adam but it failed and I figured I'd save it for later. You're welcome to try to debug my code at some point if you'd like :p
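For reference, the update rule from Kingma and Ba (http://arxiv.org/abs/1412.6980), written out in plain numpy as a checkpoint for debugging a symbolic version (variable names here are just illustrative):

import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad              # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2         # second-moment estimate
    m_hat = m / (1 - b1 ** t)                 # bias correction, with t starting at 1
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v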

@joschu and I had a brief discussion about this. Basically, with shape inference and nn.get_params() it seems like an nnbuilder is not really needed, since, for instance, you can define a dense layer with only

activation(nn.Affine(feature_dims, num_units, weight_init=w_init, bias_init=bias_init)(nn_input))

It makes sense to add some nice constructors to nn such as

def denseLayer(nn_input, num_units, activation=nn.rectify, w_init=nn.IIDGaussian(0, 0.1), bias_init=nn.Zeros()):
    # Flatten any trailing dimensions so the input is 2D: (batch, features).
    # Assumes the trailing shape is statically inferable and numpy is imported as np.
    in_shape = cgt.infer_shape(nn_input)
    if len(in_shape) > 2:
        nn_input = nn_input.reshape([in_shape[0], int(np.prod(in_shape[1:]))])
    feature_dims = cgt.infer_shape(nn_input)[1]
    return activation(nn.Affine(feature_dims, num_units, weight_init=w_init, bias_init=bias_init)(nn_input))

so that a user could build up their network like this:

d1 = layers.denseLayer(X, 256, nn.rectify)
d2 = layers.dropout(d1, p_drop)
d3 = layers.denseLayer(d2, 256, nn.rectify)
d4 = layers.dropout(d3, p_drop)
d5 = layers.denseLayer(d4, 10, nn.softmax)

and save the trouble of doing shape inference every time. I also wrote modules similar to Affine/SpatialConv for RNNs/LSTMs/GRUs, which could be similarly wrapped in nice constructors. I think this adds minimal overhead to cgt.nn while providing an extremely clean way to write networks. What other features were you thinking cgt.nn/a separate nn library should include?

skaae commented 9 years ago

Lasagne has some unit tests for the optimizers that you can probably copy?

https://github.com/Lasagne/Lasagne/blob/master/lasagne/tests/test_updates.py

bstadie commented 9 years ago

@skaae one step ahead of you :)

joschu commented 9 years ago

Nice work! I made a couple changes to your branch and then pushed.