mila-iqia / blocks

A Theano framework for building and training neural networks
Other
1.16k stars 351 forks source link

Bricks or bound applications as arguments? #406

Open bartvm opened 9 years ago

bartvm commented 9 years ago

I think we need to choose one of the two and stick with it, because right now different bricks do different things, and it'll come to haunt us eventually...

Bricks that use other bricks have two options: They take a brick as an input, and assume the application that they need is called apply. This clearly has the downside that it's impossible to use applications that are not called apply. The advantage is that it looks nicer to say MLP(activations=[Rectifier(), Tanh()]) than MLP(activations[Rectifier().apply, Tanh().apply]).

There are 2 options I see:

  1. Switch to consistent use of bound applications as arguments. Occasionally allowing for both bricks and bound applications for the sake of usability (e.g. for MLP).
  2. Accept both in all cases by using a utility function that does something like this, and use it in every brick.
def get_applications(*bricks_or_applications):
    for i in range(len(bricks_or_applications)):
        if isinstance(bricks_or_applications[i], Brick):
             bricks_or_applications[i] = bricks_or_applications[i].apply
        elif not isinstance(bricks_or_applications, BoundApplication):
             raise ValueError
    return bricks_or_applications

My vote goes to (1).

rizar commented 9 years ago

We will not be able to always use application methods because some argument bricks might have several application methods (sequence_generators.py, attention.py). So I guess your question is about generic bricks, from which only MLP accepts bricks now.

I should say that MLP([Tanh(), Tanh()], [10]) looks significantly simpler than MLP([Tanh().apply, Tanh().apply], [10]). On other hand, the latter will from the beginning destroy the delusion that Bricks have only one application method.

When it comes to voting, I choose (2).

bartvm commented 9 years ago

Right, in some cases a brick is given as an argument and it's expected to have a certain interface, but I'm talking about the situations where we just take bricks as activations. This is not just MLP, it's also the SimpleRecurrent, LSTM and GatedRecurrent bricks for example.

On Wednesday, March 4, 2015, Dmitry Bogdanov <notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

We will not be able to always use application methods because some argument bricks might have several application methods ( sequence_generators.py, attention.py). So I guess your question is about generic bricks, from which only MLP accepts bricks now.

I should say that MLP([Tanh(), Tanh()], [10]) looks significantly simpler than MLP([Tanh().apply, Tanh().apply], [10]). On other hand, the latter will from the beginning destroy the delusion that Bricks have only one application method.

When it comes to voting, I choose (2).

— Reply to this email directly or view it on GitHub https://github.com/bartvm/blocks/issues/406#issuecomment-77147196.

pbrakel commented 9 years ago

Perhaps it would be an option to go for bound methods but allow bricks to be callable such that a default application can be set during initialization and Tanh() and Tanh().apply would have the same result when called while the option to use other applications is still there.

edit: I didn't actually check whether bricks are callable so this might not be possible in that case.

bartvm commented 9 years ago

At first sight, I quite like this idea. It could solve the issue of having to check everywhere whether a brick or application was passed, and mlp(x) is shorter than mlp.apply(x). It could save a lot of useless .applys for all those bricks that only have one method.

rizar commented 9 years ago

We could just have def __call__(self, *args, **kwargs): return self.apply(*args, **kwargs) as an official shortcut in Brick. But it would not fully solve the problem, because the Tanh() syntax would still build a brick, and it would have to be handled differently from Tanh().__call__.

bartvm commented 9 years ago

How so? In many cases they would be interchangeable, as long as the only thing you do is call them e.g. for an MLP) since this works then:

x = tensor.vector('x')
application_methods = [Tanh(), Softmax().cost]
return [application_method(x) for application_method in application_methods]
rizar commented 9 years ago

When you pass Tanh().apply as an argument to your brick, somewhere in the brick there should be application_method.brick request in order to set the children attribute correctly.

bartvm commented 9 years ago

If setting children is the only issue, it's worth considering intercepting assignments to self.children (like we do for self.parameters) and extract application_method.brick there. This way self.children[0] = brick and self.children[1] = application_method both work, instead of every single brick needing to re-implement the logic of checking which one it is.

rizar commented 9 years ago

Two months have passed, and now I am also more inclined to pass an application method as a parameter in cases when only one application method of the passed brick is used. The hack with children you propose also sounds not bad. Together with renaming apply into __call__ I think these modifications will solve the issue.