Open bartvm opened 9 years ago
We will not be able to always use application methods because some argument bricks might have several application methods (sequence_generators.py
, attention.py
). So I guess your question is about generic bricks, from which only MLP
accepts bricks now.
I should say that MLP([Tanh(), Tanh()], [10])
looks significantly simpler than
MLP([Tanh().apply, Tanh().apply], [10])
. On other hand, the latter will from the beginning destroy the delusion that Bricks
have only one application method.
When it comes to voting, I choose (2).
Right, in some cases a brick is given as an argument and it's expected to have a certain interface, but I'm talking about the situations where we just take bricks as activations. This is not just MLP, it's also the SimpleRecurrent, LSTM and GatedRecurrent bricks for example.
On Wednesday, March 4, 2015, Dmitry Bogdanov <notifications@github.com javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:
We will not be able to always use application methods because some argument bricks might have several application methods ( sequence_generators.py, attention.py). So I guess your question is about generic bricks, from which only MLP accepts bricks now.
I should say that MLP([Tanh(), Tanh()], [10]) looks significantly simpler than MLP([Tanh().apply, Tanh().apply], [10]). On other hand, the latter will from the beginning destroy the delusion that Bricks have only one application method.
When it comes to voting, I choose (2).
— Reply to this email directly or view it on GitHub https://github.com/bartvm/blocks/issues/406#issuecomment-77147196.
Perhaps it would be an option to go for bound methods but allow bricks to be callable such that a default application can be set during initialization and Tanh()
and Tanh().apply
would have the same result when called while the option to use other applications is still there.
edit: I didn't actually check whether bricks are callable so this might not be possible in that case.
At first sight, I quite like this idea. It could solve the issue of having to check everywhere whether a brick or application was passed, and mlp(x)
is shorter than mlp.apply(x)
. It could save a lot of useless .apply
s for all those bricks that only have one method.
We could just have def __call__(self, *args, **kwargs): return self.apply(*args, **kwargs)
as an official shortcut in Brick
. But it would not fully solve the problem, because the Tanh()
syntax would still build a brick, and it would have to be handled differently from Tanh().__call__
.
How so? In many cases they would be interchangeable, as long as the only thing you do is call them e.g. for an MLP) since this works then:
x = tensor.vector('x')
application_methods = [Tanh(), Softmax().cost]
return [application_method(x) for application_method in application_methods]
When you pass Tanh().apply
as an argument to your brick, somewhere in the brick there should be application_method.brick
request in order to set the children
attribute correctly.
If setting children is the only issue, it's worth considering intercepting assignments to self.children
(like we do for self.parameters
) and extract application_method.brick
there. This way self.children[0] = brick
and self.children[1] = application_method
both work, instead of every single brick needing to re-implement the logic of checking which one it is.
Two months have passed, and now I am also more inclined to pass an application method as a parameter in cases when only one application method of the passed brick is used. The hack with children
you propose also sounds not bad. Together with renaming apply
into __call__
I think these modifications will solve the issue.
I think we need to choose one of the two and stick with it, because right now different bricks do different things, and it'll come to haunt us eventually...
Bricks that use other bricks have two options: They take a brick as an input, and assume the application that they need is called
apply
. This clearly has the downside that it's impossible to use applications that are not calledapply
. The advantage is that it looks nicer to sayMLP(activations=[Rectifier(), Tanh()])
thanMLP(activations[Rectifier().apply, Tanh().apply])
.There are 2 options I see:
MLP
).My vote goes to (1).