mila-iqia / blocks

A Theano framework for building and training neural networks
Other
1.16k stars 351 forks source link

Multiple parents blow away child bricks' existing allocation #935

Open dwf opened 8 years ago

dwf commented 8 years ago

Consider this example:

from blocks.bricks import Rectifier, MLP, Sequence, Logistic
from blocks.select import Selector
from theano import tensor
from theano.printing import debugprint

sub_mlp = MLP([Rectifier(), Rectifier()], [5, 4, 3])
a = Sequence([sub_mlp.apply, Logistic().apply])
b = Sequence([sub_mlp.apply, Logistic().apply])
x = tensor.matrix()
y = a.apply(x)
ids_pre = sum([[id(x) for x in lin.parameters] for lin in sub_mlp.linear_transformations], [])
print(ids_pre)
z = b.apply(x)
ids_post = sum([[id(x) for x in lin.parameters] for lin in sub_mlp.linear_transformations], [])
print(ids_post)

The two lists of ids will be different, because upon calling b.apply(...), b sees that it is not allocated and allocates itself and its children, thus blowing away the allocation that a did. Now the parameters of sub_mlp from the first allocation are in the computational graph of y but an entirely different set of parameters are in the computational graph of z. Notably, both sets of parameters say they belong to sub_mlp in their metadata.

This is a pretty ugly situation. I notice that the parents logic in Brick does suppose that a brick can have multiple parents, but the allocation situation in this scenario makes that borderline unusable. This could be ameliorated in a couple of (complementary) ways.

A work-around in this particular instance is to set b.allocated to True (and b.parameters to [], if you want to satisfy assumptions elsewhere in the library e.g. you want the Selector to work) but this isn't a very general solution. Better control over (and documentation of) the tendency to obliterate existing child allocations seems in order, in any case.

rizar commented 8 years ago

This is indeed a situation that we have not foreseen. In theory, we support multiple parents, but as it turns out there might be side effects.

Even if allocation would not be done automatically in apply, I feel that calling a.allocate() and then b.allocate() and letting sub_mlp parameters change in between these calls is somewhat dangerous.

We do have self.allocated and self.initialized flags which reflect the status of the brick. Seems like we disable reallocation if self.allocated is True, and the same for initialization. What do you think?

dwf commented 8 years ago

Initialization seems trickier since I can imagine wanting to explicitly reinitialize. Are there similar circumstances for allocation?

Might be worth making those changes, one at a time, and running the tests just to see what happens.

rizar commented 8 years ago

Sure, I will try both changes and see if they break things no further then tomorrow morning.

What about having a force keyword argument for both allocate() and initalize() that would make the bricks ignore their allocated and initialized flags?

On 11 January 2016 at 15:04, David Warde-Farley notifications@github.com wrote:

Initialization seems trickier since I can imagine wanting to explicitly reinitialize. Are there similar circumstances for allocation?

Might be worth making those changes, one at a time, and running the tests just to see what happens.

— Reply to this email directly or view it on GitHub https://github.com/mila-udem/blocks/issues/935#issuecomment-170673344.

dwf commented 8 years ago

Yes, something like that sounds like a good idea. I vaguely remember @bartvm objecting to such a keyword for push_initialization_config, though, so I'd be curious to hear his thoughts.

I think we have to be careful to clearly define the semantics of this flag though (and the "allocated" and "initialized" flags). For example,

rizar commented 8 years ago

That's a great wish list. I will try to propose a solution tomorrow that will include (a) changes to the code, that should amount to just a few lines (b) clear explanation in the modification of the flags added.

On 11 January 2016 at 15:41, David Warde-Farley notifications@github.com wrote:

Yes, something like that sounds like a good idea. I vaguely remember @bartvm objecting to such a keyword for push_initialization_config, though, so I'd be curious to hear his thoughts.

I think we have to be careful to clearly define the semantics of this flag though (and the "allocated" and "initialized" flags). For example,

  • Is it assumed that if self.initialized/self.allocated is True, then everything below self in the brick hierarchy is initialized/allocated?
  • What is the default behaviour if the status for self is False but a child has status True? Presumably, do not allocate/initialize that child, but we should be explicit.
  • What is the behaviour in the above situation if "force" is specified?
  • Do we want to provide a flag/mechanism for reallocating/reinitializing self but not children? (The other direction is sort of easy, just call the relevant method on the child directly.) I can see potential use-cases for this, but maybe it's too niche.

— Reply to this email directly or view it on GitHub https://github.com/mila-udem/blocks/issues/935#issuecomment-170683544.

rizar commented 8 years ago

Sorry, booked for today to implement an adapter between Blocks and Platoon, but this issue is on the very top of my TODO list.