Open dwf opened 8 years ago
This is indeed a situation that we have not foreseen. In theory, we support multiple parents, but as it turns out there might be side effects.
Even if allocation would not be done automatically in apply
, I feel that calling a.allocate()
and then b.allocate()
and letting sub_mlp
parameters change in between these calls is somewhat dangerous.
We do have self.allocated
and self.initialized
flags which reflect the status of the brick. Seems like we disable reallocation if self.allocated
is True
, and the same for initialization. What do you think?
Initialization seems trickier since I can imagine wanting to explicitly reinitialize. Are there similar circumstances for allocation?
Might be worth making those changes, one at a time, and running the tests just to see what happens.
Sure, I will try both changes and see if they break things no further then tomorrow morning.
What about having a force
keyword argument for both allocate()
and
initalize()
that would make the bricks ignore their allocated
and
initialized
flags?
On 11 January 2016 at 15:04, David Warde-Farley notifications@github.com wrote:
Initialization seems trickier since I can imagine wanting to explicitly reinitialize. Are there similar circumstances for allocation?
Might be worth making those changes, one at a time, and running the tests just to see what happens.
— Reply to this email directly or view it on GitHub https://github.com/mila-udem/blocks/issues/935#issuecomment-170673344.
Yes, something like that sounds like a good idea. I vaguely remember
@bartvm objecting to such a keyword for push_initialization_config
,
though, so I'd be curious to hear his thoughts.
I think we have to be careful to clearly define the semantics of this flag though (and the "allocated" and "initialized" flags). For example,
self
but not children? (The other direction is sort of easy, just call
the relevant method on the child directly.) I can see potential use-cases
for this, but maybe it's too niche.That's a great wish list. I will try to propose a solution tomorrow that will include (a) changes to the code, that should amount to just a few lines (b) clear explanation in the modification of the flags added.
On 11 January 2016 at 15:41, David Warde-Farley notifications@github.com wrote:
Yes, something like that sounds like a good idea. I vaguely remember @bartvm objecting to such a keyword for
push_initialization_config
, though, so I'd be curious to hear his thoughts.I think we have to be careful to clearly define the semantics of this flag though (and the "allocated" and "initialized" flags). For example,
- Is it assumed that if self.initialized/self.allocated is True, then everything below self in the brick hierarchy is initialized/allocated?
- What is the default behaviour if the status for self is False but a child has status True? Presumably, do not allocate/initialize that child, but we should be explicit.
- What is the behaviour in the above situation if "force" is specified?
- Do we want to provide a flag/mechanism for reallocating/reinitializing
self
but not children? (The other direction is sort of easy, just call the relevant method on the child directly.) I can see potential use-cases for this, but maybe it's too niche.— Reply to this email directly or view it on GitHub https://github.com/mila-udem/blocks/issues/935#issuecomment-170683544.
Sorry, booked for today to implement an adapter between Blocks and Platoon, but this issue is on the very top of my TODO list.
Consider this example:
The two lists of ids will be different, because upon calling
b.apply(...)
,b
sees that it is not allocated and allocates itself and its children, thus blowing away the allocation thata
did. Now the parameters ofsub_mlp
from the first allocation are in the computational graph ofy
but an entirely different set of parameters are in the computational graph ofz
. Notably, both sets of parameters say they belong tosub_mlp
in their metadata.This is a pretty ugly situation. I notice that the
parents
logic in Brick does suppose that a brick can have multiple parents, but the allocation situation in this scenario makes that borderline unusable. This could be ameliorated in a couple of (complementary) ways.apply(...)
. This would have at least made the problem apparent to me a lot more quickly.allocate()
) . Not sure how viable this is, but I think it respects the "brick life cycle" outlined in the docs.A work-around in this particular instance is to set
b.allocated
toTrue
(andb.parameters to []
, if you want to satisfy assumptions elsewhere in the library e.g. you want the Selector to work) but this isn't a very general solution. Better control over (and documentation of) the tendency to obliterate existing child allocations seems in order, in any case.