Strangely enough, the following code works:
```python
from keras.models import Sequential, Graph
from keras.layers.core import Dense

inner_model = Sequential()
inner_model.add(Dense(input_dim=1, output_dim=2))

model = Graph()
model.add_input(name='l', input_shape=(1,))
model.add_input(name='r', input_shape=(1,))
model.add_shared_node(name='comp', layer=inner_model, inputs=['l', 'r'],
                      merge_mode='dot', dot_axes=[[1], [1]], create_output=True)
model.compile(loss={'comp': 'mse'}, optimizer='sgd')
```
but the following doesn't:
```python
from keras.models import Sequential, Graph
from keras.layers.core import Dense, Flatten

inner_model = Sequential()
inner_model.add(Dense(input_dim=1, output_dim=2))
inner_model.add(Flatten())

model = Graph()
model.add_input(name='l', input_shape=(1,))
model.add_input(name='r', input_shape=(1,))
model.add_shared_node(name='comp', layer=inner_model, inputs=['l', 'r'],
                      merge_mode='dot', dot_axes=[[1], [1]], create_output=True)
model.compile(loss={'comp': 'mse'}, optimizer='sgd')
```
Is there anything the Flatten layer could be doing that screws everything up?
I think I've found the cause of the bug.
In the `get_input()` method of the base `Layer` class, we cache the output of the previous layer, and if we're asked to get the input again, we return the cached value. The hash value used as the cache key is the ID of the previous layer, plus whether `train` is True or False. However, if the previous layer's output can change dynamically (for example, if it has two separate inputs which are set dynamically), this is not reflected in the hash value, and `get_input` fetches a stale output.
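For reference, a minimal sketch of the caching pattern being described; this is a simplification, and the `layer_cache` attribute name is an assumption, not a verbatim copy of the Keras source:

```python
def get_input(self, train=False):
    # Simplified sketch of the base Layer's cached input lookup.
    if hasattr(self, 'previous'):
        # The cache key encodes only the identity of the previous layer and
        # the train flag -- nothing about which input that layer currently has.
        previous_layer_id = '%s_%s' % (id(self.previous), train)
        if previous_layer_id in self.layer_cache:  # attribute name assumed
            # If the previous layer's output has since changed (e.g. a shared
            # layer whose input was swapped), this returns a stale tensor.
            return self.layer_cache[previous_layer_id]
        previous_output = self.previous.get_output(train=train)
        self.layer_cache[previous_layer_id] = previous_output
        return previous_output
    return self.input
```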
I think the best way to fix this is to change how previous layers are hashed, to avoid hash collisions like this in future. So, instead of

```python
previous_layer_id = '%s_%s' % (id(self.previous), train)
```

replace this with

```python
# Walk back to the original source of the input data (the first layer in the
# chain, i.e. the one with no `previous`) and key the cache on it.
first_layer = self
while hasattr(first_layer, 'previous'):
    first_layer = first_layer.previous
previous_layer_id = '%s_%s' % (id(first_layer), train)
```
@farizrahman4u does this sound OK to you?
@around1991 The Siamese layer was introduced before some serious API changes and the caching stuff. Now that we have `__call__` for layers, I will be updating the Siamese layer to use it, and this issue will disappear with it.
@farizrahman4u thanks for the update, but I don't think that fixes the underlying cause of the bug. `Layer.get_input` will still be bugged, and any use of it could potentially introduce unwanted behaviour. I still think it should be fixed. I've got some diffs which fix it, but they cause a bunch of unit tests to fail, so I'm currently looking to patch them and then submit a pull request.
@around1991 When you use `__call__`, the `get_input` of the layer is temporarily replaced with a lambda function, so there is no caching.
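For context, a minimal sketch of the mechanism being described (a simplification under the assumption that `__call__` wraps `get_output`, not the exact Keras source):

```python
def __call__(self, X, train=False):
    # Temporarily point get_input at the supplied tensor, bypassing both
    # self.previous and the layer cache.
    original_get_input = self.get_input
    self.get_input = lambda train=False: X
    try:
        output = self.get_output(train=train)
    finally:
        # Restore the original get_input so normal graph wiring still works.
        self.get_input = original_get_input
    return output
```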
@farizrahman4u however, on a multi-layer container (which is where the bug originated in the first place), the `__call__` method only replaces `get_input` on the first layer of the container. When `get_output` of the last layer of the container is called within the `__call__` method, it will still call `get_input` on all layers but the first, and the layer caching bug will still cause issues.
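Roughly, the scenario being described looks like this (an illustration only, using a hypothetical two-layer `container`, not the actual Keras implementation):

```python
# __call__ on a container only swaps out get_input on the first layer:
container.layers[0].get_input = lambda train=False: X
output = container.layers[-1].get_output(train=False)
# Inside get_output, the last layer calls its own get_input(), whose cache key
# is based on id() of the layer before it. If that layer was previously run
# with a different input, the stale cached tensor is returned instead of one
# computed from X.
```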
I think layer output caching will still need to be fixed, even when the Siamese layer changes to use `__call__`. I apologise for tagging you, as I thought the issue was with Siamese to begin with. Do you know who wrote the layer caching code?
You are right. I think it's @fchollet.
…Siamese layer. Including layer caching.

Hmm, I'm not convinced that just turning off layer caching is the right way to go. Presumably @fchollet added it for a reason? I do think the fix is (relatively) straightforward: cache inputs based on the original source of the input data, not the proximate source. That way, different input values are still cached, and we can retrieve them easily enough.
@around1991 The layer cache is turned off only temporarily while `__call__` is run. Everything is reset back afterwards. The reason layer caching is done is to make sure that the tensors are not built multiple times in complex graphs, where a layer might be used more than once. It is only an optimization. Caching is almost irrelevant in the context of `__call__` (it has got nothing to do with graphs).

To know more about layer caching, go through #666.
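To make that motivation concrete, here is a hypothetical Graph (sketched with the old `add_node` API; node names and dimensions are made up) where one node feeds two consumers. Without the cache, building `branch_a` and `branch_b` would each trigger `get_output` on the shared node, rebuilding the same symbolic sub-graph twice:

```python
from keras.models import Graph
from keras.layers.core import Dense

model = Graph()
model.add_input(name='x', input_shape=(10,))
model.add_node(Dense(input_dim=10, output_dim=8), name='shared', input='x')
# Both branches consume the output of 'shared'; caching lets that output be
# built once and reused, instead of being recomputed for each consumer.
model.add_node(Dense(input_dim=8, output_dim=4), name='branch_a', input='shared')
model.add_node(Dense(input_dim=8, output_dim=4), name='branch_b', input='shared')
model.add_output(name='out_a', input='branch_a')
model.add_output(name='out_b', input='branch_b')
model.compile(loss={'out_a': 'mse', 'out_b': 'mse'}, optimizer='sgd')
```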
Hi all
I'm trying to build a network that takes in two sentences, turns them into fixed-length representations using a shared feed-forward NN, and then calculates their dot product. Unfortunately, I get an error when I try this.
My code for this is
However, I get the following exception:
Is this a bug in the Siamese layer? Or am I doing something wrong?
Thanks, Kris