keras-team / keras

BUG: Layer.get_input returns the wrong input for shared nodes #1254

Closed: around1991 closed this issue 7 years ago

around1991 commented 8 years ago

Hi all,

I'm trying to build a network that takes in two sentences, turns them into fixed-length representations using a shared feed-forward NN, and then calculates their dot product. Unfortunately, I get an error when I try this.

My code for this is:

from keras.models import Graph, Sequential
from keras.layers.embeddings import Embedding
from keras.layers.core import Dense, Flatten

inner_model = Sequential()

# This is the global word lookup table
embedding = Embedding(input_dim=400, output_dim=300, input_length=4)

inner_model.add(embedding)
inner_model.add(Flatten())
inner_model.add(Dense(output_dim=600))
inner_model.add(Dense(output_dim=300))

model = Graph()
model.add_input(name='left', input_shape=(4,), dtype='int')
model.add_input(name='right', input_shape=(4,), dtype='int')
model.add_shared_node(layer=inner_model,
                      name='arg_comp',
                      inputs=['left', 'right'],
                      merge_mode='dot', dot_axes=((0,), (0,)),
                      create_output=True)

model.compile(loss={'arg_comp': 'binary_crossentropy'}, optimizer='sgd')

However, I get the following exception:

Traceback (most recent call last):
  File "src/models/test_arg_comp.py", line 25, in <module>
    model.compile(loss={'arg_comp': 'binary_crossentropy'}, optimizer='sgd')
  File "build/bdist.linux-x86_64/egg/keras/models.py", line 635, in compile
  File "build/bdist.linux-x86_64/egg/keras/backend/theano_backend.py", line 361, in function
  File "build/bdist.linux-x86_64/egg/keras/backend/theano_backend.py", line 354, in __init__
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function.py", line 266, in function
    profile=profile)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/pfunc.py", line 511, in pfunc
    on_unused_input=on_unused_input)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 1465, in orig_function
    on_unused_input=on_unused_input).create(
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 1124, in __init__
    self._check_unused_inputs(inputs, outputs, on_unused_input)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 1244, in _check_unused_inputs
    raise UnusedInputError(msg % (inputs.index(i), i.variable, err_msg))
theano.compile.function_module.UnusedInputError: theano.function was asked to create a function computing outputs given certain inputs, but the provided input variable at index 1 is not part of the computational graph needed to compute the outputs: right.
To make this error into a warning, you can pass the parameter on_unused_input='warn' to theano.function. To disable it completely, use on_unused_input='ignore'.
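
For reference, the same class of error can be reproduced in plain Theano whenever an input variable never feeds into the output expression (a minimal standalone snippet, independent of Keras):

import theano
import theano.tensor as T

left = T.matrix('left')
right = T.matrix('right')
out = left * 2  # 'right' never enters the expression

# Raises UnusedInputError for the input at index 1 ('right')
f = theano.function([left, right], out)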

Is this a bug in the Siamese layer? Or am I doing something wrong?

Thanks, Kris

around1991 commented 8 years ago

Strangely enough, the following code works:

inner_model = Sequential()
inner_model.add(Dense(input_dim=1, output_dim=2))
model = Graph()
model.add_input(name='l', input_shape=(1,))
model.add_input(name='r', input_shape=(1,))
model.add_shared_node(name='comp', layer=inner_model, inputs=['l', 'r'], merge_mode='dot', dot_axes=[[1], [1]], create_output=True)
model.compile(loss={'comp': 'mse'}, optimizer='sgd')

but the following doesn't:

inner_model = Sequential()
inner_model.add(Dense(input_dim=1, output_dim=2))
inner_model.add(Flatten())
model = Graph()
model.add_input(name='l', input_shape=(1,))
model.add_input(name='r', input_shape=(1,))
model.add_shared_node(name='comp', layer=inner_model, inputs=['l', 'r'], merge_mode='dot', dot_axes=[[1], [1]], create_output=True)
model.compile(loss={'comp': 'mse'}, optimizer='sgd')

Is there anything the Flatten layer could be doing that screws everything up?

around1991 commented 8 years ago

I think I've found the cause of the bug.

In the get_input() method of the base Layer class, we cache the output of the previous layer, and if we're asked to get the input again, we return the cached value. The cache key is built from the id() of the previous layer together with whether train is True or False.

However, if the previous layer's output can change dynamically (such as if it has two separate inputs, which are dynamically set), this is not reflected in the hash value, and get_input fetches a stale output.
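
For context, the caching path looks roughly like this (a simplified sketch reconstructed from the behaviour described above, not the exact Keras source):

def get_input(self, train=False):
    if hasattr(self, 'previous'):
        # The key records only which layer precedes us plus the train
        # flag, not what that previous layer is currently being fed.
        previous_layer_id = '%s_%s' % (id(self.previous), train)
        if previous_layer_id in self.layer_cache:
            # Stale for shared nodes: the second head gets back the
            # output that was computed for the first head's input.
            return self.layer_cache[previous_layer_id]
        previous_output = self.previous.get_output(train=train)
        self.layer_cache[previous_layer_id] = previous_output
        return previous_output
    return self.input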

I think the best way to fix this is to change how previous layers are hashed to avoid hash collisions like this again. So, instead of

previous_layer_id = '%s_%s' % (id(self.previous), train) 

replace this with

# Walk back to the original input layer, so that two different inputs
# routed through the same shared layers produce distinct cache keys.
first_layer = self
while hasattr(first_layer, 'previous'):
    first_layer = first_layer.previous
previous_layer_id = '%s_%s' % (id(first_layer), train)

@farizrahman4u does this sound OK to you?

farizrahman4u commented 8 years ago

@around1991 The Siamese layer was introduced before some serious API changes and the layer caching mechanism. Now that layers have __call__, I will be updating the Siamese layer to use it, and this issue will disappear with it.

around1991 commented 8 years ago

@farizrahman4u thanks for the update, but I don't think that fixes the underlying cause of the bug. Layer.get_input will still be bugged, and any use of it could potentially introduce unwanted behaviour. I still think it should be fixed. I've got some diffs which fix it, but they cause a bunch of unit tests to fail, so I'm currently looking to patch them and then submit a pull request.

farizrahman4u commented 8 years ago

@around1991 When you use __call__, the get_input of the layer is temporarily replaced with a lambda function, so no caching.
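
Conceptually it works something like this (a rough sketch, not the exact implementation):

def __call__(self, X, train=False):
    # Swap get_input out for the duration of the call, so the
    # caching path is bypassed entirely and X is fed in directly.
    tmp = self.get_input
    self.get_input = lambda train=False: X
    Y = self.get_output(train=train)
    self.get_input = tmp  # restore the normal (caching) path
    return Y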

around1991 commented 8 years ago

@farizrahman4u however, on a multi-layer container (which is where the bug originated in the first place), the __call__ method only replaces get_input on the first layer of the container. When get_output of the last layer of the container is called within the __call__ method, it will still call get_input on all layers but the first, and the layer caching bug will still cause issues.
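
To illustrate with a hypothetical sketch of what happens in a container:

def container_call(container, X, train=False):
    # Only the first layer's get_input is patched. Every inner layer
    # still resolves its input through the caching get_input, keyed on
    # id(previous) + train, which is identical for both heads of a
    # shared node.
    first, last = container.layers[0], container.layers[-1]
    tmp = first.get_input
    first.get_input = lambda train=False: X
    Y = last.get_output(train=train)  # inner layers hit the stale cache here
    first.get_input = tmp
    return Y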

I think layer output caching will still need to be fixed, even when the Siamese layer changes to use __call__. I apologise for tagging you, as I thought the issue was with Siamese to begin with. Do you know who wrote the layer caching code?

farizrahman4u commented 8 years ago

You are right. I think it's @fchollet.

farizrahman4u commented 8 years ago

#1271 fixes all issues related to the Siamese layer, including layer caching.

around1991 commented 8 years ago

Hmm, I'm not convinced that just turning off layer caching is the right way to go. Presumably @fchollet added it for a reason? I do think the fix is (relatively) straightforward: cache inputs based on the original source of the input data, not the proximate source. That way, different input values are still cached, and we can retrieve them easily enough.

farizrahman4u commented 8 years ago

@around1991 The layer cache is turned off only temporarily while __call__ runs; everything is reset back afterwards. Layer caching exists to make sure that tensors are not built multiple times in complex graphs, where a layer might be used more than once. It is only an optimization. Caching is almost irrelevant in the context of __call__ (it has got nothing to do with graphs).
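
The idea is just memoization of symbolic outputs, along these lines (an illustrative helper, not actual Keras code):

_output_cache = {}

def cached_output(layer, train=False):
    # Build each layer's symbolic output at most once per train mode,
    # so a layer feeding several consumers in a graph is not rebuilt
    # once per consumer.
    key = (id(layer), train)
    if key not in _output_cache:
        _output_cache[key] = layer.get_output(train=train)
    return _output_cache[key]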

To learn more about layer caching, see #666.