keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Trying to define Cosine Similarity Layer #2672

Closed: RubenZazo closed this issue 6 years ago

RubenZazo commented 8 years ago

Hello,

I am new to Theano and Keras, so maybe I am asking something silly, but I have spent a few days trying to get this working and I cannot.

I want to build a siamese architecture that takes 2 vectors as inputs and outputs whether they come from the same source or not. The code I have (mostly from the Keras examples) is something like this:

# Create the common structure.
def create_base_network(input_dim):
    seq = Sequential()
    seq.add(...)  # add layers here
    return seq

# network definition
base_network = create_base_network(input_dim)

input_a = Input(shape=(input_dim,))
input_b = Input(shape=(input_dim,))
processed_a = base_network(input_a)
processed_b = base_network(input_b)

The problem comes when I want to compute a cosine distance between processed_a and processed_b; this is what I tried:

def cosine_distance(vests):
    x, y = vests
    return np.array([spatial.distance.cosine(x, y)], dtype='f')
    # spatial.distance.cosine is from SciPy; it returns a float in [0, 2]

def cos_dist_output_shape(shapes):
    return (1,)

distance = Lambda(cosine_distance, output_shape=cos_dist_output_shape)([processed_a, processed_b])

model = Model(input=[input_a, input_b], output=distance)

When I try to define distance = Lambda(...) I get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/keras/engine/topology.py", line 485, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/keras/engine/topology.py", line 543, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/keras/engine/topology.py", line 153, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors, mask=input_masks))
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/keras/layers/core.py", line 446, in call
    return self.function(x, **arguments)
  File "<stdin>", line 3, in cosine_distance
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/scipy/spatial/distance.py", line 329, in cosine
    dist = 1.0 - np.dot(u, v) / (norm(u) * norm(v))
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/scipy/linalg/misc.py", line 166, in norm
    return np.linalg.norm(a, ord=ord)
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 2116, in norm
    x = x.astype(float)
ValueError: setting an array element with a sequence.

What am I doing wrong? Thank you very much for your help.

joelthchao commented 8 years ago

For the Lambda layer, you should call functions that work on tensors, such as those in the Keras backend; spatial.distance.cosine is not something we can use. To calculate cosine distance, you can use the Merge layer with mode='cos'. You can see how it works in the source code and write your own if needed.
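
For illustration, a minimal sketch of a tensor-friendly distance built only from backend ops (the function name is illustrative; assumes from keras import backend as K). This is essentially what the later comments in this thread converge on:

def cosine_distance_k(tensors):
    # Every operation here is a Keras backend op, so the backend
    # (Theano/TensorFlow) can build a differentiable symbolic graph.
    x, y = tensors
    x = K.l2_normalize(x, axis=-1)
    y = K.l2_normalize(y, axis=-1)
    return -K.mean(x * y, axis=-1, keepdims=True)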

RubenZazo commented 8 years ago

Thank you very much for your reply, joelthchao, but I have tried two things to solve it, and this is what happens.

The first option, using merge:

output_cos = merge([processed_a, processed_b], mode='cos', concat_axis=-1)

model = Model(input=[input_a, input_b], output=output_cos)

rms = RMSprop()
model.compile(loss=contrastive_loss, optimizer=rms)
model.fit([tr_pairs[:, 0], tr_pairs[:, 1]], tr_y,
          validation_data=([te_pairs[:, 0], te_pairs[:, 1]], te_y),
          batch_size=128,
          nb_epoch=nb_epoch)

And I get:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/keras/engine/training.py", line 1031, in fit
    self._make_train_function()
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/keras/engine/training.py", line 658, in _make_train_function
    training_updates = self.optimizer.get_updates(trainable_weights, self.constraints, self.total_loss)
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/keras/optimizers.py", line 167, in get_updates
    grads = self.get_gradients(loss, params)
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/keras/optimizers.py", line 48, in get_gradients
    grads = K.gradients(loss, params)
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 521, in gradients
    return T.grad(loss, variables)
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/theano/gradient.py", line 561, in grad
    grad_dict, wrt, cost_name)
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/theano/gradient.py", line 1324, in _populate_grad_dict
    rval = [access_grad_cache(elem) for elem in wrt]
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/theano/gradient.py", line 1279, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/theano/gradient.py", line 973, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  ... (the previous two frames repeat many times as Theano recurses through the graph) ...
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/theano/gradient.py", line 1113, in access_term_cache
    input_grads = node.op.grad(inputs, new_output_grads)
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/theano/tensor/elemwise.py", line 413, in grad
    return [DimShuffle(gz.type.broadcastable, grad_order)(
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/theano/tensor/elemwise.py", line 165, in __init__
    (input_broadcastable, new_order))
ValueError: ('You cannot drop a non-broadcastable dimension.', ((False, False, False), (0, 2)))

RubenZazo commented 8 years ago

The second option, defining a tensor-capable function:

def cosine_distance(vests):
    x, y = vests
    x = K.l2_normalize(x, axis=-1)
    y = K.l2_normalize(y, axis=-1)
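    # Note: without keepdims=True the next line returns shape (batch,),
    # while cos_dist_output_shape below promises (batch, 1); see the
    # corrected version further down this thread.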
    return -K.mean(x * y, axis=-1)

def cos_dist_output_shape(shapes):
    shape1, shape2 = shapes
    shape=list(shape1)
    assert len(shape) == 2
    shape[-1] = 1
    return tuple(shape)

output_cos = Lambda(cosine_distance, output_shape=cos_dist_output_shape)([processed_a, processed_b])
model = Model(input=[input_a, input_b], output=output_cos)

rms = RMSprop()
model.compile(loss=contrastive_loss, optimizer=rms)

model.fit([tr_pairs[:, 0], tr_pairs[:, 1]], tr_y,
          validation_data=([te_pairs[:, 0], te_pairs[:, 1]], te_y),
          batch_size=128,
          nb_epoch=nb_epoch)

Train on 108400 samples, validate on 17820 samples
Epoch 1/20
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/keras/engine/training.py", line 1046, in fit
    callback_metrics=callback_metrics)
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/keras/engine/training.py", line 784, in _fit_loop
    outs = f(ins_batch)
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 507, in __call__
    return self.function(*inputs)
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 871, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/theano/gof/link.py", line 314, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/home/rcandil/myPythonEnv/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 859, in __call__
    outputs = self.fn()
ValueError: Input dimension mis-match. (input[0].shape[1] = 1, input[1].shape[1] = 128)
Apply node that caused the error: Elemwise{Composite{((i0 * i1) + (i2 * i3))}}[(0, 2)](lambda_11_target, Elemwise{sqr,no_inplace}.0, Elemwise{sub,no_inplace}.0, Elemwise{sqr,no_inplace}.0)
Toposort index: 107
Inputs types: [TensorType(float32, matrix), TensorType(float32, row), TensorType(float32, matrix), TensorType(float32, row)]
Inputs shapes: [(128, 1), (1, 128), (128, 1), (1, 128)]
Inputs strides: [(4, 4), (512, 4), (4, 4), (512, 4)]
Inputs values: ['not shown', 'not shown', 'not shown', 'not shown']
Outputs clients: [[Sum{acc_dtype=float64}(Elemwise{Composite{((i0 * i1) + (i2 * i3))}}[(0, 2)].0)]]

Debugprint of the apply node: Elemwise{Composite{((i0 * i1) + (i2 * i3))}}[(0, 2)] <TensorType(float32, matrix)>
(full Theano graph debugprint and storage map footprint omitted)

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.

FrenkT commented 8 years ago

I have the same issue with the merge layer, referenced here https://github.com/fchollet/keras/issues/2671

RubenZazo commented 8 years ago

I found a way to solve the problem with the second idea; here is the code:


def cosine_distance(vests):
    x, y = vests
    x = K.l2_normalize(x, axis=-1)
    y = K.l2_normalize(y, axis=-1)
    return -K.mean(x * y, axis=-1, keepdims=True)

def cos_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0],1)

def create_base_network(input_dim):
    '''Base network to be shared (eq. to feature extraction).
    '''
    seq = ...
    return seq

base_network = create_base_network(input_dim)

input_a = Input(shape=(input_dim,))
input_b = Input(shape=(input_dim,))

processed_a = base_network(input_a)
processed_b = base_network(input_b)

output_cos = Lambda(cosine_distance, output_shape=cos_dist_output_shape)([processed_a, processed_b])

model = Model(input=[input_a, input_b], output=output_cos)

# train
rms = RMSprop()
model.compile(loss='binary_crossentropy', optimizer=rms)
model.fit([tr_pairs[:, 0], tr_pairs[:, 1]], tr_y,
          validation_data=([te_pairs[:, 0], te_pairs[:, 1]], te_y),
          batch_size=64,
          nb_epoch=nb_epoch)

I couldn't figure out how to solve it with the merge layer, but I hope this helps others with the same issue.

joelthchao commented 8 years ago

The most important part is understanding how batch_dot works in the source code: you have to add an extra dimension to the inputs and reshape the result back afterwards.

input_a = Input(shape=(input_dim, 1))
input_b = Input(shape=(input_dim, 1))

cos_distance = merge([input_a, input_b], mode='cos', dot_axes=1) # magic dot_axes works here!
cos_distance = Reshape((1,))(cos_distance)
cos_similarity = Lambda(lambda x: 1-x)(cos_distance)

model = Model([input_a, input_b], [cos_similarity])

FrenkT commented 8 years ago

@RubenZazo Hi, I tried running your code (the last version), but I have some problems. Does it work for you? With my data it just stops learning after the first epoch, and sometimes, after a few epochs, it even starts giving NaN scores.

RubenZazo commented 8 years ago

@FrenkT Yes, I do not know why that is happening. I got the merge version working using joelthchao's latest advice. I also do not know why the first one is not learning; I tried many things, but it never learnt... Please let me know if you fix it somehow. At the moment I am using the merge layer, but I would like to know why the other version is not working... Thanks!

FrenkT commented 8 years ago

@RubenZazo I don't know if it is due to my type of data, but I also have problems with joelthchao's code: it runs, but I get NaN scores. I can't tell if it is due to some kind of numerical error. At the moment I'm working with joelthchao's code, but with merge mode='mul' (of course, with 'mul' I don't need all the reshape steps).

joelthchao commented 8 years ago

@FrenkT The cos mode does not carefully deal with division by zero, which might be the cause of the NaN: output = K.batch_dot(l1, l2, self.dot_axes) / denominator
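
A minimal sketch of one way to guard that denominator (illustrative only, not the actual Merge implementation; assumes from keras import backend as K):

def safe_cos_merge(l1, l2, dot_axes=1):
    # Clip the denominator away from zero before dividing, so zero-norm
    # vectors yield 0 / epsilon instead of 0 / 0 = NaN.
    numerator = K.batch_dot(l1, l2, dot_axes)
    denominator = K.sqrt(K.batch_dot(l1, l1, dot_axes) * K.batch_dot(l2, l2, dot_axes))
    return numerator / K.maximum(denominator, K.epsilon())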

jtaverni commented 8 years ago

Similar to my comment on #2799, I'm hoping to get a complete working example with Merge "cos" for the problem described in this issue.

Here is my attempt:

from keras.layers import Input, Embedding, Merge, Flatten, recurrent, Dropout, RepeatVector, Dense, core, Reshape, Lambda, merge, Convolution2D, MaxPooling2D, Activation
from keras.models import Model
from keras.models import Sequential
import itertools
import numpy as np

input_dim = 2
input_a = Input(shape=(input_dim, 1))
input_b = Input(shape=(input_dim, 1))

cos_distance = merge([input_a, input_b], mode='cos', dot_axes=1) # magic dot_axes works here!
cos_distance = Reshape((1,))(cos_distance)
cos_similarity = Lambda(lambda x: 1-x)(cos_distance)

model = Model([input_a, input_b], [cos_similarity])

model.summary()

model.compile(optimizer='rmsprop', loss='cosine_proximity', metrics=['accuracy'])

# fit the model on random example data
a_array = np.asarray(np.random.rand(2, 2, 1))
b_array = np.asarray(np.random.rand(2, 2, 1))
y_array = np.asarray(np.random.rand(2))
model.fit([a_array, b_array], [y_array], nb_epoch=100, verbose=1)

This gives me a loss of -1.00 (on the first epoch and all subsequent epochs) and an accuracy of 0. Any thoughts on what I'm doing wrong and/or a complete example that will work?

If it helps, I'm using Keras version 1.0.5 (the tarball available here: https://github.com/fchollet/keras/releases) and Theano 0.8.1. And I'm using GPU.

farizrahman4u commented 8 years ago

@jtaverni There are no weights in your model. Hence the constant accuracy / loss.

jtaverni commented 8 years ago

Thank you for the fast response; right you are.

Here's a revised version, adapting the pattern from #2799:

from keras.layers import Input, Embedding, Merge, Flatten, recurrent, Dropout, RepeatVector, Dense, core, Reshape, Lambda, merge, Convolution2D, MaxPooling2D, Activation
from keras.models import Model
from keras.models import Sequential
import itertools
import numpy as np

samples = 100
n_words = 3
n_embed_dims = 2
maxlen = 5

left = Sequential()
left.add(Embedding(input_dim=n_words, output_dim=n_embed_dims, input_length=maxlen))

right = Sequential()
right.add(Embedding(input_dim=n_words, output_dim=n_embed_dims, input_length=maxlen))

model = Sequential()
merged = Merge([left, right], mode='cos', dot_axes=1) # dot_axes
model.add(merged)
model.add(Activation('relu')) # adding this line doesn't give me any error

model.summary()

model.compile(optimizer='rmsprop', loss='cosine_proximity', metrics=['accuracy'])

# fit the model on random example data
a_array = np.asarray(np.random.randint(n_words, size=(samples, maxlen)))
b_array = np.asarray(np.random.randint(n_words, size=(samples, maxlen)))
y_array = np.asarray(np.random.rand(samples))
model.fit([a_array, b_array], [y_array], nb_epoch=100, verbose=1)

This results in an error similar to the one I reported in #2799:

2016/07/15 19:47:35 Platform overridden to 'RHEL5_64'
Using Theano backend.
Using gpu device 0: GRID K520 (CNMeM is disabled, CuDNN not available)
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
embedding_1 (Embedding)          (None, 5, 2)          6           embedding_input_1[0][0]          
____________________________________________________________________________________________________
embedding_2 (Embedding)          (None, 5, 2)          6           embedding_input_2[0][0]          
____________________________________________________________________________________________________
activation_1 (Activation)        (None, 1)             0           merge_1[0][0]                    
====================================================================================================
Total params: 12
____________________________________________________________________________________________________
Epoch 1/100
Traceback (most recent call last):
  File "runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "runpy.py", line 72, in _run_code
    exec code in run_globals
  File "shell.py", line 42, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "../mm_em_2799_simple_w_embeddings.py", line 32, in <module>
    model.fit([a_array, b_array], [y_array], nb_epoch=100, verbose=1)
  File "keras/models.py", line 413, in fit
    sample_weight=sample_weight)
  File "keras/engine/training.py", line 1082, in fit
    callback_metrics=callback_metrics)
  File "keras/engine/training.py", line 801, in _fit_loop
    outs = f(ins_batch)
  File "keras/backend/theano_backend.py", line 531, in __call__
    return self.function(*inputs)
  File "theano/compile/function_module.py", line 871, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "theano/gof/link.py", line 314, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "theano/compile/function_module.py", line 859, in __call__
    outputs = self.fn()
ValueError: GpuElemwise. Input dimension mis-match. Input 1 (indices start at 0) has shape[2] == 2, but the output's size on that axis is 32.
Apply node that caused the error: GpuElemwise{mul,no_inplace}(GpuElemwise{Sqrt}[(0, 0)].0, GpuElemwise{Sqrt}[(0, 0)].0)
Toposort index: 159
Inputs types: [CudaNdarrayType(float32, (True, True, False, True)), CudaNdarrayType(float32, (False, True, False, True))]
Inputs shapes: [(1, 1, 32, 1), (32, 1, 2, 1)]
Inputs strides: [(0, 0, 1, 0), (2, 0, 1, 0)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[GpuElemwise{Composite{((i0 * i1 * i2) / i3)},no_inplace}(CudaNdarrayConstant{[[[[ 0.5]]]]}, GpuDimShuffle{x,x,0,1}.0, GpuElemwise{Composite{(i0 + Abs(i0))},no_inplace}.0, GpuElemwise{mul,no_inplace}.0)]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

Any thoughts on what I'm doing wrong and/or a complete example that will work would be greatly appreciated.

farizrahman4u commented 8 years ago

The output shape of the activation looks weird. If I remember correctly, I fixed it a couple of weeks ago. Are you sure you are running the latest version of Keras?

farizrahman4u commented 8 years ago

No other feature that I have added to Keras has created as many issues as dot/cos merge. We really need some examples. Docstrings are not enough.

jtaverni commented 8 years ago

Agreed (on some functioning/complete examples). That's why I restarted this thread: to try to get at least one complete example.

As for the version of code I'm using, it is the 1.0.5 tarball available here: https://github.com/fchollet/keras/releases. Can you point me to your change so that I can verify that I'm using it? Also, if you can give a complete functioning example where Merge/cos is used, I would be grateful.

farizrahman4u commented 8 years ago

That release is 18 days old; the bug was fixed 14 days ago (PR #3116, commit 8d3f398). It's always safest to git clone and install the latest version rather than relying on the releases.

farizrahman4u commented 8 years ago

An example would be babi_memnn.py.

fchollet commented 8 years ago

Let's add a couple of examples to the docstring; it still seems to confuse a lot of people...

farizrahman4u commented 8 years ago

One major problem is that the output shape of the batch_dot op is not clearly defined anywhere in Keras. In the backend it just states ndim >= 2. In the Merge class, the logic for finding the output_shape is not clear, since a very weird way of simulating the dot op with numpy is used. All this demands a lot of cognitive effort from the user. So I would suggest:

farizrahman4u commented 8 years ago

Btw, this is the function for finding the output shape of a dot/cos merge:

def dot_output_shape(shape1, shape2, dot_axes):
    shape1 = list(shape1)
    shape2 = list(shape2)
    if type(dot_axes) == int:
        dot_axes = (dot_axes, ) * 2
    assert dot_axes[0] > 0 and dot_axes[1] > 0, 'Invalid dot_axes argument.'
    dim1 = shape1[dot_axes[0]]
    dim2 = shape2[dot_axes[1]]
    assert dim1 is None or dim2 is None or dim1 == dim2, 'Incompatible shapes'
    shape1.pop(dot_axes[0])
    shape2.pop(dot_axes[1])
    shape2.pop(0)
    return tuple(shape1 + shape2)
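
For example, plugging in the shapes used earlier in this thread (shape tuples use None for the batch axis):

# Two (None, 2, 1) inputs dotted over axis 1, as in the
# merge(..., mode='cos', dot_axes=1) snippet above:
print(dot_output_shape((None, 2, 1), (None, 2, 1), 1))  # -> (None, 1, 1)
# which is why that snippet follows the merge with Reshape((1,)).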

vijaydwivedi75 commented 7 years ago

Hi @joelthchao, I tried this, but it does not work.

input_a = Input(shape=(input_dim, 1))
input_b = Input(shape=(input_dim, 1))

cos_distance = merge([input_a, input_b], mode='cos', dot_axes=1) # magic dot_axes works here!
cos_distance = Reshape((1,))(cos_distance)
cos_similarity = Lambda(lambda x: 1-x)(cos_distance)

model = Model([input_a, input_b], [cos_similarity])

My input is a 1000-dimensional vector on both sides. OUTPUT:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 512)               512512    
_________________________________________________________________
dense_2 (Dense)              (None, 256)               131328    
_________________________________________________________________
dense_3 (Dense)              (None, 300)               77100     
=================================================================
Total params: 720,940
Trainable params: 720,940
Non-trainable params: 0
_________________________________________________________________
None
Traceback (most recent call last):
  File "/DATA2/USERS/vijay/.local/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 671, in _call_cpp_shape_fn_impl
    input_tensors_as_shapes, status)
  File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/DATA2/USERS/vijay/.local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimensions must be equal, but are 1 and 1000 for 'sequential_1/dense_1/MatMul' (op: 'MatMul') with input shapes: [?,1], [1000,512].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "siamese_mlp.py", line 101, in <module>
    processed_a = base_network(input_a)
  File "/DATA2/USERS/vijay/.local/lib/python3.5/site-packages/keras/engine/topology.py", line 596, in __call__
    output = self.call(inputs, **kwargs)
  File "/DATA2/USERS/vijay/.local/lib/python3.5/site-packages/keras/models.py", line 533, in call
    return self.model.call(inputs, mask)
    File "/DATA2/USERS/vijay/.local/lib/python3.5/site-packages/keras/layers/core.py", line 843, in call
    output = K.dot(inputs, self.kernel)
...
...
  File "/DATA2/USERS/vijay/.local/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 610, in call_cpp_shape_fn
    debug_python_shape_fn, require_shape_fn)
  File "/DATA2/USERS/vijay/.local/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 676, in _call_cpp_shape_fn_impl
    raise ValueError(err.message)
ValueError: Dimensions must be equal, but are 1 and 1000 for 'sequential_1/dense_1/MatMul' (op: 'MatMul') with input shapes: [?,1], [1000,512].

vijaydwivedi75 commented 7 years ago

I resolved this error by removing the 1 in

input_a = Input(shape=(input_dim, 1))
input_b = Input(shape=(input_dim, 1))

Now I get the following output. The loss does not change:

Train on 263000 samples, validate on 113000 samples
Epoch 1/25
263000/263000 [==============================] - 7s - loss: 0.4236 - val_loss: 0.4105
Epoch 2/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
Epoch 3/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
Epoch 4/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
Epoch 5/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
Epoch 6/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
Epoch 7/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
Epoch 8/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
Epoch 9/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
Epoch 10/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
Epoch 11/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
Epoch 12/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
Epoch 13/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
Epoch 14/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
Epoch 15/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
Epoch 16/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
Epoch 17/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
Epoch 18/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
Epoch 19/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
Epoch 20/25
263000/263000 [==============================] - 7s - loss: 0.4236 - val_loss: 0.4105
Epoch 21/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
Epoch 22/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
Epoch 23/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
Epoch 24/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
Epoch 25/25
263000/263000 [==============================] - 6s - loss: 0.4236 - val_loss: 0.4105
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
input_1 (InputLayer)             (None, 1000)          0                                            
____________________________________________________________________________________________________
input_2 (InputLayer)             (None, 1000)          0                                            
____________________________________________________________________________________________________
merge_1 (Merge)                  (None, 1)             0           input_1[0][0]                    
                                                                   input_2[0][0]                    
____________________________________________________________________________________________________
reshape_1 (Reshape)              (None, 1)             0           merge_1[0][0]                    
____________________________________________________________________________________________________
lambda_2 (Lambda)                (None, 1)             0           reshape_1[0][0]                  
====================================================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
____________________________________________________________________________________________________
None
* Accuracy on training set: 3.70%
* Accuracy on test set: 5.60%

joelthchao commented 7 years ago

@vijaydwivedi75 Trainable params: 0 means your network has no weights to train; therefore, the loss doesn't change.

vijaydwivedi75 commented 7 years ago

I guess I get it: the merge requires processed_a and processed_b in place of input_a and input_b, as sketched below.
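
A sketch of that change, reusing the names from earlier in the thread:

# Route each input through the shared (trainable) base network first,
# then take the cosine merge of the processed representations.
processed_a = base_network(input_a)
processed_b = base_network(input_b)
cos_distance = merge([processed_a, processed_b], mode='cos', dot_axes=1)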

vijaydwivedi75 commented 7 years ago

@joelthchao I get the following results; it is still not solved. I used this for the loss:

def cos_distance(y_true, y_pred):
    def l2_normalize(x, axis):
        norm = K.sqrt(K.sum(K.square(x), axis=axis, keepdims=True))
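        # Note: K.maximum(x, K.epsilon()) on the next line clips every
        # negative entry of x up to epsilon; only the norm needs the guard.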
        return K.maximum(x, K.epsilon()) / K.maximum(norm, K.epsilon())
    y_true = l2_normalize(y_true, axis=-1)
    y_pred = l2_normalize(y_pred, axis=-1)
    return -K.mean(y_true * y_pred, axis=-1)

When I use cosine_proximity as the loss, I get NaN losses in all epochs.

Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 512)               512512    
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 384)               196992    
_________________________________________________________________
dropout_2 (Dropout)          (None, 384)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 256)               98560     
_________________________________________________________________
dropout_3 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 300)               77100     
=================================================================
Total params: 885,164
Trainable params: 885,164
Non-trainable params: 0
_________________________________________________________________
None
siamese_mlp.py:122: UserWarning: The `merge` function is deprecated and will be removed after 08/2017. Use instead layers from `keras.layers.merge`, e.g. `add`, `concatenate`, etc.
  cos_distance = merge([processed_a, processed_b], mode='cos', dot_axes=1) # magic dot_axes works here!
/DATA2/USERS/vijay/.local/lib/python3.5/site-packages/keras/legacy/layers.py:460: UserWarning: The `Merge` layer is deprecated and will be removed after 08/2017. Use instead layers from `keras.layers.merge`, e.g. `add`, `concatenate`, etc.
  name=name)
Train on 26300 samples, validate on 11300 samples
Epoch 1/20
26300/26300 [==============================] - 5s - loss: -0.0049 - val_loss: 0.0000e+00
Epoch 2/20
26300/26300 [==============================] - 3s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 3/20
26300/26300 [==============================] - 3s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 4/20
26300/26300 [==============================] - 4s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 5/20
26300/26300 [==============================] - 3s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 6/20
26300/26300 [==============================] - 3s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 7/20
26300/26300 [==============================] - 3s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 8/20
26300/26300 [==============================] - 4s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 9/20
26300/26300 [==============================] - 3s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 10/20
26300/26300 [==============================] - 3s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 11/20
26300/26300 [==============================] - 4s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 12/20
26300/26300 [==============================] - 3s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 13/20
26300/26300 [==============================] - 3s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 14/20
26300/26300 [==============================] - 3s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 15/20
26300/26300 [==============================] - 3s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 16/20
26300/26300 [==============================] - 3s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 17/20
26300/26300 [==============================] - 2s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 18/20
26300/26300 [==============================] - 2s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 19/20
26300/26300 [==============================] - 3s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 20/20
26300/26300 [==============================] - 4s - loss: 0.0000e+00 - val_loss: 0.0000e+00
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
input_1 (InputLayer)             (None, 1000)          0                                            
____________________________________________________________________________________________________
input_2 (InputLayer)             (None, 1000)          0                                            
____________________________________________________________________________________________________
sequential_1 (Sequential)        (None, 300)           885164      input_1[0][0]                    
                                                                   input_2[0][0]                    
____________________________________________________________________________________________________
merge_1 (Merge)                  (None, 1)             0           sequential_1[1][0]               
                                                                   sequential_1[2][0]               
____________________________________________________________________________________________________
reshape_1 (Reshape)              (None, 1)             0           merge_1[0][0]                    
____________________________________________________________________________________________________
lambda_2 (Lambda)                (None, 1)             0           reshape_1[0][0]                  
====================================================================================================
Total params: 885,164
Trainable params: 885,164
Non-trainable params: 0
____________________________________________________________________________________________________
None
* Accuracy on training set: nan%
* Accuracy on test set: nan%

stale[bot] commented 7 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.