Closed kavehtp closed 8 years ago
Do you even need a mask past the LSTM? You are squashing over time as it is.
By the way, you aren't actually using your mask to compute the mean, so the zero-padded items aren't being ignored.
My suggestion is to use a custom layer to implement MeanOverTime. I have this one which may work: https://github.com/braingineer/ikelos/blob/master/ikelos/layers/utility.py#L29
Basically, it lets you pass in a function to modify the mask rather than the layer. It's quick and dirty and a bit hacky, but it works in a pinch.
However, in thinking about your problem, you can't just use this as is. I would take the LambdaMask layer and modify its call method so that it does the following:
    if mask is not None:
        if K.ndim(mask) == K.ndim(x) - 1:
            mask = K.expand_dims(mask)
        x *= mask
    return x
This way, your mask is given a broadcastable dimension, so it can multiply across your feature dimension. It also correctly zeroes out the timesteps you don't care about.
In the compute_mask portion, you will want to just return None. This removes the mask from the pipeline, so your Dense layer can do its job without having to worry about it.
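As a rough illustration of the masking step described above, here is a NumPy-only sketch (not Keras code). The function name `masked_mean` and the sample values are made up for the example. It gives the mask a broadcastable trailing axis, zeroes the padded timesteps, and divides by the number of real timesteps.

```python
import numpy as np

def masked_mean(x, mask):
    """Mean over the time axis, ignoring timesteps where mask == 0.

    x:    (batch, time, features) float array
    mask: (batch, time) array of 0/1 flags
    """
    mask = mask.astype(x.dtype)
    # Add a trailing axis so the mask broadcasts across the feature dimension.
    expanded = np.expand_dims(mask, axis=-1)
    # Zero out padded timesteps, then sum over time.
    total = (x * expanded).sum(axis=1)
    # Divide by the number of unmasked timesteps per sample.
    return total / mask.sum(axis=1, keepdims=True)

x = np.array([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]])  # last step is padding
mask = np.array([[1, 1, 0]])
result = masked_mean(x, mask)
print(result)  # mean of the first two timesteps only
```

The padded timestep deliberately holds non-zero values here, to show that the multiplication by the mask is what removes it from the sum.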
Yeah, I don't need the mask after MeanOverTime layer. I just did not know how to remove mask after MeanOverTime. I implemented this really really ugly layer instead (it works though):
def MeanOverTime():
    mean_func = lambda x: K.cast((x.sum(axis=1) / (x.shape[1] - K.equal(x, 0).all(axis=2).sum(axis=1, keepdims=True))), K.floatx())
    layer = Lambda(mean_func, output_shape=lambda s: (s[0], s[2]))
    layer.supports_masking = True
    def compute_mask(input, mask):
        return None
    layer.compute_mask = compute_mask
    return layer
I also did not know how to access the mask inside this function. That's why I am using the K.equal(...).all(...) hack. I am gonna fix it now. Thanks!
You could use the following function for mean over time.
def lambda_mask_average(x, mask=None):
    return K.batch_dot(x, mask, axes=1) / K.sum(mask, axis=-1, keepdims=True)
main_input = Input(shape=(input_length,),dtype='int32')
m = Embedding(vocab_size+1, emb_size, input_length=input_length, mask_zero=True)(main_input)
m = LSTM(lstm_dim, return_sequences=True)(m)
m = MaskEatingLambda(lambda_mask_average, output_shape=(lstm_dim,))(m)
# no more mask layer.
# insert whatever other layers you want here
model = Model(input=main_input, output=m)
Thanks @mpavankumarreddy. The problem is solved. I am closing the issue.
Hey @kavehtp,
I am using the same architecture from @sergeyf for a toy problem. I added a dense layer after the averaging layer as such
def lambda_mask_average(x, mask=None):
    return K.batch_dot(x, mask, axes=1) / K.sum(mask, axis=-1, keepdims=True)
def lambda_mask_sum(x, mask=None):
    return K.batch_dot(x, mask, axes=1)
main_input = Input(shape=(maxlen,), dtype='int32')
x = Embedding(max_features+1, embed_dim, input_length=maxlen, dropout=0.2, mask_zero=True)(main_input)
x = LSTM(lstm_dim, dropout_W=0.2, dropout_U=0.2, return_sequences=True)(x)
x = MaskEatingLambda(lambda_mask_average,output_shape=(lstm_dim,))(x)
pred = Dense(1,activation='sigmoid')(x)
However, the Dense layer gives an error because the input is upcast to float64 by the MaskEatingLambda layer, while Dense expects float32. On the other hand, that problem does not occur if I use the lambda_mask_sum function.
Have you come across a similar problem in your implementation? Can you suggest a fix for it?
Thanks
You'll probably have to use K.cast on the sum over mask.
Hey @sergeyf. Thanks for quick reply.
I tried
return K.batch_dot(x,mask,axes=1) / K.cast_to_floatx(K.sum(mask, axis=-1, keepdims=True))
and got the error
ValueError: setting an array element with a sequence.
Now, I could not fix that one, mainly due to my lack of Python knowledge, but I believe you guys can help me.
I use it like this: K.cast(x,'float32').
That solved the problem. Thanks man!
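For readers wondering where the float64 came from: the sketch below reproduces the same kind of upcast in plain NumPy. It assumes the backend follows NumPy-style type promotion for this case; the array names are stand-ins for the tensors in the thread, not actual Keras objects.

```python
import numpy as np

# Stand-in for K.batch_dot(x, mask, axes=1): a float32 result.
x_sum = np.ones((2, 3), dtype=np.float32)
# Stand-in for K.sum(mask, axis=-1, keepdims=True): an integer count.
mask_count = np.array([[2], [3]], dtype=np.int64)

# Dividing float32 by int64 promotes the result to float64.
upcast = x_sum / mask_count
# Casting the divisor to float32 first keeps the result in float32.
fixed = x_sum / mask_count.astype(np.float32)

print(upcast.dtype, fixed.dtype)
```

This would also explain why lambda_mask_sum did not trigger the error: without the division by an integer count, nothing forces the promotion.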
Has anyone checked if these work with the TensorFlow backend? Trying the code by @mpavankumarreddy, I get this error:
/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.pyc in batch_dot(x, y, axes)
247 adj_x = None
248 adj_y = None
--> 249 out = tf.batch_matmul(x, y, adj_x=adj_x, adj_y=adj_y)
250 if ndim(out) == 1:
251 out = expand_dims(out, 1)
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.pyc in _batch_mat_mul(x, y, adj_x, adj_y, name)
387 """
388 result = _op_def_lib.apply_op("BatchMatMul", x=x, y=y, adj_x=adj_x,
--> 389 adj_y=adj_y, name=name)
390 return result
391
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.pyc in apply_op(self, op_type_name, name, **keywords)
702 op = g.create_op(op_type_name, inputs, output_types, name=scope,
703 input_types=input_types, attrs=attr_protos,
--> 704 op_def=op_def)
705 outputs = op.outputs
706 return _Restructure(ops.convert_n_to_tensor(outputs),
/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.pyc in create_op(self, op_type, inputs, dtypes, input_types, name, attrs, op_def, compute_shapes, compute_device)
2260 original_op=self._default_original_op, op_def=op_def)
2261 if compute_shapes:
-> 2262 set_shapes_for_outputs(ret)
2263 self._add_op(ret)
2264 self._record_op_seen_by_control_dependencies(ret)
/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.pyc in set_shapes_for_outputs(op)
1700 raise RuntimeError("No shape function registered for standard op: %s"
1701 % op.type)
-> 1702 shapes = shape_func(op)
1703 if shapes is None:
1704 raise RuntimeError(
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.pyc in _BatchMatMulShape(op)
1383 if a_shape.dims is None and b_shape.dims is None:
1384 return [tensor_shape.unknown_shape()]
-> 1385 batch_dims = a_shape[:-2].merge_with(b_shape[:-2])
1386 output_rows = a_shape[-1] if adj_a else a_shape[-2]
1387 output_cols = b_shape[-2] if adj_b else b_shape[-1]
/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_shape.pyc in merge_with(self, other)
568 except ValueError:
569 raise ValueError("Shapes %s and %s are not compatible" %
--> 570 (self, other))
571
572 def concatenate(self, other):
ValueError: Shapes (?,) and () are not compatible
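One possible reading of this traceback (an assumption, not confirmed in the thread) is that TensorFlow's batched matrix multiply rejects a 3-D x paired with a 2-D mask. If so, one way to sidestep batch_dot is to write the masked sum as an element-wise multiply followed by a sum over time, which needs only broadcasting. The NumPy sketch below, with illustrative names and random data, checks that the two formulations agree.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.rand(2, 4, 3)                              # (batch, time, features)
mask = np.array([[1, 1, 0, 0],
                 [1, 1, 1, 0]], dtype=x.dtype)     # (batch, time)

# What batch_dot(x, mask, axes=1) is meant to compute:
# sum over t of x[b, t, d] * mask[b, t]
via_einsum = np.einsum('btd,bt->bd', x, mask)

# The same quantity using only broadcasting and a reduction over time.
via_broadcast = (x * mask[:, :, None]).sum(axis=1)

# Dividing by the per-sample count gives the masked mean.
masked_mean = via_broadcast / mask.sum(axis=1, keepdims=True)
print(masked_mean.shape)
```

In Keras backend terms the broadcast form would look roughly like K.sum(x * K.expand_dims(mask), axis=1), with the mask cast to K.floatx() first; treat that as an untested suggestion.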
Hi @braingineer, I'm new to Keras and I want to process sentences with different numbers of words in a CNN. I used zero-padding, but the layers after the Embedding layer don't support masking. Is there any way to solve it?
inputs = Input(shape=(1,max_len),dtype='int32')
x= Embedding(vocab_size, dim,weights = GloVe, input_length=max_len)(inputs)
x = Reshape((1,max_len,50))(x)
x = Convolution2D(nb_filter, n_gram, dim,init='glorot_uniform', activation='linear',border_mode='valid', subsample=(1,1))(x)
x = MaxPooling2D(pool_size=(2,1))(x1)
x = Flatten()(x)
out = Dense(10)(x)
Conv_sen= Model(inputs,out)
By the way, does keras support global pooling? Thanks.
For anyone who stumbles onto this post looking to deal with Embeddings, zeros, and masks, the following works in both Theano and TF.
My solution to this problem is as follows:
(1) Make a custom ZeroMaskedEntries layer that (a) zeros out all of the masked-out embedding rows and (b) swallows the mask so it doesn't pass on.
(2) Use a lambda function called mask_aware_mean that knows to ignore all-zero rows when taking the mean.
This is a little bit silly (inefficient) because first I get rid of the mask, and then I reconstruct, but it gets rid of the whole MaskEatingLambda business. You can also use ZeroMaskedEntries in other places, and easily modify it to pass on the mask if need be.
Here is ZeroMaskedEntries:
import keras.backend as K
from keras.engine.topology import Layer

class ZeroMaskedEntries(Layer):
    """
    This layer is called after an Embedding layer.
    It zeros out all of the masked-out embeddings.
    It also swallows the mask without passing it on.
    You can change this to default pass-on behavior as follows:

    def compute_mask(self, x, mask=None):
        if not self.mask_zero:
            return None
        else:
            return K.not_equal(x, 0)
    """

    def __init__(self, **kwargs):
        # The attribute Keras checks is `supports_masking` (was `support_mask`).
        self.supports_masking = True
        super(ZeroMaskedEntries, self).__init__(**kwargs)

    def build(self, input_shape):
        self.output_dim = input_shape[1]
        self.repeat_dim = input_shape[2]

    def call(self, x, mask=None):
        # Expand the (batch, time) mask to (batch, time, features) and apply it.
        mask = K.cast(mask, 'float32')
        mask = K.repeat(mask, self.repeat_dim)
        mask = K.permute_dimensions(mask, (0, 2, 1))
        return x * mask

    def compute_mask(self, input_shape, input_mask=None):
        # Swallow the mask so downstream layers never see it.
        return None
Below is a way to take the mean of what comes out of ZeroMaskedEntries. It does the silly business mentioned above of reconstructing the mask, but the computational hit is minor in my experience.
def mask_aware_mean(x):
    # recreate the masks - all zero rows have been masked
    mask = K.not_equal(K.sum(K.abs(x), axis=2, keepdims=True), 0)
    # number of rows that are not all zeros
    n = K.sum(K.cast(mask, 'float32'), axis=1, keepdims=False)
    # compute mask-aware mean of x
    x_mean = K.sum(x, axis=1, keepdims=False) / n
    return x_mean

def mask_aware_mean_output_shape(input_shape):
    shape = list(input_shape)
    assert len(shape) == 3
    return (shape[0], shape[2])
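To make the mask reconstruction easier to follow, here is a NumPy-only sketch of the same idea. The name `mask_aware_mean_np` and the sample array are invented for illustration; the Keras version above is the one from the thread.

```python
import numpy as np

def mask_aware_mean_np(x):
    # A timestep counts as "real" if any of its features is non-zero.
    mask = np.abs(x).sum(axis=2, keepdims=True) != 0
    # Number of real timesteps per sample, shape (batch, 1).
    n = mask.astype(x.dtype).sum(axis=1)
    # Zeroed rows add nothing to the sum, so only the divisor needs the mask.
    return x.sum(axis=1) / n

x = np.array([[[0.0, 0.0], [2.0, 4.0], [4.0, 8.0]],
              [[0.0, 0.0], [0.0, 0.0], [1.0, 5.0]]])
means = mask_aware_mean_np(x)
print(means)
```

Two caveats follow from this design: a real timestep whose embedding happens to be exactly all zeros would be miscounted as padding, and a sample made up entirely of padding would divide by zero.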
And here is a test to make sure it all works:
import numpy as np
from keras.layers import Input, Embedding, Lambda
from keras.models import Model
output_dim = 2
input_dim = 25
input_length = 4
main_input = Input(shape=(input_length,), dtype='int32')
embed = Embedding(output_dim=output_dim, input_dim=input_dim, input_length=input_length, mask_zero=True)(main_input)
embed_zeroed = ZeroMaskedEntries()(embed)
lambda_mean = Lambda(mask_aware_mean, mask_aware_mean_output_shape)(embed_zeroed)
model = Model(input=main_input,output=lambda_mean)
model.compile(optimizer='rmsprop',loss='mse')
# test
test_input = [[0,0,2,0],[0,0,0,1],[0,0,2,1]]
test_output = model.predict(test_input)
print('Mean is working?', np.all(np.isclose(test_output[0:2,:].mean(0),test_output[2,:])))
@sergeyf Thank you very much for your solution above!
I know my solution below is wrong, but can you explain to me why it is incorrect in achieving the average embedding vector?
def means(x):
    return K.mean(x, axis=1)
model = Sequential()
model.add(Embedding(num_features+2, 128))
model.add(Lambda(means, output_shape=(128,)))
model.add(Masking(mask_value=0))
model.add(Dense(64, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
I'm not getting any errors like the OP but I assume it's still computing the average with the zeros, correct? Thanks!
I think because Dense doesn't support masking.
By chance I currently have the same problem: I want to average the outputs of a TimeDistributed Dense layer across all non-masked timesteps. Is @kavehtp's MeanOverTime the best way to do it?
edit: Just saw @sergeyf's mask_aware_mean.
@iskandr I haven't tried my approach after a TimeDistributed anything, so not sure how it would work. If it does, please put up an example here for posterity!
Here is another (slightly cleaner?) alternative implementation:
class MeanOverTime(Layer):
    def __init__(self, **kwargs):
        self.supports_masking = True
        super(MeanOverTime, self).__init__(**kwargs)

    def call(self, x, mask=None):
        return K.cast(x.sum(axis=1) / mask.sum(axis=1, keepdims=True), K.floatx())

    def get_output_shape_for(self, input_shape):
        return (input_shape[0], input_shape[2])

    def compute_mask(self, x, mask):
        return None

    def get_config(self):
        config = {}
        base_config = super(MeanOverTime, self).get_config()
        return dict(list(base_config.items()))
@sergeyf Just a follow-up question... I am observing that using your solution, my model always predicts the most frequent class for each and every test example. My goal is to average the embeddings of variable-length vectors (which I padded with zeros) and to predict one of n_classes classes.
Here is my implementation using your solution:
model = Sequential()
model.add(Embedding(num_features+2, 128, mask_zero=True))
model.add(ZeroMaskedEntries())
model.add(Lambda(mask_aware_mean))
model.add(Dense(n_classes, activation='softmax'))
When I disable the mask_zero flag and remove the ZeroMaskedEntries layer, it seems to suddenly work (i.e., doesn't always predict the same class for every example) as follows:
model = Sequential()
model.add(Embedding(num_features+2, 128))
model.add(Lambda(mask_aware_mean))
model.add(Dense(n_classes, activation='softmax'))
Why could this phenomenon be happening and what may I be doing wrong? Thanks!
@Qululu I am not sure. The use case I had was the same as yours. I have a bunch of variable-length text, and I wanted to train a smart average of it, without getting messed up by the zero-index vector that comes out of the Embedding layer. I found a model that works better for my use case (sorry for the extra classes etc.):
from keras.engine import Layer
from keras.layers import Lambda

class NamedLambda(Lambda):
    def __init__(self, name=None):
        Lambda.__init__(self, self.fn, name=name)

    @classmethod
    def invoke(cls, args, **kw):
        return cls(**kw)(args)

    def __repr__(self):
        return '%s(%s)' % (self.__class__.__name__, self.name)

class L2Normalize(NamedLambda):
    def fn(self, x):
        return K.l2_normalize(x, axis=-1)

class Sum(NamedLambda):
    def fn(self, x):
        return K.sum(x, axis=1)

embed_direction = Embedding(output_dim=output_dim,
                            input_dim=input_dim, mask_zero=True)
mask_direction = ZeroMaskedEntries()
embedding = mask_direction(embed_direction(main_input))
sum = Sum.invoke(embedding, name='the_sum')
l2_normed_sum = L2Normalize.invoke(sum, name='l2_sum')
Try that. Or some other ideas for how to debug your original code:
(1) Alter mask_aware_mean to just take a dumb average ignoring the mask. This will confirm that it's not ZeroMaskedEntries that's causing the problem.
(2) Alter ZeroMaskedEntries to just return x. This will confirm that it's not mask_aware_mean that is causing the problem.
@sergeyf Ok, I tried both debug suggestions. While (2) confirmed mask_aware_mean was indeed working, (1) led me to stumble upon an interesting phenomenon...
When I changed the mask_aware_mean implementation to ignore the mask as such:
def mask_aware_mean(x):
    n = K.sum(K.cast(x, 'float32'), axis=1, keepdims=False)
    x_mean = K.sum(x, axis=1, keepdims=False) / n
    return x_mean
then I observed the problem still occurring. However, if I used the following dumb averaging implementation:
def means(x):
    return K.mean(x, axis=1)
then it worked.
So what is the difference between the two implementations? Aren't they both equivalent?
Also, if this implies that ZeroMaskedEntries is causing the problem, then how can I debug it too? Thanks!
@Qululu This looks wrong: n = K.sum(K.cast(x, 'float32'), axis=1, keepdims=False)
n should be the number of rows. Originally it was n = K.sum(K.cast(mask, 'float32'), axis=1, keepdims=False), which makes sense since mask is binary. x is not binary, so you're not going to get the number of rows from your operation.
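A small NumPy sketch may make the difference concrete. It uses made-up values and is not the thread's Keras code. With the sum of x as the divisor, every output entry collapses to 1, which would plausibly leave the downstream Dense layer with a constant input.

```python
import numpy as np

x = np.array([[[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]],
              [[5.0, 6.0], [0.0, 0.0], [0.0, 0.0]]])

# Buggy version: the "count" is the sum of x itself.
n_wrong = x.sum(axis=1)
buggy = x.sum(axis=1) / n_wrong          # sum(x) / sum(x) is 1 everywhere

# Intended version: count the non-zero rows per sample.
mask = np.abs(x).sum(axis=2, keepdims=True) != 0
n = mask.sum(axis=1)                     # shape (batch, 1)
correct = x.sum(axis=1) / n

print(buggy)
print(correct)
```

The buggy output carries no information about the input, while the corrected one is the per-sample mean over the non-zero rows.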
@sergeyf Hi Sergey, oops you're right. Here, I corrected it to the best of my ability:
def mask_aware_mean(x):
    # All values will meet the criterion >= 0
    mask = K.greater_equal(K.sum(K.abs(x), axis=2, keepdims=True), 0)
    n = K.sum(K.cast(mask, 'float32'), axis=1, keepdims=False)
    x_mean = K.sum(x, axis=1, keepdims=False) / n
    return x_mean
The above implementation computes the dumb average and indeed the phenomenon (model predicting most dominant class for every example) goes away. You suggested in (1) that this could mean ZeroMaskedEntries is causing the problem. Now I'm stuck, any help would be appreciated.
Not sure this is relevant, but I'm trying to compute the average embeddings for variable-length and non-sequential sets of one-hot encoded words. However, I still need to treat these sets as lists and pad them with zeros to be of uniform size for feeding into the Embedding layer, right?
from keras.preprocessing import sequence
embedding_layer_input = sequence.pad_sequences(np.array(word_idxs), maxlen=MAX_WORD_IDX_LEN)
Is this the correct way to handle embeddings of variable-length non-sequential tokens?
@Qululu Yes, your padding looks right assuming that word_idxs is something like [[1,2,3],[4,2,1],...]
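For reference, here is a small NumPy sketch of what zero-padding at the front looks like. The helper `pad_pre` is hypothetical and only mirrors what I understand to be the default behaviour of pad_sequences (padding and truncating at the start). It also assumes index 0 is reserved for padding, so real word indices start at 1.

```python
import numpy as np

def pad_pre(seqs, maxlen):
    """Left-pad (and left-truncate) integer sequences with zeros."""
    out = np.zeros((len(seqs), maxlen), dtype='int32')
    for i, seq in enumerate(seqs):
        trimmed = seq[-maxlen:]            # keep the last maxlen items
        if trimmed:                        # skip empty sequences
            out[i, -len(trimmed):] = trimmed
    return out

word_idxs = [[1, 2, 3], [4, 2], [7, 8, 9, 10, 11]]
padded = pad_pre(word_idxs, maxlen=4)
print(padded)
```

Since the average ignores order, it should not matter for this use case whether the zeros end up at the front or the back, as long as they are masked out.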
I'm not sure what the problem is. I think you need to set up a test bed to carefully go through and figure out where things stop behaving the way you want with your particular dataset. It sounds like you haven't found any particular part of my code to be broken in any obvious way, so I don't know how to help you debug further.
Also, try my K.l2_normalize solution.
Sorry I can't be of more help. Please report anything you find though - I'm sure others will find it useful.
Hi, I have a little problem with masking. There are training examples in my datasets for which no masking is applied because their length is equal to (or slightly less than) the input_length in the Embedding layer. My problem is that using the following layer:
class MeanOverTime(Layer):
    def __init__(self, **kwargs):
        self.supports_masking = True
        super(MeanOverTime, self).__init__(**kwargs)

    def call(self, x, mask=None):
        if mask is not None:
            return K.cast(x.sum(axis=1) / mask.sum(axis=1, keepdims=True), K.floatx())
        else:
            return K.mean(x, axis=1)

    def get_output_shape_for(self, input_shape):
        return input_shape[0], input_shape[-1]

    def compute_mask(self, input, input_mask=None):
        return None
there are cases where mask.sum(axis=1, keepdims=True) leads to division by zero. When this happens, the loss becomes NaN.
I don't know how to work with tensors and I just need to add this little check. To bypass this I have increased the input_length so it covers all my training examples. I also tried adding a try/except, but that didn't work either.
This is what i did. Hope it helps someone...
class MeanOverTime(Layer):
    def __init__(self, **kwargs):
        self.supports_masking = True
        super(MeanOverTime, self).__init__(**kwargs)

    def call(self, x, mask=None):
        if mask is not None:
            mask = K.cast(mask, 'float32')
            s = mask.sum(axis=1, keepdims=True)
            if K.equal(s, K.zeros_like(s)):
                return K.mean(x, axis=1)
            else:
                return K.cast(x.sum(axis=1) / mask.sum(axis=1, keepdims=True), K.floatx())
        else:
            return K.mean(x, axis=1)

    def get_output_shape_for(self, input_shape):
        return input_shape[0], input_shape[-1]

    def compute_mask(self, input, input_mask=None):
        return None
@cbaziotis thanks for your snippet. It seems to throw errors with my version of keras. Here is my updated version (I flatten the output vector):
class MeanOverTime(Layer):
    def __init__(self, **kwargs):
        self.supports_masking = True
        super(MeanOverTime, self).__init__(**kwargs)

    def call(self, x, mask=None):
        if mask is not None:
            mask = K.cast(mask, K.floatx())
            mask_sum = K.sum(mask, axis=1, keepdims=True)
            mask_sum = K.maximum(1.0, mask_sum)
            return K.sum(x, axis=1, keepdims=False) / mask_sum
        else:
            return K.mean(x, axis=1, keepdims=False)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])

    def compute_mask(self, input, input_mask=None):
        return None
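The key change in this version appears to be clamping the timestep count to at least 1, so a fully masked sample produces 0 instead of NaN. The NumPy sketch below shows that behaviour with made-up data; `mean_over_time_np` is an illustrative name. Unlike the layer above, the sketch also multiplies x by the mask before summing, on the assumption that padded timesteps may still carry non-zero values (for example after an LSTM).

```python
import numpy as np

def mean_over_time_np(x, mask):
    mask = mask.astype(x.dtype)
    mask_sum = mask.sum(axis=1, keepdims=True)
    # Clamp the count to at least 1 so a fully masked sample gives 0/1, not 0/0.
    mask_sum = np.maximum(1.0, mask_sum)
    # Zero out masked timesteps before summing over time.
    return (x * mask[:, :, None]).sum(axis=1) / mask_sum

x = np.array([[[2.0, 4.0], [4.0, 8.0]],
              [[5.0, 5.0], [5.0, 5.0]]])
mask = np.array([[1, 1],
                 [0, 0]])                 # second sample is entirely padding
out = mean_over_time_np(x, mask)
print(out)
```

Because the clamp is an element-wise tensor operation rather than a Python `if`, the same pattern should carry over to symbolic backends, where branching on a tensor's value is not possible in the usual way.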
Hi,
I am trying to implement a model over zero-padded sequences. The problem is when I use mask_zero=True some layers do not support it. For example, in the following code, the Dense layer throws an error that says it does not support masking:
Is there an easy way to fix this? Thanks.
Kaveh