keras-team / keras

Deep Learning for humans
http://keras.io/

CNN layers don't work with mask_zero=True in embedding layer #9311

Closed siebeniris closed 3 years ago

siebeniris commented 6 years ago

I tried to use CNN layers to do sequence labeling. My model looks like this:

model = Sequential()
model.add(Embedding(input_dim=n_symbols, output_dim=embed_size, input_length=maxlen, trainable=True, weights=[embedding_weights]))
model.add(Dropout(rate=0.5))
model.add(Conv1D(filters=filters, kernel_size=5, strides=1, padding="same"))
model.add(Dropout(rate=0.5))
model.add(Conv1D(filters=filters, kernel_size=5, strides=1, padding="same"))
model.add(Dropout(rate=0.5))
model.add(Conv1D(filters=filters, kernel_size=5, strides=1, padding="same"))
model.add(Dropout(rate=0.5))
model.add(TimeDistributed(Dense(7, activation='softmax')))

sgd = SGD(lr=0.005, momentum=0.2)
model.compile(optimizer='adadelta', loss='categorical_crossentropy', metrics=['accuracy'], sample_weight_mode='temporal')
# fit model
history = model.fit(Xtrain, ytrain, batch_size=batch_size, epochs=epochs, validation_data=(Xdev, ydev), verbose=1, sample_weight=weights)

So the training data and its labels are all padded to the same length in each sample. Without mask_zero in the embedding layer, the model predicts on the padding, so recall is very high and precision very low, because it produces predictions at all the padded positions, where nothing should be predicted. To make the model skip the padded positions, I tried setting the sample weight at the padded positions to zero, but then it works very badly: the accuracy drops below 3 percent. Any suggestions to make the model work?
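For reference, a minimal sketch of the zero-weights-on-padding idea, assuming Xtrain is the padded (n_samples, maxlen) matrix of token ids with 0 as the padding id:

import numpy as np

# With sample_weight_mode='temporal', fit() expects sample_weight of
# shape (n_samples, maxlen); give the padded positions zero weight.
weights = (Xtrain != 0).astype('float32')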

Thanks!

chiragjn commented 6 years ago

I have this issue as well. I also want to know whether the Embedding layer assigns a zero vector to index 0 when mask_zero=True; that way the convolutions wouldn't be affected.
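A quick sanity check (a sketch, assuming Keras 2.x; the sizes here are arbitrary) shows that mask_zero=True does not zero out the embedding row for index 0; it only attaches a mask that mask-aware layers such as LSTM consume, which Conv1D does not:

import numpy as np
from keras.layers import Embedding

emb = Embedding(input_dim=100, output_dim=8, mask_zero=True)
emb.build((None, 10))  # initializes the (100, 8) weight matrix
# Row 0 is an ordinary trainable vector, not forced to zeros.
print(emb.get_weights()[0][0])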

siebeniris commented 6 years ago

My tutor told me to do the padding with a different value than zero (because one of my labels is 0) and then compute the sample_weights. It worked: the model no longer predicts on the padding, because the padded positions, numerous as they are, all get zero weight. However, the model does not work very well after that, and it gives negative losses, which is weird. Any suggestions to improve the model?
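A minimal sketch of that setup, with hypothetical names (label_seqs is the list of label sequences; PAD is an id reserved outside the real label set):

from keras.preprocessing.sequence import pad_sequences

PAD = 7  # hypothetical: one id past the 7 real classes, used only for padding
ytrain_ids = pad_sequences(label_seqs, maxlen=maxlen, padding='post', value=PAD)
# Zero weight at padded steps so the loss ignores them.
weights = (ytrain_ids != PAD).astype('float32')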

chiragjn commented 6 years ago

Since the Embedding layer doesn't assign a zero vector to zero indices (which makes sense for some use cases), I decided to compute a mask on my 2D input, expand and tile it to match the 3D tensor after the embedding, and then multiply the two before passing the result to Conv1D.

Hopefully, I am doing it correctly

from keras import backend as K
from keras.layers import Embedding, Lambda, Multiply

inputs_embedded = Embedding(input_dim=..., input_length=..., output_dim=embedding_size)(input)
# 1.0 at real tokens, 0.0 at padded positions, tiled to the embedding width.
# The backend ops are wrapped in a Lambda so the result is a proper Keras tensor.
expanded_mask = Lambda(lambda t: K.tile(K.expand_dims(K.cast(K.not_equal(t, 0), 'float32'), axis=2), [1, 1, embedding_size]))(input)
input_embedded_masked = Multiply()([expanded_mask, inputs_embedded])
# pass onto conv1d
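One caveat worth noting: multiplying by the mask only zeroes the padded embeddings. A Conv1D with padding="same" can still produce nonzero activations at padded positions near the sequence boundary, since the kernel sees neighboring real tokens, so the mask likely needs to be re-applied after each convolution.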