I have this issue as well. I'd also like to know whether the Embedding layer assigns a zero vector to index 0 when mask_zero=True; that way convolutions wouldn't be affected.
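If it helps, here is a quick way to check this yourself (a toy sketch; the vocabulary size of 10 and embedding size of 4 are made up). As far as I can tell, mask_zero=True only attaches a mask for downstream mask-aware layers; it does not zero the embedding row itself:

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

# toy model: just an Embedding layer with mask_zero=True
model = Sequential()
model.add(Embedding(input_dim=10, output_dim=4, mask_zero=True, input_length=2))
out = model.predict(np.array([[0, 1]]))
# out[0, 0] is the (randomly initialised) row for index 0 -- typically NOT all zeros
print(out[0, 0])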
My tutor told me to pad with a value other than zero (because one of my labels is 0) and then compute the sample_weights. It worked in the sense that the model no longer predicts on the padding, because the weights there are too low for the padding to matter, despite how much of it there is. However, the model does not work very well after that and produces negative losses, which is weird. Any suggestions to improve the model?
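A minimal sketch of what I mean by the weights, assuming the inputs are padded with a hypothetical PAD_INDEX and the model is compiled with sample_weight_mode='temporal' (as in the model further down):

import numpy as np

PAD_INDEX = n_symbols - 1  # hypothetical: whatever index you reserved for padding
# one weight per timestep: 0.0 at padded positions, 1.0 elsewhere
weights = (Xtrain != PAD_INDEX).astype('float32')
# weights has shape (n_samples, maxlen), as required by sample_weight_mode='temporal'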
Since the Embedding layer doesn't assign a zero vector to zero indices (which makes sense for some use cases), I decided to compute a mask on my 2D input, expand and tile it to match the 3D tensor coming out of the embedding, and multiply the two before passing the result on to Conv1D.
Hopefully I'm doing this correctly:
from keras import backend as K
from keras.layers import Embedding, Lambda, Multiply

inputs_embedded = Embedding(input_dim=..., input_length=..., output_dim=embedding_size)(input)
# 1.0 where the token index is non-zero, 0.0 at padded positions
mask = Lambda(lambda x: K.cast(K.not_equal(x, 0), 'float32'))(input)
# expand to (batch, steps, 1) and tile across the embedding dimension
expanded_mask = Lambda(lambda m: K.tile(K.expand_dims(m, axis=2), [1, 1, embedding_size]))(mask)
inputs_embedded_masked = Multiply()([expanded_mask, inputs_embedded])
# pass onto conv1d
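One caveat with this approach: padding="same" convolutions let real tokens bleed back into the zeroed positions, so the mask may need to be re-applied after each Conv1D. A rough sketch, reusing mask from above and a hypothetical filters count:

from keras.layers import Conv1D

# tile the mask to the number of conv channels
conv_mask = Lambda(lambda m: K.tile(K.expand_dims(m, axis=2), [1, 1, filters]))(mask)
conv_out = Conv1D(filters=filters, kernel_size=5, padding="same")(inputs_embedded_masked)
# re-zero padded positions, since the convolution's receptive field crosses them
conv_out_masked = Multiply()([conv_mask, conv_out])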
I tried to use CNN layers to do sequence labeling. My model looks like this:
from keras.models import Sequential
from keras.layers import Embedding, Dropout, Conv1D, TimeDistributed, Dense
from keras.optimizers import SGD

model = Sequential()
model.add(Embedding(input_dim=n_symbols, output_dim=embed_size, input_length=maxlen,
                    trainable=True, weights=[embedding_weights]))
model.add(Dropout(rate=0.5))
model.add(Conv1D(filters=filters, kernel_size=5, strides=1, padding="same"))
model.add(Dropout(rate=0.5))
model.add(Conv1D(filters=filters, kernel_size=5, strides=1, padding="same"))
model.add(Dropout(rate=0.5))
model.add(Conv1D(filters=filters, kernel_size=5, strides=1, padding="same"))
model.add(Dropout(rate=0.5))
model.add(TimeDistributed(Dense(7, activation='softmax')))

sgd = SGD(lr=0.005, momentum=0.2)  # defined but unused below; compile uses 'adadelta'
model.compile(optimizer='adadelta', loss='categorical_crossentropy',
              metrics=['accuracy'], sample_weight_mode='temporal')

# fit model
history = model.fit(Xtrain, ytrain, batch_size=batch_size, epochs=epochs,
                    validation_data=(Xdev, ydev), verbose=1, sample_weight=weights)
The training data and its labels are all padded to the same length in each sample. Without mask_zero in the embedding layer, the model predicts on the padding, so recall is very high and precision very low, because it predicts labels for all the padded positions, where nothing should be predicted. To make it skip the padded positions I set the sample weights there to zero, but then it works very badly: accuracy drops below 3 percent. Any suggestions to make the model work?
Thanks!
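One possible reason the reported accuracy collapses: the plain 'accuracy' metric still counts the zero-weight padded positions. A minimal sketch, assuming Keras 2's weighted_metrics argument and a hypothetical dev_weights array built the same way as weights, so the metric respects the temporal weights:

# 0.0 at padded timesteps, 1.0 elsewhere; shape (n_samples, maxlen)
weights = (Xtrain != 0).astype('float32')

model.compile(optimizer='adadelta', loss='categorical_crossentropy',
              weighted_metrics=['accuracy'],  # accuracy now ignores zero-weight positions
              sample_weight_mode='temporal')
history = model.fit(Xtrain, ytrain, batch_size=batch_size, epochs=epochs,
                    validation_data=(Xdev, ydev, dev_weights),
                    verbose=1, sample_weight=weights)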