keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

LSTM for part-of-speech tagging #4196

Closed danche354 closed 7 years ago

danche354 commented 7 years ago

When I use a Keras LSTM for part-of-speech tagging, I use mask_value=0 to mask the zero-valued timesteps in the input. The masking itself works: the output probabilities are all identical over the padded end of the input. For example, for input [1,2,3,4,0,0,0,0] the output is [0.2,0.4,0.5,0.7,0.7,0.7,0.7,0.7] (just an illustration). But the training accuracy and training loss are calculated over the masked timesteps as well, so the verbose log is neither useful nor correct. Furthermore, will this incorrect loss influence backpropagation? How can I fix this? Thanks!
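For illustration, here is a minimal NumPy sketch (with made-up numbers, not from the thread) of how padded timesteps inflate a naive per-timestep accuracy:

```python
import numpy as np

# Hypothetical example: token id 0 is the padding value (matching
# mask_value=0); labels and predictions are per-timestep class ids.
tokens = np.array([1, 2, 3, 4, 0, 0, 0, 0])
y_true = np.array([2, 1, 3, 2, 0, 0, 0, 0])
y_pred = np.array([2, 1, 3, 1, 0, 0, 0, 0])

naive_acc = np.mean(y_true == y_pred)               # counts the padded steps too
mask = tokens != 0                                  # True on real timesteps only
masked_acc = np.mean(y_true[mask] == y_pred[mask])

print(naive_acc)   # 0.875 -- inflated by the 4 trivially "correct" padded steps
print(masked_acc)  # 0.75  -- accuracy over the 4 real timesteps
```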

dieuwkehupkes commented 7 years ago

I think this problem is similar to the one that I had before (described in #3855). I think in my case the loss was in fact computed correctly though, but the accuracy wasn't. Could you give an example?

danche354 commented 7 years ago

@dieuwkehupkes You were right! No matter how I change the padding number, the loss stays the same, but the accuracy does not, and the loss is what drives backpropagation. So I defined my own function to compute the accuracy, and the result is good!

dieuwkehupkes commented 7 years ago

@danche354 Nice! Would you mind sharing this function with me? In the end I found a different workaround but it is not very generally applicable. Thanks!

danche354 commented 7 years ago

@dieuwkehupkes Sure, I'd love to. But I just looked at your issue, and we are working on different tasks. My model is many-to-many (POS tagging), so the input and output lengths are the same, but that is not true for a seq2seq task. What I did was train on batches and use the length of the training labels to drop the padding values before recomputing the accuracy, like the following:

# load the test data; length is the true (unpadded) label length
X_test, y_test, length = load_data('test_set')
output = model.predict(X_test)
# drop the padded timesteps before scoring
y_test_true = y_test[:length]
output_true = output[:length]
# calculate the accuracy
...
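A more general version of this idea, handling a per-sequence length for every example in a batch, might look like the following sketch (this is not the exact function from the thread; the array shapes are an assumption):

```python
import numpy as np

def masked_accuracy(y_true, y_pred, lengths):
    """Per-timestep accuracy that ignores padded positions.

    y_true, y_pred: (batch, timesteps) arrays of label ids;
    lengths: true (unpadded) length of each sequence.
    """
    correct = 0
    total = 0
    for t, p, n in zip(y_true, y_pred, lengths):
        correct += np.sum(t[:n] == p[:n])  # score only the first n real steps
        total += n
    return correct / total
```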
dieuwkehupkes commented 7 years ago

Ah yes, that indeed does not work in my case. But thanks anyway!

Shuailong commented 7 years ago

@danche354 I ran into the same problem while doing a sequence labelling task. The question is: how do you define your accuracy function when you need the extra length info? A Keras metric function has the signature

def accuracy(y_true, y_pred):

What's more, the data are shuffled after each epoch, which makes it harder to apply the length info.
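One common workaround that does fit the (y_true, y_pred) signature is to recover the mask from y_true itself, assuming the padded timesteps carry an all-zero target vector (an assumption, not something confirmed in this thread). A NumPy sketch of the idea:

```python
import numpy as np

def masked_categorical_accuracy(y_true, y_pred):
    """Accuracy over one-hot targets, skipping all-zero (padded) rows.

    y_true, y_pred: (batch, timesteps, classes) arrays.
    """
    mask = np.any(y_true != 0, axis=-1)                       # True on real steps
    matches = np.argmax(y_true, -1) == np.argmax(y_pred, -1)  # per-step hits
    return np.sum(matches & mask) / np.sum(mask)
```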

danche354 commented 7 years ago

@Shuailong Hi, I tried using a Keras metric function, but then I realized the accuracy can't be computed that way.

So I wrote my own loop, using train_on_batch and predict_on_batch to train the model.

The shuffled data are not the same each epoch, but you can return your data lengths every time you call the prepare_data function.

Shuailong commented 7 years ago

@danche354 Thanks for your reply! I hope Keras will resolve this issue in the future. For now, I just don't pad and use a batch size of one.

vsoto commented 7 years ago

Isn't the solution here to use sample_weights, setting them to zero for the timesteps that we are setting to mask_value=0?
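As a sketch of that idea (assuming 0 is the padding value, and that the model is compiled with sample_weight_mode='temporal' so per-timestep weights are accepted):

```python
import numpy as np

# Build per-timestep weights that zero out the padded positions, so they
# contribute nothing to the loss or to the weighted metrics.
X = np.array([[1, 2, 3, 4, 0, 0, 0, 0],
              [5, 6, 0, 0, 0, 0, 0, 0]])
weights = (X != 0).astype('float32')   # shape (batch, timesteps)

# model.compile(..., sample_weight_mode='temporal')
# model.fit(X, y, sample_weight=weights)
```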

hankcs commented 5 years ago

Has this issue been solved?