kmkurn / pytorch-crf

(Linear-chain) Conditional random field in PyTorch.
https://pytorch-crf.readthedocs.io
MIT License

bi-LSTM with CRF not training properly #81

Closed by christios 3 years ago

christios commented 3 years ago

Hi! I'm trying to use your CRF implementation on top of a bi-LSTM of mine. Without the CRF layer, the model works fine and learns as it should. However, when I add it, I get a very large negative loss, and `decode` outputs my padding value at every index. I feed the output of the LSTM (which I use to compute the loss when the CRF is not used, and which works fine) to the CRF, without any softmax. The shape of the LSTM features and tags is `(seq_len, batch_size, num_tags)`, with `num_tags` being my two tags plus the padding index. The shape of the mask is `(seq_len, batch_size)`. I'm using the Adam optimizer with default settings. I don't know what I'm doing wrong; is it something to do with the masking/padding? I've also checked my masking and it seems to be fine.

Any help with this?

Thanks!
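For anyone reading along, here is a minimal sketch of the `(seq_len, batch_size)` mask layout described above, built from hypothetical per-sequence lengths (the lengths and batch size here are made up for illustration, not taken from the reporter's setup):

```python
# Hypothetical sequence lengths for a batch of 3, padded to seq_len = 4.
lengths = [4, 2, 3]
seq_len, batch_size = max(lengths), len(lengths)

# mask[t][b] is True while timestep t is a real (non-pad) token of
# sequence b, matching the (seq_len, batch_size) layout above.
mask = [[t < lengths[b] for b in range(batch_size)]
        for t in range(seq_len)]

# The first row is all True; rows switch off as shorter sequences end.
# (pytorch-crf expects the first timestep of the mask to be all on.)
```

In practice this list of lists would be a `torch.bool` tensor, but the on/off pattern is the same.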

kmkurn commented 3 years ago

@christios Hi, you closed this issue so I assume you've fixed the problem? If so, do you mind explaining how you solved it? It might be useful for others 😄

christios commented 3 years ago

Hi, yes, sorry, I meant to explain that 😁. I had to negate the loss (`loss = -crf(input, tags, mask)`) since the CRF returns the log-likelihood. But you had already pointed that out in the documentation 😁 Thanks for the implementation!
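To make the sign issue concrete, here is a dependency-free toy linear-chain CRF log-likelihood (a pure-Python sketch of the math, not pytorch-crf's actual implementation; all scores are made up). The log-likelihood of any single tag sequence is always negative, so it must be negated to serve as a loss to minimize:

```python
import math

def log_sum_exp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def crf_log_likelihood(emissions, tags, transitions):
    """Log-likelihood of one tag sequence under a linear-chain CRF.

    emissions: per-step score lists, shape (seq_len, num_tags)
    tags: the gold tag sequence, length seq_len
    transitions: transitions[i][j] = score of moving from tag i to tag j
    """
    num_tags = len(emissions[0])
    # Score of the gold path: emission scores plus transition scores.
    gold = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        gold += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    # Log-partition over all possible paths, via the forward algorithm.
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            log_sum_exp([alpha[i] + transitions[i][j]
                         for i in range(num_tags)]) + emissions[t][j]
            for j in range(num_tags)
        ]
    return gold - log_sum_exp(alpha)

# Toy 2-tag, 3-step example with made-up scores.
emissions = [[1.0, 0.2], [0.5, 1.5], [2.0, 0.1]]
transitions = [[0.3, -0.1], [-0.2, 0.4]]
ll = crf_log_likelihood(emissions, [0, 1, 0], transitions)
loss = -ll  # the fix: negate the log-likelihood before backprop
```

Minimizing `ll` directly (as in the original bug) drives the model toward an ever more negative log-likelihood, i.e. away from the gold tags, which is why training degenerated to predicting padding everywhere.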