christios closed this issue 3 years ago
@christios Hi, you closed this issue so I assume you've fixed the problem? If so, do you mind explaining how you solved it? It might be useful for others 😄
Hi, yes, sorry, I meant to do that actually 😁. I had to make the loss negative (`loss = -crf(input, tags, mask)`) since the forward pass returns the log likelihood. But you had already pointed that out in the documentation 😁
Thanks for the implementation!
Hi! I'm trying to use your CRF implementation on top of a bi-LSTM of mine. Without the CRF layer, the model works fine and learns as it should, but when I add it, I get a very large negative loss, and `decode` outputs my padding value at every index. I feed the output of the LSTM (which I use to compute the loss when the CRF is not used, and it works fine) into the CRF, without any softmax. The shape of the LSTM features and tags is `(seq_len, batch_size, num_tags)`, with `num_tags` being my two tags plus the padding index. The shape of the mask is `(seq_len, batch_size)`. I'm using the Adam optimizer with default settings. I don't know what I'm doing wrong; is it something to do with the masking/padding? I also just checked my masking and it seems to be fine. Any help with this?
Thanks!