kmkurn / pytorch-crf

(Linear-chain) Conditional random field in PyTorch.
https://pytorch-crf.readthedocs.io
MIT License
935 stars 151 forks

[Question] Actual usage examples? #29

Open Maghoumi opened 5 years ago

Maghoumi commented 5 years ago

Besides the toy examples listed in the docs and tests, are there actual examples of this library available anywhere?

I'm interested in using this library for a sequence labeling project, but I'm curious to know if I'm using this library correctly. What I have is something like this:

import torch.nn as nn
from torchcrf import CRF

class MyModel(nn.Module):
    def __init__(self, num_features, num_classes):
        super(MyModel, self).__init__()
        self.num_features = num_features
        self.num_classes = num_classes
        self.lstm = nn.LSTM(num_features, 128)
        self.fc = nn.Linear(128, num_classes)
        self.crf = CRF(num_classes)

    def forward(self, x):
        # fc(lstm(batch)), as described below
        hidden, _ = self.lstm(x)
        return self.fc(hidden)

# ----------------------------------------------------------
model = MyModel(...)

# Training loop:
optimizer.zero_grad()        # clear gradients from the previous step
y_hat = model(batch)         # The network's forward returns fc(lstm(batch))
loss = -model.crf(y_hat, y)  # CRF's forward returns the log-likelihood
loss.backward()
optimizer.step()

Although this seems to work and the loss is decreasing, I have a feeling that I might be missing something. Any help is appreciated. Thanks!

kmkurn commented 5 years ago

Hi,

Your usage seems alright. The examples are meant to show how to use the CRF layer given that one has produced the emission scores, i.e. (unnormalized) log P(y_t | X) where y_t is the tag at position t and X is the input sentence. In your code, y_hat would have a shape of (seq_length, batch_size, num_classes) where each y_hat[i, j, k] contains the score of the j-th example in the batch having tag k in the i-th position, which is as expected. I'll consider adding a more complete example in the docs. Thanks for the suggestion!
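To make the expected shapes concrete, here is a small check (the dimensions are arbitrary, and the variable names follow the snippet above rather than anything in the library):

```python
import torch
import torch.nn as nn

seq_length, batch_size = 7, 4
num_features, num_classes = 10, 5

lstm = nn.LSTM(num_features, 128)  # sequence-first input by default
fc = nn.Linear(128, num_classes)

batch = torch.randn(seq_length, batch_size, num_features)
hidden, _ = lstm(batch)            # (seq_length, batch_size, 128)
emissions = fc(hidden)             # (seq_length, batch_size, num_classes)

# emissions[i, j, k]: score of example j having tag k at position i
assert emissions.shape == (seq_length, batch_size, num_classes)
```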

Maghoumi commented 5 years ago

Thanks for your response. What was confusing to me originally was the fact that your CRF layer is actually a loss that one can minimize, whereas other PyTorch implementations had a separately-defined Viterbi loss module.

Yes, the dimensions you mentioned coincide with what I have in my code. After reading your explanation, I think the only change needed in my pseudo-code above would be to change loss = -model.crf(y_hat, y) to loss = -model.crf(y_hat.log_softmax(2), y) (given that the output of the FC layer is returned directly from the network, but we need emission scores).

kmkurn commented 5 years ago

What was confusing to me originally was the fact that your CRF layer is actually a loss that one can minimize, whereas other PyTorch implementations had a separately-defined Viterbi loss module.

Actually, this is something that I think about every now and then. Right now the forward method returns the loss, which does not really fit the usual pattern where forward returns some kind of prediction and a separate loss object computes the loss from the given prediction and the gold target, as you mentioned. It may be helpful to provide such a loss class for those who are more comfortable with this pattern, such as yourself. Thanks for bringing this up.
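As a sketch of what such a loss class could look like (the class name and interface are illustrative, not part of pytorch-crf):

```python
import torch.nn as nn

class CRFNegativeLogLikelihood(nn.Module):
    """Hypothetical wrapper: turns a CRF layer that returns a
    log-likelihood into a loss module, matching the usual
    criterion(prediction, target) pattern."""

    def __init__(self, crf):
        super().__init__()
        self.crf = crf

    def forward(self, emissions, tags, **kwargs):
        # The wrapped CRF returns the log-likelihood of tags given
        # emissions, so the loss to minimize is its negation.
        return -self.crf(emissions, tags, **kwargs)
```

With this, the training loop would read loss = criterion(y_hat, y) instead of negating the CRF output by hand.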

would be to change loss = -model.crf(y_hat, y) to loss = -model.crf(y_hat.log_softmax(2), y)

You don't have to. The CRF layer accepts unnormalized emission scores just fine. It'll normalize the score of y over all possible sequences of tags.
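This can be checked by hand with a brute-force linear-chain log-likelihood (a toy re-implementation for illustration, not the library's code): log_softmax shifts every tag's emission score at a position by the same constant, and that shift cancels between the gold sequence's score and the partition function, leaving the log-likelihood unchanged.

```python
import math
from itertools import product

def log_likelihood(emissions, transitions, tags):
    """Brute-force linear-chain CRF log-likelihood for one sequence.
    emissions[t][k]: score of tag k at position t.
    transitions[i][j]: score of moving from tag i to tag j."""
    num_tags = len(emissions[0])

    def score(seq):
        s = sum(emissions[t][seq[t]] for t in range(len(seq)))
        s += sum(transitions[seq[t]][seq[t + 1]] for t in range(len(seq) - 1))
        return s

    # Partition function: sum over every possible tag sequence.
    log_z = math.log(sum(math.exp(score(seq))
                         for seq in product(range(num_tags),
                                            repeat=len(emissions))))
    return score(tags) - log_z

def log_softmax(row):
    z = math.log(sum(math.exp(x) for x in row))
    return [x - z for x in row]

emissions = [[0.5, 1.2, -0.3], [2.0, 0.1, 0.4]]
transitions = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.2, 0.0]]
tags = [1, 0]

raw = log_likelihood(emissions, transitions, tags)
normed = log_likelihood([log_softmax(r) for r in emissions], transitions, tags)
assert abs(raw - normed) < 1e-9  # per-position shifts cancel out
```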

Maghoumi commented 5 years ago

It may be helpful to provide such loss class for those who are more comfortable with this pattern such as yourself. Thanks for bringing this up.

No problem! That would be an excellent idea. Also, thanks for the clarification regarding the log_softmax() bit.

Feel free to close this issue, or keep it open as a reminder if you decide to incorporate more examples and also change the library so that the loss is separate from the CRF's output. I'd personally be very happy if those changes/examples are added.
Also, thanks for the great library!

Huijun-Cui commented 5 years ago

Hi, I've run into a problem. I downloaded test_crf.py, where you provide some examples, and added the following code at the end:

if __name__ == "__main__":
    m = TestDecode()
    m.test_batched_decode()

The first error was that the CRF model has no attribute batch_first; I worked around it manually with CRF.batch_first = False.

Then, when I ran the code above, I got this error:

*** IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

I'm not sure if it's because my torch version is 1.1.0. By the way, which torch version do you recommend?

Huijun-Cui commented 5 years ago

I have solved it: with torch version <= 1.0.0 there is no error.
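For reference, this IndexError usually comes from a PyTorch behavior change rather than from this library: scalar results are 0-dim tensors in recent versions, and reading them by indexing raises the error quoted above, while .item() works. A minimal reproduction (assuming a recent PyTorch):

```python
import torch

scalar = torch.tensor(3.14)  # a 0-dim tensor

try:
    _ = scalar[0]            # raises IndexError on recent PyTorch versions
except IndexError:
    pass

value = scalar.item()        # the supported way to get the Python number
```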

kmkurn commented 5 years ago

@Huijun-Cui Thanks for letting me know. Next time please open a separate issue.