kmkurn / pytorch-crf

(Linear-chain) Conditional random field in PyTorch.
https://pytorch-crf.readthedocs.io
MIT License

Why can't I mask the first token? #46

Closed wuyaoxuehun closed 4 years ago

kmkurn commented 4 years ago

The mask requirement ensures the input has at least one token, so the start and end transition scores can be computed properly. Masking the first token means the input has zero length, and my assumption is that such inputs are erroneous and should never happen. Is there a valid case where this assumption does not hold?

wuyaoxuehun commented 4 years ago

I'm trying to use BERT with a CRF to do some NER tasks, and the first token is always [CLS], which is not to be classified. So I think this is a valid case. Thanks for replying.

kmkurn commented 4 years ago

Yes, but that sequence is not empty, right? You have [CLS] and then at least one other token, don't you? What I would suggest is to remove the first token [CLS] before feeding the sequence into the CRF.

wuyaoxuehun commented 4 years ago

Yes, that was what I did. I think this is a little inflexible, but your code is good enough. Thanks.

kmkurn commented 4 years ago

I see. I'm still reluctant to implement this because I think it'll make the code a lot messier, and I'm not sure if it's worth it. Removing [CLS] seems so much easier. So I'm closing this for now.