Open manliu1225 opened 4 years ago
I found https://github.com/liuyukid/transformers-ner/blob/master/models/bert_ner.py#L110-L111
The author changes labels that are -100 to 0 and uses the attention mask as the mask for the CRF. However, this also keeps the tokens that are not the first sub-token of a word, and all of those tokens end up with label 0. I think this introduces noise.
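To illustrate the point, here is a minimal sketch (not the repo's actual code) of the alternative: instead of reusing the attention mask, build the CRF mask directly from the label tensor, so positions marked -100 (non-first sub-tokens and special tokens) are excluded from CRF scoring rather than fed in with a fake label 0. The function name and the example label sequence are hypothetical.

```python
IGNORE_INDEX = -100  # the label value HuggingFace uses for positions to ignore

def crf_mask_from_labels(labels):
    """Return a 0/1 mask that is 1 only where a real label exists.

    Positions labeled IGNORE_INDEX (special tokens, continuation
    word pieces) get mask 0, so a CRF would skip them instead of
    treating them as label 0, which is the noise described above.
    """
    return [[int(label != IGNORE_INDEX) for label in seq] for seq in labels]

# Hypothetical batch of one sentence, e.g. "[CLS] hugging ##face is great [SEP]",
# where only the first piece of each word carries a real label:
labels = [[-100, 3, -100, 0, 0, -100]]
print(crf_mask_from_labels(labels))  # [[0, 1, 0, 1, 1, 0]]
```

Note that some CRF implementations require the first timestep of every sequence to be unmasked, so in practice the valid positions are often gathered to the front of the sequence first; that detail is omitted here.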
Hi, there is an issue: with the original parameters the performance is quite low. Even after I increased the number of epochs to 15, the F1 on CoNLL03 is only 77%.