Hey @MaksTarnavskyi, huge thanks for such a giant piece of work! I'm starting the PR review.
Overall, the code looks good to me. However, the new tokenization doesn't work with our previously pretrained models: I got an F_0.5 of about 28.11 on CoNLL-2014 (test) with our BERT model. At the same time, this tokenization is the more correct one (as mentioned in https://github.com/grammarly/gector/issues/50). We'll try to reproduce the pipeline with your codebase and release this PR along with new models.
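For anyone wondering why a tokenization change alone can drop the score this much, here is a minimal sketch (not from the PR itself; the sentence and `bert-base-cased` checkpoint are just illustrative, assuming the `transformers` package): the same sentence, pre-split differently before subword tokenization, produces different word boundaries, so the per-token edit tags the old checkpoints learned no longer align with the input.

```python
# Illustrative sketch: how a different pre-tokenization shifts word
# boundaries, which breaks alignment for a tagger trained on the old scheme.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")  # stand-in encoder

# Old-style pre-tokenization: contraction kept as one word.
old_split = ["I", "don't", "think", "so."]
# New-style pre-tokenization: contraction and punctuation split off.
new_split = ["I", "do", "n't", "think", "so", "."]

def subwords(words):
    # Tokenize each pre-split word independently, as token-level taggers
    # typically do to keep the word-to-tag alignment.
    return [tokenizer.tokenize(w) for w in words]

print(subwords(old_split))  # e.g. [['I'], ['don', "'", 't'], ...]
print(subwords(new_split))  # e.g. [['I'], ['do'], ['n', "'", 't'], ...]
# Different word counts and boundaries mean the edit tags predicted per word
# no longer line up with the tokenization the old checkpoints were trained on.
```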
For now, we'll push our updates to a temporary branch.
The main changes:
- bert-large
- roberta-large
- xlnet-large