Hi, thanks for sharing your code. In the paper, you implemented a baseline called "BERT+MLP" that reaches an F1 score of 76.2. However, when I use the same architecture, I cannot reproduce that result. Did I miss something, or did you use particular strategies in preprocessing or training?