Closed ljch2018 closed 5 years ago
I used the same run command as yours, but I get worse results on the dev dataset:
eval_f = 0.89656204
eval_precision = 0.90508
eval_recall = 0.88843685
global_step = 653
loss = 17.190592
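As a sanity check, the reported precision and recall can be combined with the standard F1 formula (harmonic mean). This is only a rough consistency check on the numbers above; the eval script may compute F1 differently (e.g. per-entity with conlleval-style counting), which would explain a small gap:

```python
# Harmonic mean of the reported precision and recall (values copied
# from the eval output above).
precision = 0.90508
recall = 0.88843685

f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 5))  # ~0.8967, close to the reported eval_f of 0.89656
```

The small difference from the logged eval_f suggests the metric is likely aggregated at the entity level rather than derived directly from the overall precision and recall.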
I used "BERT-Base, Multilingual Cased: 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters" as the checkpoint, which was released publicly by Google on November 23rd, 2018.
When I use "BERT-Base, Cased: 12-layer, 768-hidden, 12-heads, 110M parameters" as the checkpoint, F1 reaches 0.93 as well.