Closed by pfecht 5 years ago
The metric used to evaluate CoLA in the GLUE benchmark is not accuracy but the Matthews correlation coefficient (https://en.wikipedia.org/wiki/Matthews_correlation_coefficient; see https://gluebenchmark.com/tasks). Indeed, the authors report a Matthews correlation of 0.521 for BERT-base in https://arxiv.org/abs/1810.04805, so that number is not directly comparable to an accuracy.
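To illustrate how far the two metrics can drift apart on the same predictions, here is a minimal sketch in plain Python. The helper names `accuracy` and `mcc` and the toy labels are made up for this example; they are not taken from the evaluation script or from CoLA itself. In practice `sklearn.metrics.matthews_corrcoef` should give the same value for binary labels.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when the denominator vanishes
    # (e.g. only one class is ever predicted).
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical, imbalanced toy data: 7 positive and 3 negative examples.
y_true = [1] * 7 + [0] * 3
# The model finds every positive but mislabels 2 of the 3 negatives.
y_pred = [1] * 7 + [1, 1, 0]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"mcc      = {mcc(y_true, y_pred):.3f}")
```

On this toy data the accuracy is 0.8 while the MCC is roughly 0.51, because MCC penalizes the mistakes on the minority class much more heavily than accuracy does.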
Makes sense, looks like I missed that point. Thank you.
I am trying to reproduce the CoLA results from the BERT paper (BERT-Base, single GPU).
Running the following command
I get eval results of
An accuracy of 0.83 would be fantastic, but compared to the 0.521 stated in the paper this doesn't seem very realistic.
Any suggestions as to what I'm doing wrong?