huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

High accuracy for CoLA task #121

Closed: pfecht closed this issue 5 years ago

pfecht commented 5 years ago

I am trying to reproduce the CoLA results from the BERT paper (BERT-Base, single GPU).

Running the following command

python run_classifier.py \
  --task_name cola \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir $GLUE_DIR/CoLA/ \
  --bert_model bert-base-uncased \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir $OUT_DIR/cola_output/

I get the following eval results:

12/16/2018 12:31:34 - INFO - __main__ -   ***** Eval results *****
12/16/2018 12:31:34 - INFO - __main__ -     eval_accuracy = 0.8302972195589645
12/16/2018 12:31:34 - INFO - __main__ -     eval_loss = 0.5117322660925734
12/16/2018 12:31:34 - INFO - __main__ -     global_step = 804
12/16/2018 12:31:34 - INFO - __main__ -     loss = 0.17348005173644468

An accuracy of 0.83 would be fantastic, but compared to the 0.521 reported in the paper it doesn't seem very realistic.

Any suggestions as to what I'm doing wrong?

davidefiocco commented 5 years ago

The metric used to evaluate CoLA in the GLUE benchmark is not accuracy but the Matthews correlation coefficient (https://en.wikipedia.org/wiki/Matthews_correlation_coefficient; see https://gluebenchmark.com/tasks). Indeed, in https://arxiv.org/abs/1810.04805 the authors report a Matthews correlation of 0.521 for BERT-Base on CoLA.
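
To see why the two metrics diverge on an imbalanced dataset like CoLA, here is a minimal sketch using scikit-learn. The labels and predictions below are made up purely for illustration (they are not from the actual eval run): a classifier that over-predicts the majority "acceptable" class can score high on accuracy while its Matthews correlation stays much lower.

from sklearn.metrics import accuracy_score, matthews_corrcoef

# Toy example (hypothetical): 7 acceptable sentences, 3 unacceptable,
# and a classifier biased toward predicting "acceptable".
labels = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
preds  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

print("accuracy:", accuracy_score(labels, preds))      # 0.8
print("matthews:", matthews_corrcoef(labels, preds))   # ~0.51
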

pfecht commented 5 years ago

Makes sense; it looks like I missed that point. Thank you.