pizzamen opened this issue 4 years ago
Hi, huawei-noah team. I'm trying to reproduce the CoLA results from the TinyBERT paper (dev MCC: 49.7), but the best MCC I get on the dev set is 0.3864631965818987. Can you check if there's anything wrong in my workflow?

First, I fine-tune `bert-base-uncased` from scratch as the teacher with the Hugging Face Transformers code. The teacher's MCC on the CoLA dev set is 0.5833150512387887.
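(For anyone comparing these numbers: the paper reports MCC scaled by 100, so my 0.3864... corresponds to 38.6 against the reported 49.7. A minimal sketch of how the dev metric is computed, with hypothetical labels/predictions, assuming scikit-learn:)

```python
# Minimal sketch of the CoLA dev metric (Matthews correlation coefficient).
# labels/preds below are hypothetical 0/1 acceptability judgments.
from sklearn.metrics import matthews_corrcoef

labels = [1, 0, 1, 1, 0]  # gold labels (toy example)
preds  = [1, 0, 0, 1, 0]  # model predictions (toy example)
print(matthews_corrcoef(labels, preds))  # ~0.67 here; the paper reports 100x this
```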
Next, I download the pretrained `General_TinyBERT(4layer-312dim)` model from this repo, and the GloVe embeddings from here (Common Crawl, 840B tokens, 2.2M vocab, cased, 300d vectors, 2.03 GB).
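(A quick smoke test I use on the embeddings file, since `data_augmentation.py` has to parse it; the local path is hypothetical, point it at whatever `${GLOVE_EMB}` is. Note the 840B file contains a few tokens with embedded spaces, so the vector is read from the right-hand end of each line:)

```python
# Sketch: verify the GloVe file parses as 300-dimensional vectors.
# glove_path is a hypothetical local path; point it at ${GLOVE_EMB}.
glove_path = "glove.840B.300d.txt"

with open(glove_path, encoding="utf-8") as f:
    for i, line in enumerate(f):
        fields = line.rstrip("\n").split(" ")
        vec = [float(x) for x in fields[-300:]]  # raises if fields aren't numeric
        assert len(vec) == 300, f"unexpected dimension at line {i}"
        if i >= 999:  # the first 1000 entries are enough for a smoke test
            break
print("ok: first 1000 entries parse as 300-d vectors")
```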
Then I run the data augmentation with:

```
python TinyBERT/data_augmentation.py --pretrained_bert_model ${BERT_BASE_DIR} \
    --glove_embs ${GLOVE_EMB} \
    --glue_dir ${GLUE_DIR} \
    --task_name ${TASK_NAME}
```
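(After this step I check that the augmented file was actually written; if I read the repo's README correctly it is saved as `train_aug.tsv` next to `train.tsv`, so the filenames below are assumptions, adjust if your copy differs:)

```python
# Sketch: confirm augmentation produced a (much) larger training file.
# Paths/filenames are assumptions based on the repo's README (train_aug.tsv
# saved into ${GLUE_DIR}/${TASK_NAME}); adjust if your setup differs.
import os

task_dir = os.path.join(os.environ.get("GLUE_DIR", "glue_data"), "CoLA")
n_orig = sum(1 for _ in open(os.path.join(task_dir, "train.tsv")))
n_aug = sum(1 for _ in open(os.path.join(task_dir, "train_aug.tsv")))
print(f"train.tsv: {n_orig} lines; train_aug.tsv: {n_aug} lines")
```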
Next, the intermediate-layer distillation (no `--pred_distill`):

```
python ./TinyBERT/task_distill.py --teacher_model ${FT_BERT_BASE_DIR} \
    --student_model ${GENERAL_TINYBERT_DIR} \
    --data_dir ${TASK_DIR} \
    --task_name ${TASK_NAME} \
    --output_dir ${TMP_TINYBERT_DIR} \
    --max_seq_length 128 \
    --train_batch_size 32 \
    --num_train_epochs 50 \
    --aug_train \
    --do_lower_case \
    &> log.$TASK_NAME-da.td1
```
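(For context, this is my understanding of what the two `task_distill.py` stages optimize, taken from the paper rather than the repo's exact code; the names below are mine, and the real code also handles the layer mapping and projects the 312-d student states up to the 768-d teacher space:)

```python
# Rough sketch of the two TinyBERT distillation objectives (paper reading,
# not the repo's actual implementation).
import torch.nn.functional as F

def intermediate_loss(h_student, h_teacher, att_student, att_teacher):
    # Stage 1: MSE on (projected) hidden states and attention matrices.
    return F.mse_loss(h_student, h_teacher) + F.mse_loss(att_student, att_teacher)

def prediction_loss(logits_student, logits_teacher, T=1.0):
    # Stage 2 (--pred_distill): soft cross-entropy against the teacher.
    p_teacher = F.softmax(logits_teacher / T, dim=-1)
    log_p_student = F.log_softmax(logits_student / T, dim=-1)
    return -(p_teacher * log_p_student).sum(dim=-1).mean()
```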
Finally, the prediction-layer distillation:

```
python ./TinyBERT/task_distill.py --pred_distill \
    --teacher_model ${FT_BERT_BASE_DIR} \
    --student_model ${TMP_TINYBERT_DIR} \
    --data_dir ${TASK_DIR} \
    --task_name ${TASK_NAME} \
    --output_dir ${TINYBERT_DIR} \
    --do_lower_case \
    --aug_train \
    --learning_rate 3e-5 \
    --num_train_epochs 3 \
    --eval_step 100 \
    --max_seq_length 128 \
    --train_batch_size 32 \
    &> log.$TASK_NAME-da.td2
```
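(To extract the best dev score I grep the eval file the script writes, which appears to be `eval_results.txt` in `--output_dir` as far as I can tell from the code; the directory below is a hypothetical `${TINYBERT_DIR}`:)

```python
# Sketch: pull the best dev MCC from the eval file task_distill.py writes
# ("eval_results.txt" in --output_dir, if I'm reading the script right).
import re

best = float("-inf")
with open("tinybert_cola/eval_results.txt") as f:
    for line in f:
        m = re.search(r"mcc\s*=\s*(-?[\d.]+)", line)
        if m:
            best = max(best, float(m.group(1)))
print("best dev MCC:", best)
```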
---

Hi,

I'm hitting the same issue. With General_TinyBERT (4layer-312dim) and the glove.42B.300d embeddings, I got dev MCC = 37.87 (the paper reports 49.7). However, with General_TinyBERT (6layer-768dim) I got dev MCC = 53.6, which is very close to the reported 54.0.

Thanks for the help.

Best, DK