huawei-noah / Pretrained-Language-Model

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

TinyBERT fine-tuning details for CoLA task #80

Open pizzamen opened 4 years ago

pizzamen commented 4 years ago

Hi, huawei-noah team. I'm trying to reproduce the CoLA results from the TinyBERT paper (dev MCC: 49.7), but the best MCC I got on the dev set is 0.3864631965818987. Could you check whether anything is wrong in my workflow?

  1. I fine-tune bert-base-uncased from scratch as the teacher with the Hugging Face Transformers code; the teacher's MCC on the CoLA dev set is 0.5833150512387887 (a rough sketch of this step is included after the DA command below).

  2. I download the pretrained General_TinyBERT (4-layer, 312-dim) model from this repo.

  3. I download the GloVe embeddings from here (Common Crawl: 840B tokens, 2.2M vocab, cased, 300d vectors, 2.03 GB).

  4. I run data augmentation (DA) with:

    
```shell
BERT_BASE_DIR=./bert-base-uncased/
GLOVE_EMB=./glove.840B.300d.txt
GLUE_DIR=./glue
TASK_NAME=CoLA

python TinyBERT/data_augmentation.py --pretrained_bert_model ${BERT_BASE_DIR} \
  --glove_embs ${GLOVE_EMB} \
  --glue_dir ${GLUE_DIR} \
  --task_name ${TASK_NAME}
```
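For step 1, I used the standard Hugging Face fine-tuning flow. The sketch below is only an illustration of that step, not the exact script I ran; the `datasets`/`Trainer` usage, the hyperparameters, and saving into `./hub/bert-base-uncased-cola` are my assumptions here.

```python
# Sketch only (assumptions: recent transformers + datasets, Trainer API, these hyperparameters).
# Fine-tunes a bert-base-uncased teacher on CoLA and reports dev MCC.
from datasets import load_dataset
from sklearn.metrics import matthews_corrcoef
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

raw = load_dataset("glue", "cola")                       # train / validation / test splits
tok = AutoTokenizer.from_pretrained("bert-base-uncased")

def encode(batch):
    return tok(batch["sentence"], truncation=True, max_length=128)

data = raw.map(encode, batched=True)
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return {"mcc": matthews_corrcoef(labels, logits.argmax(axis=-1))}

args = TrainingArguments(
    output_dir="./hub/bert-base-uncased-cola",           # later used as FT_BERT_BASE_DIR (assumption)
    num_train_epochs=3,                                  # assumed hyperparameters
    per_device_train_batch_size=32,
    learning_rate=2e-5,
)
trainer = Trainer(model=model, args=args,
                  train_dataset=data["train"],
                  eval_dataset=data["validation"],
                  tokenizer=tok,                         # dynamic padding via DataCollatorWithPadding
                  compute_metrics=compute_metrics)
trainer.train()
print(trainer.evaluate())                                # eval_mcc is the teacher's CoLA dev MCC
trainer.save_model()                                     # writes the teacher checkpoint to output_dir
```

Whatever script is used for this, the resulting teacher checkpoint is what `FT_BERT_BASE_DIR` points at in step 5 below.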

After DA, I got 209034 training examples (a quick sanity check of the augmented file is sketched after the sample below). The first 10 lines of the augmented data look like this:
```shell
$ head glue/CoLA/train_aug.tsv
gj04    1               Our friends won't buy this analysis, let alone the next one we propose.
gj04    1               our friends won ' t buy this analysis , let alone the next one we batted .
gj04    1               our trunk won ' t bowled this analysis , fry chesapeake the next governed we propose .
gj04    1               our limp won ' t buy this trunk , let alone the next armies we propose .
gj04    1               our friends won ' t buy this hooked , let alone the next one we 1086 .
gj04    1               our friends won ' t bowled this analysis , let alone the next platt we propose .
gj04    1               our band won ' t batted this analysis , let sugar the next bound we propose .
gj04    1               our friends won ' t buy this analysis , let alone the next one we propose .
gj04    1               our friends won ' t tits this analysis , flags alone the next one we 1086 .
gj04    1               our friends won ' t tits this hooked , let chesapeake the presided one we propose .
```
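For the sanity check mentioned above I only count the rows and the label split; the snippet is mine, and only the 4-column, tab-separated, header-less layout is taken from the `head` output:

```python
# Count rows and label balance in the augmented CoLA training file.
# Layout assumed from the head output above: source code, label, (empty), sentence.
import csv
from collections import Counter

labels = Counter()
total = 0
with open("glue/CoLA/train_aug.tsv", encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
        total += 1
        labels[row[1]] += 1          # second column is the 0/1 acceptability label

print(total, "examples;", dict(labels))   # total came out to 209034 for my run
```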
  5. I run task-specific distillation with:
    
```shell
FT_BERT_BASE_DIR=./hub/bert-base-uncased-cola/
GENERAL_TINYBERT_DIR=./hub/tinybert/
TASK_DIR=./glue/CoLA/
TASK_NAME=cola
TMP_TINYBERT_DIR=./tune_huawei_cola_e50
TINYBERT_DIR="${TMP_TINYBERT_DIR}_td2"
rm -rf $TMP_TINYBERT_DIR
rm -rf $TINYBERT_DIR
export PYTHONPATH=./TinyBERT/:$PYTHONPATH

# stage 1: intermediate-layer (embedding/hidden/attention) distillation on the augmented data
python ./TinyBERT/task_distill.py --teacher_model ${FT_BERT_BASE_DIR} \
  --student_model ${GENERAL_TINYBERT_DIR} \
  --data_dir ${TASK_DIR} \
  --task_name ${TASK_NAME} \
  --output_dir ${TMP_TINYBERT_DIR} \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --num_train_epochs 50 \
  --aug_train \
  --do_lower_case \
  &> log.$TASK_NAME-da.td1

# stage 2: prediction-layer distillation starting from the stage-1 student
python ./TinyBERT/task_distill.py --pred_distill \
  --teacher_model ${FT_BERT_BASE_DIR} \
  --student_model ${TMP_TINYBERT_DIR} \
  --data_dir ${TASK_DIR} \
  --task_name ${TASK_NAME} \
  --output_dir ${TINYBERT_DIR} \
  --do_lower_case \
  --aug_train \
  --learning_rate 3e-5 \
  --num_train_epochs 3 \
  --eval_step 100 \
  --max_seq_length 128 \
  --train_batch_size 32 \
  &> log.$TASK_NAME-da.td2
```
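Independently of the training log, the dev MCC can be recomputed with scikit-learn if the student's dev-set predictions are dumped to a file. The prediction file below is hypothetical (task_distill.py does not necessarily write it); only the dev.tsv layout (four tab-separated columns, label in the second, no header) comes from the GLUE data itself:

```python
# Recompute CoLA dev MCC from gold labels and a dumped 0/1-per-line prediction file.
# "dev_predictions.txt" is a hypothetical dump, not a file the repo scripts are known to write.
from sklearn.metrics import matthews_corrcoef

with open("glue/CoLA/dev.tsv", encoding="utf-8") as f:
    gold = [int(line.split("\t")[1]) for line in f if line.strip()]

with open("tune_huawei_cola_e50_td2/dev_predictions.txt", encoding="utf-8") as f:
    pred = [int(line.strip()) for line in f if line.strip()]

assert len(gold) == len(pred), "predictions must align with dev.tsv"
print("dev MCC:", matthews_corrcoef(gold, pred))
```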


The training log for the two-step TD is attached: [tinybert-cola-log.tar.gz](https://github.com/huawei-noah/Pretrained-Language-Model/files/5049950/tinybert-cola-log.tar.gz).

Thanks for your help.
dongkuanx27 commented 4 years ago

Hi,

I have the same issue.

I got dev MCC = 37.87 with General_TinyBERT (4-layer, 312-dim) and the glove.42B.300d embeddings (the paper reports dev MCC = 49.7).

However, when I use General_TinyBERT (6-layer, 768-dim), I get dev MCC = 53.6, which is very close to the reported 54.0.

Thanks for the help.

Best, DK