huawei-noah / Pretrained-Language-Model

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

TinyBERT fine-tuning details for CoLA task #80

Open pizzamen opened 4 years ago

pizzamen commented 4 years ago

Hi, huawei-noah team. I'm trying to reproduce the CoLA results from the TinyBERT paper (dev MCC: 49.7), but the best MCC I got on the dev set is 0.3864631965818987. Could you check whether anything is wrong in my workflow?

  1. I fine-tune bert-base-uncased from scratch as the teacher with the Hugging Face Transformers code; the teacher's MCC on the CoLA dev set is 0.5833150512387887 (a rough sketch of this step is included after the DA command below).

  2. I download the pretrained General_TinyBERT (4-layer, 312-dim) model from this repo.

  3. I download the GloVe embeddings from here (Common Crawl: 840B tokens, 2.2M vocab, cased, 300d vectors, 2.03 GB).

  4. I run data augmentation (DA) with:

    
```shell
BERT_BASE_DIR=./bert-base-uncased/
GLOVE_EMB=./glove.840B.300d.txt
GLUE_DIR=./glue
TASK_NAME=CoLA

python TinyBERT/data_augmentation.py --pretrained_bert_model ${BERT_BASE_DIR} \
  --glove_embs ${GLOVE_EMB} \
  --glue_dir ${GLUE_DIR} \
  --task_name ${TASK_NAME}
```
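For step 1, I used the standard Hugging Face fine-tuning flow. The sketch below is only an illustration of that step, not the exact script I ran; the `datasets`/`Trainer` usage, the hyperparameters, and saving into `./hub/bert-base-uncased-cola` are my assumptions here.

```python
# Sketch only (assumptions: recent transformers + datasets, Trainer API, these hyperparameters).
# Fine-tunes a bert-base-uncased teacher on CoLA and reports dev MCC.
from datasets import load_dataset
from sklearn.metrics import matthews_corrcoef
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

raw = load_dataset("glue", "cola")                       # train / validation / test splits
tok = AutoTokenizer.from_pretrained("bert-base-uncased")

def encode(batch):
    return tok(batch["sentence"], truncation=True, max_length=128)

data = raw.map(encode, batched=True)
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return {"mcc": matthews_corrcoef(labels, logits.argmax(axis=-1))}

args = TrainingArguments(
    output_dir="./hub/bert-base-uncased-cola",           # later used as FT_BERT_BASE_DIR (assumption)
    num_train_epochs=3,                                  # assumed hyperparameters
    per_device_train_batch_size=32,
    learning_rate=2e-5,
)
trainer = Trainer(model=model, args=args,
                  train_dataset=data["train"],
                  eval_dataset=data["validation"],
                  tokenizer=tok,                         # dynamic padding via DataCollatorWithPadding
                  compute_metrics=compute_metrics)
trainer.train()
print(trainer.evaluate())                                # eval_mcc is the teacher's CoLA dev MCC
trainer.save_model()                                     # writes the teacher checkpoint to output_dir
```

Whatever script is used for this, the resulting teacher checkpoint is what `FT_BERT_BASE_DIR` points at in step 5 below.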

After DA, I got 209034 training examples (a quick sanity check of the augmented file is sketched after the sample below). The first 10 lines of the augmented data look like this:
```shell
$ head glue/CoLA/train_aug.tsv
gj04    1               Our friends won't buy this analysis, let alone the next one we propose.
gj04    1               our friends won ' t buy this analysis , let alone the next one we batted .
gj04    1               our trunk won ' t bowled this analysis , fry chesapeake the next governed we propose .
gj04    1               our limp won ' t buy this trunk , let alone the next armies we propose .
gj04    1               our friends won ' t buy this hooked , let alone the next one we 1086 .
gj04    1               our friends won ' t bowled this analysis , let alone the next platt we propose .
gj04    1               our band won ' t batted this analysis , let sugar the next bound we propose .
gj04    1               our friends won ' t buy this analysis , let alone the next one we propose .
gj04    1               our friends won ' t tits this analysis , flags alone the next one we 1086 .
gj04    1               our friends won ' t tits this hooked , let chesapeake the presided one we propose .
```
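For the sanity check mentioned above I only count the rows and the label split; the snippet is mine, and only the 4-column, tab-separated, header-less layout is taken from the `head` output:

```python
# Count rows and label balance in the augmented CoLA training file.
# Layout assumed from the head output above: source code, label, (empty), sentence.
import csv
from collections import Counter

labels = Counter()
total = 0
with open("glue/CoLA/train_aug.tsv", encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
        total += 1
        labels[row[1]] += 1          # second column is the 0/1 acceptability label

print(total, "examples;", dict(labels))   # total came out to 209034 for my run
```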
  5. I run task-specific distillation with:
    
```shell
FT_BERT_BASE_DIR=./hub/bert-base-uncased-cola/
GENERAL_TINYBERT_DIR=./hub/tinybert/
TASK_DIR=./glue/CoLA/
TASK_NAME=cola
TMP_TINYBERT_DIR=./tune_huawei_cola_e50
TINYBERT_DIR="${TMP_TINYBERT_DIR}_td2"
rm -rf $TMP_TINYBERT_DIR
rm -rf $TINYBERT_DIR
export PYTHONPATH=./TinyBERT/:$PYTHONPATH

# stage 1: intermediate-layer (embedding/hidden/attention) distillation on the augmented data
python ./TinyBERT/task_distill.py --teacher_model ${FT_BERT_BASE_DIR} \
  --student_model ${GENERAL_TINYBERT_DIR} \
  --data_dir ${TASK_DIR} \
  --task_name ${TASK_NAME} \
  --output_dir ${TMP_TINYBERT_DIR} \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --num_train_epochs 50 \
  --aug_train \
  --do_lower_case \
  &> log.$TASK_NAME-da.td1

# stage 2: prediction-layer distillation starting from the stage-1 student
python ./TinyBERT/task_distill.py --pred_distill \
  --teacher_model ${FT_BERT_BASE_DIR} \
  --student_model ${TMP_TINYBERT_DIR} \
  --data_dir ${TASK_DIR} \
  --task_name ${TASK_NAME} \
  --output_dir ${TINYBERT_DIR} \
  --do_lower_case \
  --aug_train \
  --learning_rate 3e-5 \
  --num_train_epochs 3 \
  --eval_step 100 \
  --max_seq_length 128 \
  --train_batch_size 32 \
  &> log.$TASK_NAME-da.td2
```
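Independently of the training log, the dev MCC can be recomputed with scikit-learn if the student's dev-set predictions are dumped to a file. The prediction file below is hypothetical (task_distill.py does not necessarily write it); only the dev.tsv layout (four tab-separated columns, label in the second, no header) comes from the GLUE data itself:

```python
# Recompute CoLA dev MCC from gold labels and a dumped 0/1-per-line prediction file.
# "dev_predictions.txt" is a hypothetical dump, not a file the repo scripts are known to write.
from sklearn.metrics import matthews_corrcoef

with open("glue/CoLA/dev.tsv", encoding="utf-8") as f:
    gold = [int(line.split("\t")[1]) for line in f if line.strip()]

with open("tune_huawei_cola_e50_td2/dev_predictions.txt", encoding="utf-8") as f:
    pred = [int(line.strip()) for line in f if line.strip()]

assert len(gold) == len(pred), "predictions must align with dev.tsv"
print("dev MCC:", matthews_corrcoef(gold, pred))
```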


The training log for the two-step TD is attached: [tinybert-cola-log.tar.gz](https://github.com/huawei-noah/Pretrained-Language-Model/files/5049950/tinybert-cola-log.tar.gz).

Thanks for your help.
dongkuanx27 commented 4 years ago

Hi,

I have the same issue.

I got dev MCC = 37.87 with General_TinyBERT (4-layer, 312-dim) and the glove.42B.300d embeddings (the paper reports dev MCC = 49.7).

However, when I use General_TinyBERT (6-layer, 768-dim), I get dev MCC = 53.6, which is very close to the reported 54.0.

Thanks for the help.

Best, DK