grammarly / gector

Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)
Apache License 2.0

Low precision and recall when training RoBERTa and XLNet on stage 2 #167

Closed xiuzhilu closed 1 year ago

xiuzhilu commented 1 year ago

I trained RoBERTa and XLNet on the PIE synthetic data, part a1 (train: 8,865,347 sentences, dev: 3,000). The stage-1 parameters are the ones you provide in the repo (https://github.com/grammarly/gector/blob/master/docs/training_parameters.md). The training parameters and the results I get are as follows (see the preprocessing note after the stage-1 results below for how the *_seq_tag files are typically produced).

XLNet stage-1 training parameters:

TRAIN_SET=train/train_seq_tag.txt
DEV_SET=dev/dev_seq_tag.txt
MODEL_PATH=model/train/stage1_xlnet  # changed
VOCAB_PATH=data/output_vocabulary
transformer_model=xlnet  # changed
special_tokens_fix=0  # changed
n_epoch=20
cold_steps_count=2
accumulation_size=4
updates_per_epoch=10000
tn_prob=0
tp_prob=1
tune_bert=1
skip_correct=1
skip_complex=0
max_len=50
batch_size=64
cold_lr=1e-3
lr=1e-5
predictor_dropout=0.0
lowercase_tokens=0
pieces_per_token=5
label_smoothing=0.0
patience=0
patience=3

PYTHONIOENCODING=utf-8 python3 train.py --train_set $TRAIN_SET --dev_set $DEV_SET \
  --model_dir $MODEL_PATH --vocab_path $VOCAB_PATH --transformer_model $transformer_model \
  --special_tokens_fix $special_tokens_fix --n_epoch $n_epoch --cold_steps_count $cold_steps_count \
  --accumulation_size $accumulation_size --updates_per_epoch $updates_per_epoch --tn_prob $tn_prob \
  --tp_prob $tp_prob --tune_bert $tune_bert \
  --skip_correct $skip_correct \
  --skip_complex $skip_complex \
  --max_len $max_len \
  --batch_size $batch_size \
  --cold_lr $cold_lr \
  --lr $lr \
  --predictor_dropout $predictor_dropout \
  --lowercase_tokens $lowercase_tokens \
  --pieces_per_token $pieces_per_token \
  --label_smoothing $label_smoothing \
  --patience $patience

XLNet results:

Precision Recall F0.5
80.47% 74.47% 79.20%
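
As a sanity check on the metric: F0.5 weights precision twice as heavily as recall, F0.5 = 1.25·P·R / (0.25·P + R), and the reported scores in this issue are consistent with that, e.g. for the XLNet stage-1 numbers above:

# Verify the reported F0.5 from precision and recall: F0.5 = 1.25*P*R / (0.25*P + R)
python3 -c "p, r = 0.8047, 0.7447; print(1.25 * p * r / (0.25 * p + r))"   # ~0.792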

RoBERTa stage-1 training parameters:

TRAIN_SET=train/train_seq_tag.txt
DEV_SET=dev/dev_seq_tag.txt
MODEL_PATH=model/train/stage1_roberta  # changed
VOCAB_PATH=data/output_vocabulary
transformer_model=roberta  # changed
special_tokens_fix=1  # changed
n_epoch=20
cold_steps_count=2
accumulation_size=4
updates_per_epoch=10000
tn_prob=0
tp_prob=1
tune_bert=1
skip_correct=1
skip_complex=0
max_len=50
batch_size=64
cold_lr=1e-3
lr=1e-5
predictor_dropout=0.0
lowercase_tokens=0
pieces_per_token=5
label_smoothing=0.0
patience=0
patience=3

python3 train.py --train_set $TRAIN_SET --dev_set $DEV_SET \
  --model_dir $MODEL_PATH --vocab_path $VOCAB_PATH --transformer_model $transformer_model \
  --special_tokens_fix $special_tokens_fix --n_epoch $n_epoch --cold_steps_count $cold_steps_count \
  --accumulation_size $accumulation_size --updates_per_epoch $updates_per_epoch --tn_prob $tn_prob \
  --tp_prob $tp_prob --tune_bert $tune_bert \
  --skip_correct $skip_correct \
  --skip_complex $skip_complex \
  --max_len $max_len \
  --batch_size $batch_size \
  --cold_lr $cold_lr \
  --predictor_dropout $predictor_dropout \
  --lowercase_tokens $lowercase_tokens \
  --pieces_per_token $pieces_per_token \
  --label_smoothing $label_smoothing \
  --patience $patience

RoBERTa results:

Precision Recall F0.5
89.52% 66.57% 83.75%
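
For reference, the *_seq_tag training and dev files above are in the tagged format produced by the repo's preprocessing script. A minimal sketch of how such files are typically generated (the a1.src/a1.tgt names here are placeholders, not my exact paths):

# Convert a parallel source/target corpus into the seq_tag format expected by train.py
python3 utils/preprocess_data.py -s a1.src -t a1.tgt -o train/train_seq_tag.txt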
I then trained XLNet and RoBERTa on stage 2 with NUCLE, Lang-8, FCE, and Write & Improve + LOCNESS (1,157,038 training sentences in total; 6,573 dev sentences). When I test the stage-2 models, precision and recall are low.

XLNet:

Precision Recall F0.5
48.73% 20.06% 37.9%

RoBERTa:

Precision Recall F0.5
55.09% 9.77% 28.58%
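
For context, inference for evaluation is typically run with the repo's predict.py before scoring the output with an external tool (e.g. ERRANT or the M2 scorer). A minimal sketch of the inference step; the file paths below are placeholders, and the flags should match the training settings:

# Generate corrections with a trained checkpoint (paths are placeholders)
python3 predict.py --model_path model/train/stage2_xlnet_all/best.th \
  --vocab_path data/output_vocabulary \
  --input_file eval.src --output_file eval.pred \
  --transformer_model xlnet --special_tokens_fix 0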

I use 8 GPUs when training the stage-1 and stage-2 models.
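
A back-of-the-envelope view of the effective batch sizes for the stage-1 settings above and the stage-2 settings below, assuming batch_size is per device and gradients are accumulated over accumulation_size steps across the 8 GPUs (whether train.py treats batch_size as per-device or global is an assumption here):

# Assumed effective batch size = batch_size * accumulation_size * n_gpus
echo $((64 * 4 * 8))   # stage 1 (accumulation_size=4): 2048
echo $((64 * 2 * 8))   # stage 2 with accumulation_size=2: 1024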

The parameters for training XLNet and RoBERTa on stage 2 are as follows.

XLNet stage-2 training parameters:

TRAIN_SET=train/stage2_train/all.train.seq_tag
DEV_SET=dev/stage2/stage2dev.seq_tag
MODEL_PATH=model/train/stage2_xlnet_all  # changed
VOCAB_PATH=data/output_vocabulary
transformer_model=xlnet  # changed
special_tokens_fix=0  # changed
n_epoch=9  # changed
cold_steps_count=2
accumulation_size=2
accumulation_size=1
updates_per_epoch=0
tn_prob=0
tp_prob=1
pretrain_folder=model/train/stage1_xlnet  # changed
pretrain=best
tune_bert=1
skip_correct=1
skip_complex=0
max_len=50
batch_size=64
cold_lr=1e-3
lr=1e-5
predictor_dropout=0.0
lowercase_tokens=0
pieces_per_token=5
label_smoothing=0.0
patience=0
patience=3

python3 train.py --train_set $TRAIN_SET --dev_set $DEV_SET \
  --model_dir $MODEL_PATH --vocab_path $VOCAB_PATH --transformer_model $transformer_model \
  --special_tokens_fix $special_tokens_fix --n_epoch $n_epoch --cold_steps_count $cold_steps_count \
  --accumulation_size $accumulation_size --updates_per_epoch $updates_per_epoch --tn_prob $tn_prob \
  --tp_prob $tp_prob --tune_bert $tune_bert \
  --skip_correct $skip_correct \
  --skip_complex $skip_complex \
  --max_len $max_len \
  --batch_size $batch_size \
  --cold_lr $cold_lr \
  --lr $lr \
  --predictor_dropout $predictor_dropout \
  --lowercase_tokens $lowercase_tokens \
  --pieces_per_token $pieces_per_token \
  --label_smoothing $label_smoothing \
  --patience $patience

RoBERTa stage-2 training parameters:

TRAIN_SET=train/stage2_train/all.train.seq_tag
DEV_SET=dev/stage2/stage2dev.seq_tag
MODEL_PATH=model/train/stage2_roberta_all  # changed
VOCAB_PATH=data/output_vocabulary
transformer_model=roberta  # changed
special_tokens_fix=1  # changed
n_epoch=10  # changed
cold_steps_count=2
accumulation_size=2
updates_per_epoch=0
tn_prob=0
tp_prob=1
pretrain_folder=model/train/stage1_roberta  # changed
pretrain=best
tune_bert=1
skip_correct=1
skip_complex=0
max_len=50
batch_size=64
cold_lr=1e-3
lr=1e-5
predictor_dropout=0.0
lowercase_tokens=0
pieces_per_token=5
label_smoothing=0.0
patience=0
patience=3

python3 train.py --train_set $TRAIN_SET --dev_set $DEV_SET \
  --model_dir $MODEL_PATH --vocab_path $VOCAB_PATH --transformer_model $transformer_model \
  --special_tokens_fix $special_tokens_fix --n_epoch $n_epoch --cold_steps_count $cold_steps_count \
  --accumulation_size $accumulation_size --updates_per_epoch $updates_per_epoch --tn_prob $tn_prob \
  --tp_prob $tp_prob --tune_bert $tune_bert \
  --skip_correct $skip_correct \
  --skip_complex $skip_complex \
  --max_len $max_len \
  --batch_size $batch_size \
  --cold_lr $cold_lr \
  --lr $lr \
  --predictor_dropout $predictor_dropout \
  --lowercase_tokens $lowercase_tokens \
  --pieces_per_token $pieces_per_token \
  --label_smoothing $label_smoothing \
  --patience $patience

Could you tell me what went wrong? @skurzhanskyi @komelianchuk

skurzhanskyi commented 1 year ago

Not sure, tbh. This would require reviewing the whole training process.