I trained RoBERTa and XLNet on the synthetic PIE a1 data (train: 8,865,347 / dev: 3,000). For the stage 1 models I used the parameters you provide in the repo (https://github.com/grammarly/gector/blob/master/docs/training_parameters.md). The training parameters and the results I get are as follows.
XLNet stage 1 training parameters:

```shell
TRAIN_SET=train/train_seq_tag.txt
DEV_SET=dev/dev_seq_tag.txt
MODEL_PATH=model/train/stage1_xlnet  # changed
VOCAB_PATH=data/output_vocabulary
transformer_model=xlnet  # changed
special_tokens_fix=0  # changed
n_epoch=20
cold_steps_count=2
accumulation_size=4
updates_per_epoch=10000
tn_prob=0
tp_prob=1
tune_bert=1
skip_correct=1
skip_complex=0
max_len=50
batch_size=64
cold_lr=1e-3
lr=1e-5
predictor_dropout=0.0
lowercase_tokens=0
pieces_per_token=5
label_smoothing=0.0
patience=3
```
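As a side note on these settings: with gradient accumulation the effective batch size is `batch_size × accumulation_size` per process. This is general accumulation arithmetic, not GECToR internals, and the GPU multiplier assumes data-parallel training across the 8 GPUs reported later in this post:

```python
# Effective batch size under gradient accumulation (general arithmetic,
# not GECToR-specific code). The GPU multiplier assumes data-parallel training.
batch_size = 64          # sentences per forward pass (stage 1 setting)
accumulation_size = 4    # gradient-accumulation steps (stage 1 setting)
n_gpus = 8               # GPU count used for training

per_gpu_effective = batch_size * accumulation_size   # 256 sentences per update
total_effective = per_gpu_effective * n_gpus         # 2048 if data-parallel
print(per_gpu_effective, total_effective)
```

Stage 2 uses a smaller `accumulation_size`, so its effective batch per update is correspondingly smaller.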
For stage 2, I trained XLNet and RoBERTa on NUCLE, Lang-8, FCE, and Write & Improve + LOCNESS (1,157,038 training sentences; 6,573 dev sentences). When I test the stage 2 trained models, I get low precision and recall:
| Model   | Precision | Recall | F0.5   |
|---------|-----------|--------|--------|
| xlnet   | 48.73%    | 20.06% | 37.9%  |
| roberta | 55.09%    | 9.77%  | 28.58% |
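For what it's worth, the reported F0.5 values are internally consistent with the precision/recall pairs; a quick sanity check with the standard F-beta formula (not GECToR-specific code):

```python
# Sanity-check the reported scores with the standard F-beta formula:
#   F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), with beta = 0.5,
# which weights precision higher than recall.
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_beta(0.4873, 0.2006))  # xlnet:   ~0.379
print(f_beta(0.5509, 0.0977))  # roberta: ~0.286
```

So the low F0.5 scores are driven mainly by recall, which F0.5 penalizes less than precision; the recall gap between the two models is the larger difference here.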
I used 8 GPUs to train both the stage 1 and stage 2 models.
Stage 1 training command (the same command is used for XLNet and RoBERTa):

```shell
PYTHONIOENCODING=utf-8 python3 train.py --train_set $TRAIN_SET --dev_set $DEV_SET \
  --model_dir $MODEL_PATH --vocab_path $VOCAB_PATH --transformer_model $transformer_model \
  --special_tokens_fix $special_tokens_fix --n_epoch $n_epoch --cold_steps_count $cold_steps_count \
  --accumulation_size $accumulation_size --updates_per_epoch $updates_per_epoch --tn_prob $tn_prob \
  --tp_prob $tp_prob --tune_bert $tune_bert --skip_correct $skip_correct --skip_complex $skip_complex \
  --max_len $max_len --batch_size $batch_size --cold_lr $cold_lr --lr $lr \
  --predictor_dropout $predictor_dropout --lowercase_tokens $lowercase_tokens \
  --pieces_per_token $pieces_per_token --label_smoothing $label_smoothing --patience $patience
```
RoBERTa stage 1 training parameters:

```shell
TRAIN_SET=train/train_seq_tag.txt
DEV_SET=dev/dev_seq_tag.txt
MODEL_PATH=model/train/stage1_roberta  # changed
VOCAB_PATH=data/output_vocabulary
transformer_model=roberta  # changed
special_tokens_fix=1  # changed
n_epoch=20
cold_steps_count=2
accumulation_size=4
updates_per_epoch=10000
tn_prob=0
tp_prob=1
tune_bert=1
skip_correct=1
skip_complex=0
max_len=50
batch_size=64
cold_lr=1e-3
lr=1e-5
predictor_dropout=0.0
lowercase_tokens=0
pieces_per_token=5
label_smoothing=0.0
patience=3
```
The stage 2 training parameters are as follows.

XLNet stage 2 training parameters:

```shell
TRAIN_SET=train/stage2_train/all.train.seq_tag
DEV_SET=dev/stage2/stage2dev.seq_tag
MODEL_PATH=model/train/stage2_xlnet_all  # changed
VOCAB_PATH=data/output_vocabulary
transformer_model=xlnet  # changed
special_tokens_fix=0  # changed
n_epoch=9  # changed
cold_steps_count=2
accumulation_size=1
updates_per_epoch=0
tn_prob=0
tp_prob=1
pretrain_folder=model/train/stage1_xlnet  # changed
pretrain=best
tune_bert=1
skip_correct=1
skip_complex=0
max_len=50
batch_size=64
cold_lr=1e-3
lr=1e-5
predictor_dropout=0.0
lowercase_tokens=0
pieces_per_token=5
label_smoothing=0.0
patience=3
```

Stage 2 training command:

```shell
python3 train.py --train_set $TRAIN_SET --dev_set $DEV_SET \
  --model_dir $MODEL_PATH --vocab_path $VOCAB_PATH --transformer_model $transformer_model \
  --special_tokens_fix $special_tokens_fix --n_epoch $n_epoch --cold_steps_count $cold_steps_count \
  --accumulation_size $accumulation_size --updates_per_epoch $updates_per_epoch --tn_prob $tn_prob \
  --tp_prob $tp_prob --tune_bert $tune_bert --skip_correct $skip_correct --skip_complex $skip_complex \
  --max_len $max_len --batch_size $batch_size --cold_lr $cold_lr --lr $lr \
  --predictor_dropout $predictor_dropout --lowercase_tokens $lowercase_tokens \
  --pieces_per_token $pieces_per_token --label_smoothing $label_smoothing --patience $patience
```
RoBERTa stage 2 training parameters:

```shell
TRAIN_SET=train/stage2_train/all.train.seq_tag
DEV_SET=dev/stage2/stage2dev.seq_tag
MODEL_PATH=model/train/stage2_roberta_all  # changed
VOCAB_PATH=data/output_vocabulary
transformer_model=roberta  # changed
special_tokens_fix=1  # changed
n_epoch=10  # changed
cold_steps_count=2
accumulation_size=2
updates_per_epoch=0
tn_prob=0
tp_prob=1
pretrain_folder=model/train/stage1_roberta  # changed
pretrain=best
tune_bert=1
skip_correct=1
skip_complex=0
max_len=50
batch_size=64
cold_lr=1e-3
lr=1e-5
predictor_dropout=0.0
lowercase_tokens=0
pieces_per_token=5
label_smoothing=0.0
patience=3
```

Stage 2 training command:

```shell
python3 train.py --train_set $TRAIN_SET --dev_set $DEV_SET \
  --model_dir $MODEL_PATH --vocab_path $VOCAB_PATH --transformer_model $transformer_model \
  --special_tokens_fix $special_tokens_fix --n_epoch $n_epoch --cold_steps_count $cold_steps_count \
  --accumulation_size $accumulation_size --updates_per_epoch $updates_per_epoch --tn_prob $tn_prob \
  --tp_prob $tp_prob --tune_bert $tune_bert --skip_correct $skip_correct --skip_complex $skip_complex \
  --max_len $max_len --batch_size $batch_size --cold_lr $cold_lr --lr $lr \
  --predictor_dropout $predictor_dropout --lowercase_tokens $lowercase_tokens \
  --pieces_per_token $pieces_per_token --label_smoothing $label_smoothing --patience $patience
```

Could you tell me what went wrong? @skurzhanskyi @komelianchuk