microsoft / DeBERTa

The implementation of DeBERTa
MIT License

DeBERTa base different performance numbers #53

Closed: DarshanPatel11 closed this issue 3 years ago

DarshanPatel11 commented 3 years ago

Hi, while going through the research paper I found that the performance numbers for the DeBERTa base model on the MNLI task are reported in two different places with different values. [screenshots of the two tables from the paper]

When I tried to reproduce the numbers by fine-tuning on the MNLI task, I got: MNLI matched - 86.8; MNLI mismatched - 86.3.

My hyperparameters:

python run_glue.py \
  --model_name_or_path $MODEL_NAME \
  --task_name $TASK_NAME \
  --do_train --do_eval \
  --train_file $GLUE_DIR/$TASK_NAME/$train_file \
  --validation_file $GLUE_DIR/$TASK_NAME/$validation_file \
  --test_file $GLUE_DIR/$TASK_NAME/$test_file \
  --max_seq_length 128 \
  --per_device_train_batch_size 64 \
  --per_device_eval_batch_size 128 \
  --learning_rate 2e-5 \
  --num_train_epochs 6.0 \
  --output_dir $OUTPUT_DIR \
  --logging_dir $LOG_DIR \
  --logging_steps $logging_steps \
  --save_total_limit 2 \
  --save_steps 1000 \
  --warmup_steps 100 \
  --gradient_accumulation_steps 1 \
  --overwrite_output_dir \
  --evaluation_strategy epoch

Can you provide details on the hyperparameters you used for fine-tuning, and which of the two performance numbers should be considered?

BigBird01 commented 3 years ago

Please follow our scripts in experiments/glue to reproduce the results. The number in the ablation study is from the model trained with the same setting as BERT, i.e. on 16GB of data, while the other is from the model trained on ~80GB of data.
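For reference, the repository's GLUE fine-tuning entry points live under experiments/glue, and each script bundles the hyperparameters used for the published results. The sketch below shows how such a script is typically invoked; the script name mnli.sh and its argument are assumptions here, so check the directory for the actual interface.

  # Hypothetical invocation sketch -- the real script names and arguments
  # are defined in experiments/glue of the DeBERTa repository.
  cd experiments/glue
  ./mnli.sh deberta-base   # fine-tune the base model on MNLI with the repo's settings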