Hi, while going through the research paper I found the performance numbers of the DeBERTa base model on the MNLI task reported at two different locations with different values.
When I tried to reproduce the numbers by fine-tuning on the MNLI task, I got these results:
MNLI Matched - 86.8; MNLI mismatched - 86.3
Please follow our scripts in experiments/glue to reproduce the results.
The results in the ablation study are for the model trained with the same settings as BERT, i.e. on 16GB of data. The other results are for the model trained on ~80GB of data.
![image](https://user-images.githubusercontent.com/26226718/115825547-34b6b100-a427-11eb-9a85-29ae6185546f.png)
My hyperparameters:

```shell
python run_glue.py \
  --model_name_or_path $MODEL_NAME \
  --task_name $TASK_NAME \
  --do_train --do_eval \
  --train_file $GLUE_DIR/$TASK_NAME/$train_file \
  --validation_file $GLUE_DIR/$TASK_NAME/$validation_file \
  --test_file $GLUE_DIR/$TASK_NAME/$test_file \
  --max_seq_length 128 \
  --per_device_train_batch_size 64 \
  --per_device_eval_batch_size 128 \
  --learning_rate 2e-5 \
  --num_train_epochs 6.0 \
  --output_dir $OUTPUT_DIR \
  --logging_dir $LOG_DIR \
  --logging_steps $logging_steps \
  --save_total_limit 2 \
  --save_steps 1000 \
  --warmup_steps 100 \
  --gradient_accumulation_steps 1 \
  --overwrite_output_dir \
  --evaluation_strategy epoch
```
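For comparing against the paper's setup, one detail worth checking is the effective train batch size implied by these flags. A minimal sketch, assuming the standard Hugging Face Transformers semantics (per-device batch size multiplied by the number of devices and by gradient accumulation steps; the function name here is my own, for illustration):

```python
def effective_batch_size(per_device_train_batch_size: int,
                         num_devices: int,
                         gradient_accumulation_steps: int) -> int:
    # Effective (global) batch size as the Trainer computes it:
    # per-device size x devices x accumulation steps.
    return (per_device_train_batch_size
            * num_devices
            * gradient_accumulation_steps)

# With the flags above on a single GPU:
print(effective_batch_size(64, 1, 1))   # 64
# The same flags on 4 GPUs would give 256, which may change results.
print(effective_batch_size(64, 4, 1))   # 256
```

If the authors fine-tuned on multiple GPUs, the same command line can yield a different effective batch size, which is one possible source of the gap.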
Can you provide details on the hyperparameters you used for fine-tuning, and clarify which performance number should be considered?