JamesSand opened this issue 2 months ago
Thank you for your great work. I am trying to reproduce the results in "Benchmark 2: Fine-Tuning RoBERTa on GLUE tasks".
I have gone through this issue, but I still fail to reproduce some of the results.
I conducted experiments with GaLore rank 4, using the following script:
```
python run_glue.py \
    --model_name_or_path roberta-base \
    --task_name $task_name \
    --enable_galore \
    --lora_all_modules \
    --max_length 512 \
    --seed $seed \
    --lora_r 4 \
    --galore_scale 4 \
    --per_device_train_batch_size 16 \
    --update_proj_gap 500 \
    --learning_rate 1e-5 \
    --num_train_epochs 30 \
    --output_dir results/ft/roberta_base/$task_name
```
Following the hyperparameters in Table 7 of your paper, I changed the batch size to 32 and the learning rate to 3e-5 for the CoLA experiment (exact command sketched at the end of this post). But I got the following results:
I am wondering if I have missed something?
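
For completeness, here is the command the CoLA run works out to — a sketch assuming only `--task_name`, `--per_device_train_batch_size`, and `--learning_rate` differ from the script above, with every other flag unchanged:

```
# Same invocation as above, but with the CoLA-specific
# hyperparameters from Table 7: batch size 32, learning rate 3e-5.
python run_glue.py \
    --model_name_or_path roberta-base \
    --task_name cola \
    --enable_galore \
    --lora_all_modules \
    --max_length 512 \
    --seed $seed \
    --lora_r 4 \
    --galore_scale 4 \
    --per_device_train_batch_size 32 \
    --update_proj_gap 500 \
    --learning_rate 3e-5 \
    --num_train_epochs 30 \
    --output_dir results/ft/roberta_base/cola
```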