jiaweizzhao / GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Questions about reproducing the result of "Benchmark 2: Fine-Tuning RoBERTa on GLUE tasks" #44

Open · JamesSand opened this issue 2 months ago

JamesSand commented 2 months ago

Thank you for your great work. I am trying to reproduce the results in "Benchmark 2: Fine-Tuning RoBERTa on GLUE tasks".

I have gone through this issue, but I still fail to reproduce some of the results.

I conducted experiments with GaLore at rank 4, using the following script:

python run_glue.py \
    --model_name_or_path roberta-base \
    --task_name $task_name \
    --enable_galore \
    --lora_all_modules \
    --max_length 512 \
    --seed $seed \
    --lora_r 4 \
    --galore_scale 4 \
    --per_device_train_batch_size 16 \
    --update_proj_gap 500 \
    --learning_rate 1e-5 \
    --num_train_epochs 30 \
    --output_dir results/ft/roberta_base/$task_name
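
For reference, the $task_name and $seed variables above are filled in by a small sweep wrapper. Below is a minimal sketch of such a wrapper; the seed values are illustrative placeholders rather than the exact seeds I used, and CoLA is left out because its batch size and learning rate differ (see below).

# Sketch of a sweep over GLUE tasks and seeds using the command above.
# Seed values are placeholders; CoLA is run separately with bs 32 / lr 3e-5.
for task_name in mrpc rte stsb qqp; do
    for seed in 1234 2345 3456; do
        python run_glue.py \
            --model_name_or_path roberta-base \
            --task_name $task_name \
            --enable_galore \
            --lora_all_modules \
            --max_length 512 \
            --seed $seed \
            --lora_r 4 \
            --galore_scale 4 \
            --per_device_train_batch_size 16 \
            --update_proj_gap 500 \
            --learning_rate 1e-5 \
            --num_train_epochs 30 \
            --output_dir results/ft/roberta_base/$task_name
    done
done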

Following the hyperparameters in Table 7 of your paper, I changed the batch size to 32 and the learning rate to 3e-5 when running the CoLA experiment; the exact variant I used is sketched below.
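
To be concrete, the CoLA run differs from the command above only in those two flags; this is a sketch of my invocation, with every other flag unchanged:

# CoLA variant: batch size 32 and learning rate 3e-5 (per Table 7);
# all other flags match the script shown earlier.
python run_glue.py \
    --model_name_or_path roberta-base \
    --task_name cola \
    --enable_galore \
    --lora_all_modules \
    --max_length 512 \
    --seed $seed \
    --lora_r 4 \
    --galore_scale 4 \
    --per_device_train_batch_size 32 \
    --update_proj_gap 500 \
    --learning_rate 3e-5 \
    --num_train_epochs 30 \
    --output_dir results/ft/roberta_base/cola

With these hyperparameters, I got the following results: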

| Dataset | Result in Paper | Reproduced Result |
| --- | --- | --- |
| MRPC | 92.25 | Acc 87.74, F1 91.10 |
| CoLA | 60.35 | Matthews correlation 59.56 |
| RTE | 79.42 | Acc 77.25 |
| STS-B | 90.73 | Pearson 0.90526, Spearman 0.90339 |
| QQP | 91.06 | Pearson 0.90785, Spearman 0.90589 |

I am wondering if I have missed something.