jiaweizzhao / GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Questions about reproducing the result of "Benchmark 2: Fine-Tuning RoBERTa on GLUE tasks" #44

Open · JamesSand opened this issue 2 months ago

JamesSand commented 2 months ago

Thank you for your great work. I am trying to reproduce the results in "Benchmark 2: Fine-Tuning RoBERTa on GLUE tasks".

I have gone through this issue, but I still fail to reproduce some of the results.

I conducted experiments with GaLore at rank 4, using the following script:

python run_glue.py \
    --model_name_or_path roberta-base \
    --task_name $task_name \
    --enable_galore \
    --lora_all_modules \
    --max_length 512 \
    --seed $seed \
    --lora_r 4 \
    --galore_scale 4 \
    --per_device_train_batch_size 16 \
    --update_proj_gap 500 \
    --learning_rate 1e-5 \
    --num_train_epochs 30 \
    --output_dir results/ft/roberta_base/$task_name
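
For reference, the $task_name and $seed variables above are filled in by a small sweep wrapper. Below is a minimal sketch of such a wrapper; the seed values are illustrative placeholders rather than the exact seeds I used, and CoLA is left out because its batch size and learning rate differ (see below).

# Sketch of a sweep over GLUE tasks and seeds using the command above.
# Seed values are placeholders; CoLA is run separately with bs 32 / lr 3e-5.
for task_name in mrpc rte stsb qqp; do
    for seed in 1234 2345 3456; do
        python run_glue.py \
            --model_name_or_path roberta-base \
            --task_name $task_name \
            --enable_galore \
            --lora_all_modules \
            --max_length 512 \
            --seed $seed \
            --lora_r 4 \
            --galore_scale 4 \
            --per_device_train_batch_size 16 \
            --update_proj_gap 500 \
            --learning_rate 1e-5 \
            --num_train_epochs 30 \
            --output_dir results/ft/roberta_base/$task_name
    done
done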

Following the hyperparameters in Table 7 of your paper, I changed the batch size to 32 and the learning rate to 3e-5 when running the CoLA experiment; the exact variant I used is sketched below.
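
To be concrete, the CoLA run differs from the command above only in those two flags; this is a sketch of my invocation, with every other flag unchanged:

# CoLA variant: batch size 32 and learning rate 3e-5 (per Table 7);
# all other flags match the script shown earlier.
python run_glue.py \
    --model_name_or_path roberta-base \
    --task_name cola \
    --enable_galore \
    --lora_all_modules \
    --max_length 512 \
    --seed $seed \
    --lora_r 4 \
    --galore_scale 4 \
    --per_device_train_batch_size 32 \
    --update_proj_gap 500 \
    --learning_rate 3e-5 \
    --num_train_epochs 30 \
    --output_dir results/ft/roberta_base/cola

With these hyperparameters, I got the following results: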

| Dataset | Result in Paper | Reproduced Result |
| --- | --- | --- |
| MRPC | 92.25 | Acc 87.74, F1 91.10 |
| CoLA | 60.35 | Matthews correlation 59.56 |
| RTE | 79.42 | Acc 77.25 |
| STS-B | 90.73 | Pearson 0.90526, Spearman 0.90339 |
| QQP | 91.06 | Pearson 0.90785, Spearman 0.90589 |

I am wondering if I have missed something.