jiaweizzhao / GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Apache License 2.0

(Question) About glue tasks #52

Open ZhichaoWang091732 opened 3 weeks ago

ZhichaoWang091732 commented 3 weeks ago

Hello, thanks for your inspiring and excellent work!

I want to run full fine-tuning as a baseline to compare against GaLore, so I have disabled GaLore for this run. However, when I run a GLUE task (e.g. MRPC) to fully fine-tune RoBERTa, the eval accuracy does not change at all as training progresses. I have ruled out overfitting, and I would like to ask the authors, or anyone else, whether there is a known solution. (A minimal version of the setup I mean is sketched after the screenshot below.)

[screenshot attached: training logs showing flat eval accuracy]
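For concreteness, a minimal full fine-tuning baseline on MRPC might look like the sketch below. This is an assumption of a standard Hugging Face `Trainer` setup, not this repo's exact script; the model name (`roberta-base`), learning rate, batch size, and epoch count are illustrative defaults, not the paper's hyperparameters.

```python
# Minimal sketch: full fine-tuning of roberta-base on GLUE MRPC with the
# Hugging Face Trainer. Hyperparameters here are illustrative assumptions.
import numpy as np
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2
)

raw = load_dataset("glue", "mrpc")

def tokenize(batch):
    # MRPC is a sentence-pair task: encode both sentences together.
    return tokenizer(
        batch["sentence1"], batch["sentence2"],
        truncation=True, max_length=128,
    )

data = raw.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    # Report accuracy so a flat eval curve is easy to spot.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": (preds == labels).mean()}

args = TrainingArguments(
    output_dir="out-mrpc-full-ft",
    learning_rate=2e-5,              # typical full fine-tuning LR; too large an LR can stall accuracy
    per_device_train_batch_size=16,
    num_train_epochs=3,
    evaluation_strategy="epoch",     # evaluate every epoch to see if accuracy moves
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=data["train"],
    eval_dataset=data["validation"],
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,             # enables default dynamic padding collator
)
trainer.train()
```

If a plain run like this learns normally but the in-repo script does not, the difference is likely in the script's hyperparameters or optimizer configuration rather than in the task itself.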

jiaweizzhao commented 4 days ago

Hi, thanks for your question. Were you using the hyperparameters and settings provided in the appendix of our paper?