yxli2123 / LoftQ

MIT License

Failing to converge when using some random seeds #24

Open Car-pe opened 2 months ago

Car-pe commented 2 months ago

Dear Authors,

Your work is truly exceptional and I am currently attempting to reproduce it. However, I've observed noticeable performance variations when employing different random seeds. For example, during the fine-tuning of Deberta-v3-base on the 'mrpc' task, setting the random seed to '0' results in an evaluation accuracy of 85.05. In contrast, when I choose '71' or '37' as the random seed, the evaluation accuracy significantly drops to 68.38, essentially failing to converge.

Could you possibly offer any guidance regarding this matter? Moreover, I would greatly appreciate it if you could disclose the random seeds you utilized in this work.

Thank you!

yifan1130 commented 2 months ago

Thank you for reaching out and for your kind words about our work. With 2-bit quantization on GLUE tasks, performance can vary across random seeds, and the extent of this variance differs by task: MRPC, as you noticed, and also CoLA are rather unstable. In most cases the situation is even worse with the baseline QLoRA method. In our experiments, we tried a variety of random seeds and excluded the runs that did not converge. To achieve more stable performance on GLUE, you can consider using a larger batch size or increasing the precision, for example using 4-bit precision in earlier layers and 2-bit precision in later layers. Some checkpoints are available now, and I will try to provide more checkpoints and random seeds in the future.
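The seed-sweep-and-discard practice described above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual script: `train_and_eval`, `set_seed`, `sweep_seeds`, and the `convergence_floor` threshold are all hypothetical names introduced here for clarity.

```python
import random

def set_seed(seed: int) -> None:
    """Seed the RNGs the training stack may touch (sketch)."""
    random.seed(seed)
    # If torch/numpy are in use, also seed them, e.g.:
    # numpy.random.seed(seed); torch.manual_seed(seed)

def sweep_seeds(train_and_eval, seeds, convergence_floor=0.70):
    """Run one fine-tuning trial per seed and keep only converged runs.

    `train_and_eval` is a hypothetical callable wrapping the actual
    GLUE fine-tuning loop; it takes a seed and returns eval accuracy.
    Runs below `convergence_floor` (e.g. ~68.4% on MRPC, the
    majority-class baseline, indicating no convergence) are discarded.
    """
    results = {}
    for seed in seeds:
        set_seed(seed)
        accuracy = train_and_eval(seed)
        if accuracy >= convergence_floor:
            results[seed] = accuracy
    return results
```

Under this scheme, the reported number would be aggregated only over the seeds that survive the convergence filter, which is why a single unlucky seed (like 71 or 37 above) can look far worse than the paper's figure.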

Car-pe commented 2 months ago

Thank you for your response. Could you please share the random seeds you used for the GLUE benchmark? I cannot reproduce the results claimed in the paper. Thank you!