Lowrank Training

Low-rank model training (LTE and GaLore) with nanoGPT and ViT

Low-rank adaptation (LoRA) was originally proposed for fine-tuning: the pretrained weight is kept frozen and its update is reparameterized as the product of two lower-dimensional matrices A and B. In pre-training, plain low-rank adaptation does not reach the performance of full-parameter training, and recent works have extended LoRA to close that gap. LoRA-The-Explorer (LTE) trains multiple LoRA heads in parallel and periodically merges their updates into the base weights. GaLore projects gradients into lower-dimensional matrices and performs the weight updates in that smaller subspace. In this project, we applied LTE and GaLore to foundation models for language and vision tasks to validate their effectiveness. Our key results are summarized in the sections below.
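As background for the methods above, here is a minimal PyTorch sketch of the basic LoRA reparameterization; the class and parameter names are illustrative and are not taken from this repository's code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight W plus a trainable low-rank update B @ A, scaled by alpha / rank."""

    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False                    # W stays frozen
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))    # B = 0, so BA = 0 at init
        self.scale = alpha / rank

    def forward(self, x):
        # y = x W^T + scale * x (BA)^T
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```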

LTE tasks

Sources:

Initial setup (also see source repo):

  1. Run python3 data/cnn_dailymail/prepare.py to prepare the CNN/DailyMail dataset.
  2. Run python3 train_gpt_lte.py to start LTE training (a conceptual sketch of the update it performs follows this list).
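Conceptually, LTE optimizes several LoRA heads in parallel and periodically folds their averaged update back into the frozen base weight before resetting the heads. The sketch below illustrates that idea with hypothetical names; see the LTE paper and the source repo for the actual procedure.

```python
import torch
import torch.nn as nn

class MultiHeadLoRALinear(nn.Module):
    """LTE-style linear layer: N parallel LoRA heads around a frozen base weight."""

    def __init__(self, in_f, out_f, rank=8, heads=4, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_f, out_f, bias=False)
        self.base.weight.requires_grad = False
        self.A = nn.Parameter(torch.randn(heads, rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(heads, out_f, rank))
        self.scale = alpha / rank

    def forward(self, x, head):
        # Each head is trained on its own mini-batch; here one head is selected per call.
        delta = self.B[head] @ self.A[head]                       # (out_f, in_f)
        return x @ (self.base.weight + self.scale * delta).T

    @torch.no_grad()
    def merge_and_reset(self):
        # Fold the average of all head updates into the base weight, then restart the heads.
        avg_delta = torch.einsum('hor,hri->oi', self.B, self.A) / self.A.shape[0]
        self.base.weight += self.scale * avg_delta
        self.B.zero_()
        self.A.normal_(std=0.01)
```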

GaLore tasks

Sources:
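For reference, the core mechanism GaLore adds on top of a standard optimizer is to project each 2-D weight gradient into a low-rank subspace (obtained from an SVD of the gradient and refreshed every few hundred steps), keep the optimizer state at that reduced size, and project the update back to full size. Below is a simplified single-matrix sketch using plain SGD and illustrative names; the GaLore authors provide a full optimizer implementation.

```python
import torch

def galore_sgd_step(weight, grad, proj, step, rank=4, update_gap=200, lr=1e-3):
    """One simplified GaLore-style update for a single 2-D weight matrix."""
    # Periodically refresh the projection from the gradient's top singular vectors.
    if proj is None or step % update_gap == 0:
        U, _, _ = torch.linalg.svd(grad, full_matrices=False)
        proj = U[:, :rank]                       # (out_features, rank)
    low_rank_grad = proj.T @ grad                # optimizer state lives at this reduced size
    update = proj @ low_rank_grad                # project the update back to the full weight shape
    weight -= lr * update
    return proj
```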

LoRA Vision

Sources:

Dataset:

Source code:

Contributors

References

  1. LoRA: Low-rank adaptation of large language models. E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen. arXiv preprint arXiv:2106.09685, 2021.
  2. Training neural networks from scratch with parallel low-rank adapters. M. Huh, B. Cheung, J. Bernstein, P. Isola, and P. Agrawal. arXiv preprint arXiv:2402.16828, 2024.
  3. GaLore: Memory-efficient LLM training by gradient low-rank projection. J. Zhao, Z. Zhang, B. Chen, Z. Wang, A. Anandkumar, and Y. Tian. arXiv preprint arXiv:2403.03507, 2024.