keras-team / keras

Support for `GaLore` for Memory-Efficient LLM Training #19338

Closed. awsaf49 closed this issue 7 months ago.

awsaf49 commented 7 months ago

GaLore was recently released and can be used for memory-efficient fine-tuning of LLMs. According to the paper:

Gradient Low-Rank Projection (GaLore) is a memory-efficient low-rank training strategy that allows full-parameter learning but is more memory-efficient than common low-rank adaptation methods, such as LoRA. As a gradient projection method, GaLore is independent of the choice of optimizers and can be easily plugged into existing ones with only two lines of code, as shown in Algorithm 1 below.

[Image: Algorithm 1 (GaLore) from the paper]
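Roughly, the projection idea looks like this (a minimal NumPy sketch; the helper names `galore_project` and `galore_project_back` are just for illustration, and the paper recomputes the projector via SVD only every T steps, which is omitted here):

```python
import numpy as np

def galore_project(grad, rank):
    """Project a 2-D gradient onto its top-`rank` left singular vectors."""
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    p = u[:, :rank]                  # projector P (m x r); refreshed every T steps in the paper
    return p, p.T @ grad             # low-rank gradient R = P^T G (r x n)

def galore_project_back(p, low_rank_update, scale=1.0):
    """Map the low-rank optimizer update back to the full parameter space."""
    return scale * (p @ low_rank_update)

# Where this plugs into a standard optimizer step for a 2-D weight W
# with gradient G (the "two lines" around the usual update):
#   p, r = galore_project(G, rank)            # added line 1
#   n    = optimizer_update(r)                # usual Adam moments, but on r
#   W   -= lr * galore_project_back(p, n)     # added line 2
```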

I think it would be nice to have this feature in Keras 3.

Reference

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection, Zhao et al., 2024 (https://arxiv.org/abs/2403.03507)

nkovela1 commented 7 months ago

Hi @awsaf49, since GaLore is relatively new, we will likely wait before integrating it into the core API, but you are welcome to prototype this as a subclass of the EinsumDense or Dense layer and keep us updated! Thanks.
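A bare-bones starting point for such a prototype might look something like this (a sketch only; `GaLoreDense` and `galore_rank` are hypothetical names, and the actual projection logic from Algorithm 1 would live in a custom optimizer or train step that reads this attribute):

```python
import keras

class GaLoreDense(keras.layers.Dense):
    """Dense layer that carries a GaLore rank so a custom training loop
    can apply low-rank gradient projection to its kernel."""

    def __init__(self, units, galore_rank=4, **kwargs):
        super().__init__(units, **kwargs)
        self.galore_rank = galore_rank  # rank r of the gradient projector

    def get_config(self):
        # Keep the extra hyperparameter serializable.
        config = super().get_config()
        config.update({"galore_rank": self.galore_rank})
        return config
```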