Describe the feature
A recent paper, "GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection" (https://arxiv.org/pdf/2403.03507.pdf), demonstrates a remarkably memory-efficient approach to training large language models (LLMs): instead of keeping full-rank optimizer states, it projects gradients into a low-rank subspace before the optimizer update.
Can we integrate this memory-efficient technique into the Colossal-AI framework?
FYI
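For context, the core of GaLore is compact enough to sketch. Below is a minimal, hypothetical illustration of the gradient low-rank projection step, not Colossal-AI's actual API; the class name `GaLoreProjector` and the default hyperparameters are assumptions for illustration only.

```python
import torch


class GaLoreProjector:
    """Minimal sketch of GaLore's low-rank gradient projection (illustrative, not a Colossal-AI API)."""

    def __init__(self, rank: int = 4, update_gap: int = 200, scale: float = 0.25):
        self.rank = rank              # r: rank of the gradient subspace
        self.update_gap = update_gap  # T: recompute the subspace every T steps
        self.scale = scale            # alpha: scale applied when projecting back
        self.ortho = None             # P: (m x r) projection matrix
        self.step = 0

    def project(self, grad: torch.Tensor) -> torch.Tensor:
        # Periodically refresh P from the top-r left singular vectors of the gradient.
        if self.ortho is None or self.step % self.update_gap == 0:
            u, _, _ = torch.linalg.svd(grad.float(), full_matrices=False)
            self.ortho = u[:, : self.rank].to(grad.dtype)
        self.step += 1
        return self.ortho.T @ grad  # R = P^T G, shape (r x n)

    def project_back(self, low_rank_update: torch.Tensor) -> torch.Tensor:
        # Map the optimizer's low-rank update back to full shape: G~ = alpha * P * R
        return self.scale * (self.ortho @ low_rank_update)
```

The memory saving comes from keeping the optimizer states (e.g. Adam's moment estimates) on the projected (r x n) gradient rather than the full (m x n) matrix, so an integration would mainly need to hook this projection into the optimizer step.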