jiaweizzhao / GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Apache License 2.0

Dataset loading issue, integration with Colossal-AI #33

Open Edenzzzz opened 7 months ago

Edenzzzz commented 7 months ago

Hi, thanks for the good work. I'm trying to integrate this into Colossal-AI (https://github.com/hpcaitech/ColossalAI) so that it is compatible with Tensor Parallel and ZeRO. However, I had trouble loading the dataset; it seems they updated the dataset to remove the JSON schema. Could you share your dataset version and how you were able to load it? Thanks!

jiaweizzhao commented 7 months ago

Thanks for the integration. I just tried again with the latest datasets version and it worked smoothly on my end. Could it be caused by some other issue?

Edenzzzz commented 7 months ago

Thanks for replying. I think they've fixed the data and I can load it now.

Edenzzzz commented 7 months ago

Also, any ideas on integrating with FSDP, specifically on projecting sharded gradients that are flattened and not necessarily reshapable into a matrix? We could all-gather the gradients before the SVD step, but doing that at every step would be prohibitively expensive, and we can't project flat vectors directly. Thanks!
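
To make the question concrete, here is a minimal sketch of one possible direction, not something confirmed in this thread. It relies only on the projection R = P^T G being linear in G: each rank can map the elements of its flat shard back to (row, column) positions in the row-major matrix, accumulate a partial r x n result, and then sum the partials with an all-reduce, which moves an r x n buffer instead of the full m x n gradient. The sketch assumes the projection matrix P is already replicated on every rank and does not address how to compute the SVD itself. All function names are hypothetical, plain Python lists stand in for tensors, and `all_reduce_sum` merely simulates a collective.

```python
# Sketch only: project a row-major gradient that is stored as flat 1-D shards
# without gathering the full matrix. Names are hypothetical, not GaLore or FSDP APIs.

def partial_projection(flat_shard, start, n_cols, P):
    """Contribution of one flat shard to R = P^T @ G.

    flat_shard: contiguous elements of row-major G, beginning at flat index `start`.
    P: m x r projection matrix as a list of rows (assumed replicated on every rank).
    Returns an r x n_cols matrix as a list of lists.
    """
    r = len(P[0])
    R = [[0.0] * n_cols for _ in range(r)]
    for offset, g in enumerate(flat_shard):
        i, j = divmod(start + offset, n_cols)  # recover (row, col) of this element
        for k in range(r):
            R[k][j] += P[i][k] * g
    return R


def all_reduce_sum(partials):
    """Stand-in for a sum all-reduce across ranks (simulated in one process)."""
    r, n = len(partials[0]), len(partials[0][0])
    return [[sum(p[k][j] for p in partials) for j in range(n)] for k in range(r)]


def project_sharded(flat_grad, n_cols, P, world_size):
    """Simulate `world_size` ranks, each holding one contiguous flat chunk."""
    chunk = -(-len(flat_grad) // world_size)  # ceiling division
    partials = []
    for rank in range(world_size):
        start = rank * chunk
        shard = flat_grad[start:start + chunk]  # may be empty on trailing ranks
        partials.append(partial_projection(shard, start, n_cols, P))
    return all_reduce_sum(partials)


def project_full(G, P):
    """Reference result: R = P^T @ G computed on the unsharded matrix."""
    m, n, r = len(G), len(G[0]), len(P[0])
    return [[sum(P[i][k] * G[i][j] for i in range(m)) for j in range(n)]
            for k in range(r)]
```

The shard boundaries do not need to line up with matrix rows, since each element is placed by its flat index. Whether this is cheaper in practice depends on the rank r and on how often the projection matrix has to be recomputed, which this sketch leaves open.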