SparkJiao opened this issue 4 months ago
Had the same problem... Any solutions? @jiaweizzhao
Thanks for your interest. We are in touch with the FSDP team and will post an update soon.
Hi! Bumping this up: is this feature still in development?
Yes, this feature is still in development. Please stay tuned!
Hi, I appreciate your awesome work!
When I try to use the GaLore AdamW optimizer for Gemma training, it appears to be incompatible with DeepSpeed at both ZeRO stage 0 and stage 1: ![image](https://github.com/jiaweizzhao/GaLore/assets/16469472/adfdbf81-4c09-4726-9bc7-cd60fafa72ee)
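For reference, this is roughly how I construct the optimizer, following the param-group style from the GaLore README (the model is a toy stand-in for Gemma here, and the rank and other hyperparameters are just illustrative values):

```python
import torch
from galore_torch import GaLoreAdamW

# Toy stand-in for the actual model (Gemma in my case).
model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.Linear(512, 512))

# GaLore is applied to the 2-D weight matrices; biases etc. use plain AdamW.
galore_params = [p for p in model.parameters() if p.dim() == 2]
galore_ids = {id(p) for p in galore_params}
regular_params = [p for p in model.parameters() if id(p) not in galore_ids]

param_groups = [
    {"params": regular_params},
    {"params": galore_params,
     "rank": 128,             # low-rank projection dimension
     "update_proj_gap": 200,  # refresh the projector every 200 steps
     "scale": 0.25,           # scale applied to the low-rank update
     "proj_type": "std"},
]
optimizer = GaLoreAdamW(param_groups, lr=1e-5)
```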
I suspect this is because DeepSpeed's BF16_Optimizer flattens the parameters into contiguous 1-D buffers for memory efficiency. This would likely also affect the usage of FSDP.
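A minimal sketch of why flattening would break GaLore, assuming the projector operates on the original 2-D gradient shape (the variable names below are illustrative, not DeepSpeed internals):

```python
import torch

# GaLore projects each 2-D gradient G (m x n) into a low-rank subspace;
# the projection matrix P comes from an SVD of the gradient.
m, n, rank = 256, 512, 8
grad = torch.randn(m, n)
U, S, Vh = torch.linalg.svd(grad, full_matrices=False)
P = U[:, :rank]               # (m, rank) projection matrix
low_rank_grad = P.T @ grad    # (rank, n) -- well defined for a 2-D matrix

# DeepSpeed's BF16_Optimizer coalesces parameters into flat 1-D buffers,
# so the optimizer step sees something like this instead:
flat_grad = grad.reshape(-1)  # (m*n,) -- row/column structure is gone
# P.T @ flat_grad would fail: the 2-D projection is undefined on a 1-D view.
```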