jiaweizzhao / GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

How many GB of memory are required to train the 7B model in DDP mode with GaLore? #40

Open zhangqijun opened 2 months ago

zhangqijun commented 2 months ago

In single-GPU mode I can run training successfully on an RTX 3090, but it takes too long. In DDP mode we get an OOM inside `torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank], output_device=local_rank, broadcast_buffers=False)`.
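For reference, a minimal sketch of the setup where the OOM is reported (the checkpoint path and launch details are illustrative, not the exact training script). DDP builds flat gradient buckets for all-reduce at construction time, which is extra memory on top of the weights and gradients:

```python
import os

import torch
import torch.distributed as dist
from transformers import LlamaForCausalLM

local_rank = int(os.environ["LOCAL_RANK"])
dist.init_process_group(backend="nccl")
torch.cuda.set_device(local_rank)

# Hypothetical checkpoint path, stands in for the actual 7B weights.
model = LlamaForCausalLM.from_pretrained("path/to/llama-7b").to(local_rank)

# The OOM is reported at this call: DDP pre-allocates flat gradient
# buckets for communication, adding roughly another full copy of the
# gradients to per-GPU memory.
model = torch.nn.parallel.DistributedDataParallel(
    model,
    device_ids=[local_rank],
    output_device=local_rank,
    broadcast_buffers=False,
)
```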

xinpengzz commented 1 month ago

Same issue. It seems to be related to the initialization of DDP.
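Not a confirmed fix, but two standard DDP constructor flags that shrink the memory allocated at initialization may be worth trying (untested with GaLore's projected updates): `gradient_as_bucket_view=True` makes each `param.grad` a view into the communication buckets instead of a separate copy, and a smaller `bucket_cap_mb` caps the size of each pre-allocated bucket. A sketch, assuming the same wrapping call as above:

```python
# Hedged workaround sketch (untested with GaLore): reduce the extra
# gradient memory DDP allocates at construction time.
model = torch.nn.parallel.DistributedDataParallel(
    model,
    device_ids=[local_rank],
    output_device=local_rank,
    broadcast_buffers=False,
    # Let .grad tensors alias the flat all-reduce buckets instead of
    # keeping a second full copy of every gradient.
    gradient_as_bucket_view=True,
    # Smaller buckets lower the peak size of each pre-allocated buffer
    # (PyTorch's default is 25 MB).
    bucket_cap_mb=10,
)
```

Note that GaLore's savings come from low-rank optimizer states, while this OOM appears to come from DDP's gradient-communication buffers, which GaLore does not reduce; that may be why single-GPU training fits on an RTX 3090 but the DDP wrap does not.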