jzhang38 / TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Apache License 2.0

Why FSDP, not DDP? #173

Open noforit opened 8 months ago

noforit commented 8 months ago

Could I kindly ask why, given the relatively small size of the TinyLlama model, the decision was made to use FSDP (Fully Sharded Data Parallel) instead of DDP (Distributed Data Parallel)?
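For readers unfamiliar with the setup: the pretraining code is built on Lightning Fabric, where the parallelism strategy is chosen when the `Fabric` object is constructed. Below is a minimal sketch of the two options being discussed; the arguments are illustrative assumptions, not TinyLlama's actual configuration.

```python
import lightning as L
from lightning.fabric.strategies import FSDPStrategy

# Option 1: FSDP shards parameters, gradients, and optimizer states
# across GPUs, so per-GPU memory shrinks as the world size grows.
fsdp = FSDPStrategy()  # illustrative defaults, not the project's exact settings

# Option 2: DDP keeps a full replica of the model and optimizer on every
# GPU and only all-reduces gradients after the backward pass.
ddp = "ddp"

fabric = L.Fabric(
    accelerator="cuda",
    devices=8,
    precision="bf16-mixed",
    strategy=fsdp,  # or strategy=ddp
)
fabric.launch()
```

The practical trade-off is that FSDP's lower per-GPU footprint leaves more room for activations, which can allow a larger micro-batch per device, at the cost of extra all-gather communication during forward and backward.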

Hannibal046 commented 3 months ago

Hi, I also want to know why FSDP is preferred here, since a 1.1B model can fit on an A100-40G with bsz=1.
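For context, here is a rough back-of-the-envelope estimate of why a 1.1B-parameter model trained with AdamW under bf16 mixed precision plausibly fits on a single A100-40G. The per-parameter byte counts are assumptions about the usual mixed-precision layout, not measurements of TinyLlama's run, and activations are ignored.

```python
# Approximate per-GPU memory for model and optimizer states when every GPU
# holds a full replica (the DDP case), assuming AdamW + bf16 mixed precision.
params = 1.1e9

bf16_weights = 2 * params   # bf16 copy of the weights used in forward/backward
bf16_grads   = 2 * params   # bf16 gradients
fp32_master  = 4 * params   # fp32 master copy of the weights
adam_moments = 8 * params   # exp_avg + exp_avg_sq, both in fp32

total_bytes = bf16_weights + bf16_grads + fp32_master + adam_moments
print(f"model + optimizer states: {total_bytes / 2**30:.1f} GiB")  # ~16.4 GiB
```

Under these assumptions roughly 16-17 GiB of a 40 GiB card goes to model and optimizer states alone, leaving the rest for activations at a small batch size, which is consistent with the claim that DDP is feasible at bsz=1. FSDP shards those states across GPUs, freeing memory for larger micro-batches or longer sequences.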