Open: haozhouamzn opened this issue 11 months ago
Hi,
You do not need FSDP to run MeZO on multiple GPUs, since MeZO only requires model inference. You should be able to run it directly with mezo.sh, exactly as the README instructs for a single GPU, without any code or script changes. Just make sure 2 GPUs are available.
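As a quick way to confirm that 2 GPUs are available before launching, something like the sketch below may help. It assumes an NVIDIA setup where nvidia-smi is on the PATH; count_gpus is a hypothetical helper written for illustration and is not part of the MeZO repo.

```shell
# Hypothetical helper (not part of the MeZO repo): count the GPUs nvidia-smi reports.
count_gpus() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    # `nvidia-smi -L` prints one line per GPU, each starting with "GPU <index>:".
    nvidia-smi -L 2>/dev/null | grep -c '^GPU ' || true
  else
    # No nvidia-smi found, so report zero GPUs.
    echo 0
  fi
}

n_gpus="$(count_gpus)"
if [ "$n_gpus" -lt 2 ]; then
  echo "Only $n_gpus GPU(s) visible; the multi-GPU MeZO run needs 2." >&2
fi
```

If the check passes, mezo.sh can be launched as described in the README.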
Thanks, yes, MeZO works out of the box.
How about first-order prefix FT (the Prefix FT column in Table 20)? The results for 13B, 30B, and 66B used FSDP, right?
Yes, and you should be able to run them with the following command (from the README):
# Full-parameter fine-tuning using fully-sharded data parallel or FSDP (multi-GPU)
MODEL=facebook/opt-13b TASK=SST2 MODE=ft LR=1e-5 NUM_GPU=4 bash finetune_fsdp.sh
You can change MODE to prefix or lora.
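For reference, the three variants differ only in the MODE value. The sketch below is a dry run that prints each command without launching training; print_fsdp_commands is a hypothetical helper, and LR=1e-5 and NUM_GPU=4 are simply carried over from the README command above, so they may not be the right settings for prefix or LoRA.

```shell
# Hypothetical dry-run helper: print the FSDP command for each MODE value.
# LR and NUM_GPU are copied from the README example, not tuned per mode.
print_fsdp_commands() {
  for mode in ft prefix lora; do
    echo "MODEL=facebook/opt-13b TASK=SST2 MODE=${mode} LR=1e-5 NUM_GPU=4 bash finetune_fsdp.sh"
  done
}

print_fsdp_commands
```

Copy the printed line for the mode you want and run it from the directory that contains finetune_fsdp.sh.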
Hi, Table 20 shows prefix FT results with 2 and 4 GPUs. How were those obtained? I tried the following command but got some errors:
MODEL=facebook/opt-13b TASK=SST2 MODE=prefix LR=1e-5 NUM_GPU=8 bash finetune_fsdp.sh