Open g-h-chen opened 10 months ago
Hi there, thanks for your great work! I am wondering what optimization strategy is used for finetuning the model. Since gradient checkpointing is NOT implemented in modeling_phi.py, finetuning raises an OOM error even when using ZeRO-3.
Thanks in advance!
Thanks for your interest in our project. Which stage are you referring to: SFT on phi-2, or the fine-tuning stage for llava-phi?