zhuyiche / llava-phi


[Optimization strategy on Phi-2] #2

[Open] g-h-chen opened this issue 8 months ago

g-h-chen commented 8 months ago

Hi there, thanks for your great work! I am wondering what optimization strategy you used to finetune the model. Since gradient checkpointing is NOT implemented in modeling_phi.py, finetuning raises an OOM error even with ZeRO-3.

Thanks in advance!
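For context, calling `model.gradient_checkpointing_enable()` (or passing `gradient_checkpointing=True` in `TrainingArguments`) only helps if the modeling file itself routes the forward pass through `torch.utils.checkpoint`. Below is a minimal, self-contained sketch of that pattern; `TinyDecoder` and its plain linear layers are illustrative stand-ins, not the actual Phi-2 code:

```python
# Sketch of activation (gradient) checkpointing: recompute each block's
# activations during backward instead of storing them, trading extra
# compute for a large memory saving.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class TinyDecoder(nn.Module):
    def __init__(self, dim=256, n_layers=8):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_layers))
        self.gradient_checkpointing = True  # toggle, analogous to the HF flag

    def forward(self, x):
        for layer in self.layers:
            if self.gradient_checkpointing and self.training:
                # use_reentrant=False is the recommended mode in recent PyTorch
                x = checkpoint(layer, x, use_reentrant=False)
            else:
                x = layer(x)
        return x

model = TinyDecoder().train()
out = model(torch.randn(4, 256, requires_grad=True))
out.sum().backward()  # activations are recomputed here rather than kept in memory
```

A modeling file that follows this pattern (plus `supports_gradient_checkpointing = True` on the pretrained model class) is what lets the HF Trainer enable checkpointing automatically.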

zhuyiche commented 8 months ago

Thanks for your interest in our project. Which stage are you referring to: SFT on Phi-2, or the fine-tuning stage for LLaVA-Phi?