Closed: abacaj closed this issue 5 months ago
System Info
The updated phi-2 code produces a high loss. I have tried fp16, bf16, DeepSpeed, and FSDP; the result is the same: the loss starts at 2 and keeps going higher. Setting `use_flash_attention_2=False`, or using the old phi-2 modeling file, fixes this.

torch==2.1.2
flash-attn==2.4.2
transformers==4.37.0.dev0
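For reference, a minimal sketch of the workaround described above (disabling FA2 at load time); the model id and dtype here are the obvious choices, not quoted from the report:

```python
import torch
from transformers import AutoModelForCausalLM

# Loading without flash attention 2 avoids the rising loss.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    torch_dtype=torch.bfloat16,
    use_flash_attention_2=False,
)
```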
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Fine-tune the updated phi-2 model using the transformers Trainer.
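A minimal sketch of the failing setup, assuming a toy dataset and default hyperparameters (neither is specified in the report); any small causal-LM fine-tuning run through `Trainer` should surface the behavior:

```python
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # phi-2 has no pad token by default

# use_flash_attention_2=True reproduces the issue; set it to False
# (or use the old modeling file) and the loss decreases normally.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    use_flash_attention_2=True,
)

# Illustrative dataset choice, not from the report.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
tokenized = tokenized.filter(lambda ex: len(ex["input_ids"]) > 0)  # drop empty lines

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="phi2-ft",
        per_device_train_batch_size=1,
        bf16=True,
        logging_steps=1,
        max_steps=50,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # observed: loss starts around 2 and climbs instead of decreasing
```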
Expected behavior
Loss goes down.