luogen1996 / LLaVA-HR

LLaVA-HR: High-Resolution Large Language-Vision Assistant

Got loss 0 using ZeRO2 in sft stage #16

Closed: Eddy-W closed this issue 3 months ago

Eddy-W commented 3 months ago

Hi, thanks for your great work. When I trained with the ZeRO-3 configuration, the results were fine. However, after switching to the ZeRO-2 configuration, the loss becomes zero. The only difference in the training arguments is the ZeRO stage, and I trained the models on 8x A800 GPUs. Are there any potential reasons for this discrepancy?

[screenshot of the training log showing loss = 0]
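For context, this is a minimal sketch of the kind of ZeRO-2 DeepSpeed config I switched to (the exact file and values in this repo may differ; the fields below are illustrative, with batch-size and precision settings left as "auto" so the launcher fills them in):

```json
{
  "bf16": { "enabled": "auto" },
  "fp16": { "enabled": "auto" },
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "reduce_scatter": true,
    "allgather_partitions": true
  }
}
```

The ZeRO-3 run used the same training arguments except that "stage" was set to 3 (plus the usual stage-3 parameter-partitioning options).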