mit-han-lab / efficientvit

EfficientViT is a new family of vision models for efficient high-resolution vision.
Apache License 2.0

Numeric stability w/ AMP #15

Open rwightman opened 10 months ago

rwightman commented 10 months ago

Hello, a contributor recently added EfficientViT to timm, so I explored the model before merging. I found that it could not train in mixed precision without the loss immediately going to NaN. The problem appears to be the q/k/v matmuls and the division; forcing those operations to float32 works around it.

Have you observed similar behavior, or thought of any approaches to improve this?
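A minimal illustration of why half precision is fragile here (the values below are purely illustrative, not taken from the model): float16 tops out at 65504, so products of moderately large activations in the q/k matmuls can overflow to inf, and any subsequent arithmetic on that inf produces NaN.

```python
import torch

# float16's largest finite value is 65504, so even a single product of
# moderately large values overflows to inf:
a = torch.tensor(300.0, dtype=torch.float16)
prod = a * a          # 90000 > 65504 -> inf
print(prod)           # inf
# Once an inf appears, further arithmetic can yield NaN:
print(prod - prod)    # inf - inf -> nan
```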

han-cai commented 10 months ago

Hello Ross,

Thank you for sharing your findings!

I have observed the same: the q/k/v matmuls and the division need to run in float32 during training to avoid NaN loss. We currently do not have a good remedy for this. Given that the q/k/v matmuls and the division are lightweight, your current approach of keeping them in float32 is an excellent workaround. We will certainly delve further into this matter and keep you updated once we identify an effective solution.
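The workaround described above can be sketched as follows. This is an illustrative ReLU linear attention kernel, not EfficientViT's exact code: the function name, shapes, and the `eps` value are assumptions, but the pattern — disabling autocast and casting to float32 around the matmuls and the division — matches the approach discussed in this thread.

```python
import torch
import torch.nn.functional as F

def relu_linear_attention_fp32(q, k, v, eps=1e-15):
    """Illustrative ReLU linear attention with the matmuls and the final
    division forced to float32, even when called under AMP autocast.
    Shapes are assumed to be (batch, heads, tokens, dim)."""
    q = F.relu(q)
    k = F.relu(k)
    # Leave the autocast region so the ops below run in full precision.
    with torch.autocast(device_type=q.device.type, enabled=False):
        q, k, v = q.float(), k.float(), v.float()
        kv = k.transpose(-2, -1) @ v                    # (b, h, d, d)
        numerator = q @ kv                              # (b, h, n, d)
        # Row-wise normalizer: q @ sum_over_tokens(k)^T -> (b, h, n, 1)
        denominator = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1)
        out = numerator / (denominator + eps)
    return out
```

Since these two matmuls and the division are a small fraction of the model's compute, keeping them in float32 costs little while the rest of the network still benefits from mixed precision.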

Regarding evaluation stability, I am not sure whether changing the eps to 1e-5 would hurt accuracy. If possible, keeping the division in float32 during testing is the better solution, since its computation cost is negligible.
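One concrete reason the eps choice interacts with precision (the 1e-15 value here is purely illustrative): an eps that small underflows to exactly zero in float16, so it no longer guards the division at all, while in float32 it survives.

```python
import torch

# A very small eps underflows to zero in float16 (smallest positive
# subnormal is ~6e-8), so it stops protecting against division by zero.
eps16 = torch.tensor(1e-15, dtype=torch.float16)
eps32 = torch.tensor(1e-15, dtype=torch.float32)
print(eps16.item())      # 0.0 -- underflowed in half precision
print(eps32.item() > 0)  # True -- representable in float32
```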

Thank you, Han