Open rwightman opened 10 months ago
Hello Ross,
Thank you for sharing your findings!
I also have similar findings: the q/k/v matmuls and the division need to be in float32 during training to avoid NaN loss. We currently do not have a good remedy for this. Given that the q/k/v matmuls and the division are lightweight, your current approach is an excellent workaround. We will certainly delve further into this matter and keep you updated once we identify an effective solution.
Regarding the evaluation stability, I am not sure whether changing the eps to 1e-5 will hurt accuracy. If possible, keeping the division in float32 during testing is the better solution, since its computation cost is negligible.
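To illustrate the eps concern: a very small epsilon (on the order of 1e-15; the exact default is an assumption here) underflows to exactly zero in float16, so it no longer guards the division under mixed precision, whereas 1e-5 survives the cast:

```python
import torch

# A tiny eps such as 1e-15 underflows to exactly 0.0 in float16
# (the smallest float16 subnormal is about 6e-8), so it no longer
# protects the denominator; 1e-5 is still representable.
tiny_eps = torch.tensor(1e-15).half()
safe_eps = torch.tensor(1e-5).half()
print(tiny_eps.item())  # 0.0
print(safe_eps.item())
```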
Thank you, Han
Hello, a contributor recently added EfficientViT to timm, so I explored the model before merging... I found that it could not train in mixed precision without instantly producing NaN loss. The problem appears to be the q/k/v matmuls and the division. Have you observed anything similar, or thought of any approaches to improve this?
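For reference, a minimal sketch of the workaround being discussed, assuming an EfficientViT-style ReLU linear attention over tensors of shape (batch, tokens, dim); the function name, shapes, and eps value are illustrative, not the actual timm or EfficientViT implementation:

```python
import torch

def relu_linear_attention_fp32(q, k, v, eps=1e-5):
    # Hypothetical sketch: force the q/k/v matmuls and the final division
    # to run in float32 even when the surrounding model uses autocast.
    with torch.autocast(device_type=q.device.type, enabled=False):
        q = torch.relu(q).float()                    # (B, N, D)
        k = torch.relu(k).float()                    # (B, N, D)
        v = v.float()                                # (B, N, D)
        kv = k.transpose(-2, -1) @ v                 # (B, D, D) context matrix
        numerator = q @ kv                           # (B, N, D)
        # Denominator: q dotted with the sum of keys over the token axis.
        denominator = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1) + eps
        return numerator / denominator               # (B, N, D), float32
```

Disabling autocast for just this block keeps the expensive projections and MLPs in reduced precision while the numerically fragile steps stay in float32.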