Closed (ternaus closed this issue 3 years ago)
@alexwitt2399,
Hi, how has your experience been training with fp16? I made some changes to the code following your suggestions and was able to train for quite a long time without NaNs. However, NaNs still eventually appear because of the multiple recurrences and the partial conv. Setting epsilon to a higher value such as 1e-4 didn't help either. Also, the output images look very "fish scale-ly" or have "studded patterns" when training with fp16. Switching to fp32 eventually solved most of those problems.
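For anyone hitting the same thing, here is a minimal sketch of why the epsilon value matters in fp16 and how inf turns into NaN. It is not code from this repo: `to_fp16` is a hypothetical helper that uses Python's `struct` half-precision format to round a value the way an fp16 cast would, and returning inf on overflow is my assumption about how the cast behaves on the GPU.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value.

    Hypothetical helper for illustration only.
    """
    try:
        # The "e" format code packs and unpacks a 16-bit float.
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses values too large for fp16; assume the cast
        # saturates to +/-inf instead (this is the labeled assumption).
        return math.copysign(math.inf, x)


# A typical fp32 epsilon is below the smallest fp16 subnormal (about 6e-8),
# so it rounds to zero and no longer protects a division.
eps_tiny = to_fp16(1e-8)

# 1e-4 is representable in fp16, so it survives the cast as a nonzero value.
eps_large = to_fp16(1e-4)

# Anything far above the fp16 maximum (65504) becomes inf.
overflowed = to_fp16(1e5)

# Once two infs meet in a subtraction, the result is NaN.
poisoned = overflowed - overflowed

print(eps_tiny, eps_large, overflowed, poisoned)
```

So a larger epsilon only guards against the division-by-zero case. It does nothing for activations that overflow, which might be why raising it to 1e-4 did not remove the NaNs here.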
I'm trying to train the model using the fp16 setting, but the model outputs -inf values during the forward pass.