Open arvieFrydenlund opened 5 years ago
This seldom happens, and with the given hyper-parameters it should not happen at all. However, when `div_val > 1` (meaning the word embedding dimensionality is reduced by a factor of `div_val` for infrequent words), it can happen with low probability in my experience. If this happens to you, try using `div_val = 1`, or use smaller initial weights by decreasing `init_range` or `init_std`. Hope this helps.
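For anyone trying the second suggestion, here is a minimal PyTorch sketch of re-initializing a model with a smaller standard deviation. The `init_weights` helper, the toy model, and the values `0.02` and `0.01` are illustrative assumptions, not the repository's actual initializer or defaults; in the real training script you would lower the `init_std` (or `init_range`) setting instead.

```python
import torch
import torch.nn as nn


def init_weights(module: nn.Module, init_std: float = 0.02) -> None:
    """Hypothetical initializer: normal weights with std `init_std`, zero biases."""
    if isinstance(module, (nn.Linear, nn.Embedding)):
        nn.init.normal_(module.weight, mean=0.0, std=init_std)
    if isinstance(module, nn.Linear) and module.bias is not None:
        nn.init.zeros_(module.bias)


torch.manual_seed(0)

# Toy stand-in for the real model (assumption, for illustration only).
model = nn.Sequential(nn.Embedding(1000, 64), nn.Linear(64, 64))

# Halving the standard deviation shrinks the initial activations accordingly.
model.apply(lambda m: init_weights(m, init_std=0.01))
```

If the NaNs disappear with the smaller value, that points toward initialization scale rather than a bug in the modified code.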
Hi, I'm getting NaN values in the first forward pass of the model (in the first layer), generally from the first AC calculation. Is this an issue with the initial weights of the model? If so, do you have any advice on how to address it? I have made some changes to the model, so knowing whether this is a known issue would help me determine whether I have introduced a bug. Thanks.
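One way to narrow down where the NaNs first appear is to attach forward hooks to every submodule and record the first one whose output is non-finite. The sketch below assumes a PyTorch model; `find_first_nonfinite` is a hypothetical helper, not part of the repository. It only localizes the problem to a submodule, so a term computed inside a module's `forward` (such as AC) would still need a manual `torch.isfinite` check at that point.

```python
import torch
import torch.nn as nn


def find_first_nonfinite(model: nn.Module, *inputs):
    """Return the name of the first submodule whose output has NaN/Inf, else None."""
    bad = []      # names of offending submodules, in the order they finished
    handles = []  # hook handles, so the hooks can be removed afterwards

    def make_hook(name):
        def hook(module, args, output):
            outs = output if isinstance(output, (tuple, list)) else (output,)
            for o in outs:
                if torch.is_tensor(o) and not torch.isfinite(o).all():
                    bad.append(name)
                    break
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module itself
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(*inputs)
    finally:
        for h in handles:
            h.remove()
    return bad[0] if bad else None
```

Running this once on the unmodified model and once on the modified one, with the same seed and input batch, should show whether the changes moved or introduced the first non-finite output.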