neulab / xnmt

eXtensible Neural Machine Translation

Fix Z_Normalization #551

Closed philip30 closed 6 years ago

philip30 commented 6 years ago

This is a very minor change that corrects the policy gradient when computing z_normalization. I think rewards should be normalized not only per sequence but also across the items in the minibatch, because the number of items in a minibatch strongly affects the learning behaviour of the policy gradient.
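To illustrate the distinction, here is a minimal sketch in plain Python. It is not the xnmt implementation: the function names `z_normalize_pooled` and `z_normalize_per_sequence`, the list-of-lists reward layout, and the `eps` stabilizer are all assumptions made for illustration, and "per item in the minibatch" is read here as pooling the statistics over every time step of every batch item.

```python
import math


def z_normalize_pooled(rewards, eps=1e-8):
    """Z-normalize rewards with mean/std pooled over all time steps and batch items.

    rewards: list with one entry per minibatch item; each entry is a list of
    per-time-step float rewards (hypothetical layout, not xnmt's).
    """
    flat = [r for seq in rewards for r in seq]
    mean = sum(flat) / len(flat)
    var = sum((r - mean) ** 2 for r in flat) / len(flat)
    std = math.sqrt(var)
    # eps guards against division by zero when all rewards are equal.
    return [[(r - mean) / (std + eps) for r in seq] for seq in rewards]


def z_normalize_per_sequence(rewards, eps=1e-8):
    """Z-normalize each sequence using only its own mean/std."""
    return [z_normalize_pooled([seq], eps)[0] for seq in rewards]


# Two batch items whose rewards differ in scale by a factor of ten.
batch = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
per_seq = z_normalize_per_sequence(batch)
pooled = z_normalize_pooled(batch)
```

With per-sequence statistics the two items in `batch` end up with essentially the same normalized rewards, so the difference in reward scale between batch items is lost. With pooled statistics the second item keeps its larger rewards relative to the first.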