Closed DaStapo closed 3 years ago
This is the desired behavior. Zoneout needs to be applied in eval mode to ensure activations follow the same distribution as during training. The behavior of zoneout is different in eval mode: each hidden activation is computed as `p * prev + (1 - p) * cur`, where `p` is the zoneout probability, `prev` is the hidden activation from the last time step, and `cur` is the activation for the current step.
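To make the distinction concrete, here is a minimal sketch (not the library's actual implementation) of zoneout in both modes: during training each unit randomly keeps its previous activation with probability `p`, while in eval mode the random mask is replaced by its expectation, giving the formula above. The function names `zoneout_train` and `zoneout_eval` are illustrative, not part of any API.

```python
import random

def zoneout_train(prev, cur, p, rng=random):
    # Training mode: each unit independently keeps its previous
    # activation with probability p, otherwise takes the new one.
    return [pv if rng.random() < p else cv for pv, cv in zip(prev, cur)]

def zoneout_eval(prev, cur, p):
    # Eval mode: use the expectation of the random mask,
    # h = p * prev + (1 - p) * cur
    return [p * pv + (1 - p) * cv for pv, cv in zip(prev, cur)]

prev = [1.0, 0.0, 0.5]
cur = [0.0, 1.0, 0.5]
print(zoneout_eval(prev, cur, 0.25))  # [0.25, 0.75, 0.5]
```

Setting `p = 0` at eval time (as in the workaround below) makes `zoneout_eval` return `cur` unchanged, which shifts the activation distribution away from what the network saw during training.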
Oh, I suppose that is what they mean by "As in dropout, we use the expectation of the random noise at test time." in the Zoneout paper. Thanks for clarifying. I was certain it was not the desired behavior, because adding regularization during production seemed pointless (and my test set accuracy improved once I manually set the zoneout to 0), but I guess it's about keeping the output distribution consistent with training.
I noticed that zoneout is still applied even after I call model.eval(), and I'm assuming this is not the desired behavior. I'm therefore manually setting the zoneout probability to 0 during evaluation. I have only tried this with IndRNN in PyTorch.