lmnt-com / haste

Haste: a fast, simple, and open RNN library
Apache License 2.0

Zoneout remains during eval() #35

Closed: DaStapo closed this issue 3 years ago

DaStapo commented 3 years ago

I noticed that zoneout is still applied after I call model.eval(), and I'm assuming that this is not the desired behavior. I'm therefore manually setting the zoneout value to 0 during evaluation. I have only tried this with IndRNN in PyTorch.

sharvil commented 3 years ago

This is the desired behavior. Zoneout needs to be applied in eval mode to ensure activations follow the same distribution as during training. The behavior of zoneout is different in eval mode, though: instead of randomly keeping individual units at their previous value, each hidden activation is computed as `p * prev + (1 - p) * cur`, where `p` is the zoneout probability, `prev` is the hidden activation from the previous time step, and `cur` is the activation for the current step.
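To make the difference between the two modes concrete, here is a minimal sketch in plain Python. It is not Haste's implementation: the `zoneout` function, its arguments, and the sample values are hypothetical and only illustrate the formula above, assuming that in training mode each unit keeps its previous value with probability `p`.

```python
import random

def zoneout(prev, cur, p, training, rng=None):
    """Illustrative zoneout step (hypothetical helper, not the Haste API)."""
    if training:
        if rng is None:
            rng = random.Random()
        # Training: each unit independently keeps its previous value
        # with probability p, otherwise takes the current value.
        return [h_prev if rng.random() < p else h_cur
                for h_prev, h_cur in zip(prev, cur)]
    # Eval: deterministic blend, the expectation of the random choice above.
    return [p * h_prev + (1 - p) * h_cur for h_prev, h_cur in zip(prev, cur)]

prev = [1.0, -2.0, 0.5]   # hidden activations from the previous time step
cur = [0.0, 4.0, 0.5]     # candidate activations for the current step
p = 0.25                  # zoneout probability

eval_out = zoneout(prev, cur, p, training=False)

# Averaging many training-mode samples should approach the eval-mode output.
rng = random.Random(0)
n = 20000
sums = [0.0] * len(prev)
for _ in range(n):
    for i, h in enumerate(zoneout(prev, cur, p, training=True, rng=rng)):
        sums[i] += h
train_mean = [s / n for s in sums]

print(eval_out)
print(train_mean)
```

In this sketch, setting `p` to 0 in eval mode returns `cur` unchanged, which differs from the average of what the next layer saw during training whenever `prev` and `cur` differ.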

DaStapo commented 3 years ago

Oh, I suppose that is what they mean by "As in dropout, we use the expectation of the random noise at test time." in the Zoneout paper. Thanks for clarifying. I was certain it was not the desired behavior because adding regularization during production seemed pointless (and my test set had better accuracy once I manually set the zoneout to 0), but I guess it's about maintaining the integrity of the output values.
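For reference, the "expectation of the random noise" can be written out as a short derivation. The notation here is an assumption made for illustration, not necessarily the paper's: $m_t \sim \mathrm{Bernoulli}(p)$ is a per-unit mask with $m_t = 1$ meaning "keep the previous value", $h_{t-1}$ is the previous hidden activation, and $\tilde{h}_t$ is the newly computed one.

$$
h_t = m_t \odot h_{t-1} + (1 - m_t) \odot \tilde{h}_t
$$

$$
\mathbb{E}_{m_t}\left[h_t\right] = p \, h_{t-1} + (1 - p) \, \tilde{h}_t
$$

The second line, taken with $h_{t-1}$ and $\tilde{h}_t$ held fixed, matches the eval-mode formula `p * prev + (1 - p) * cur` quoted earlier in the thread.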