Open arvoelke opened 4 years ago
One specific example: the gate initializations seem swapped. For the input gate, you want to start the initialization with a bias of one and a weight of zero (otherwise it might never get any input to learn from).
This works:
forget_input_kernel_initializer=Constant(0),
forget_hidden_kernel_initializer=Constant(0),
forget_bias_initializer=Constant(1),
We have better versions of these, but they are not currently documented or used by any of the examples.
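To make the reasoning behind the zero-weight, unit-bias initialization concrete, here is a minimal sketch in plain Python. It assumes the gate is a sigmoid of an affine function of the input and hidden state, which this issue does not spell out; the `gate` helper and its scalar arguments are hypothetical and not part of the library.

```python
import math


def sigmoid(z):
    """Logistic function, assumed here to be the gate nonlinearity."""
    return 1.0 / (1.0 + math.exp(-z))


def gate(x, h, w_x, w_h, b):
    """Hypothetical scalar gate: sigmoid(w_x * x + w_h * h + b)."""
    return sigmoid(w_x * x + w_h * h + b)


# Zero weights and a bias of one: the gate takes the same value,
# sigmoid(1), for every input and hidden state.
open_values = [
    gate(x, h, w_x=0.0, w_h=0.0, b=1.0)
    for x in (-5.0, 0.0, 5.0)
    for h in (-1.0, 1.0)
]

# For contrast, a nonzero weight with zero bias can drive the gate
# close to zero for some inputs, so little signal gets through.
nearly_closed = gate(x=-5.0, h=0.0, w_x=2.0, w_h=0.0, b=0.0)

print(open_values)
print(nearly_closed)
```

With zero weights the gate starts mostly open regardless of the data, and the weights can then move away from zero during training.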