What does this PR add?

This PR adds support for training `theta`, and allows the user to choose between two methods of discretization: 1. Zero Order Hold and 2. Euler (previously, discretization was always done with Zero Order Hold).
Existing tests have been updated, and new tests added, to cover these features.
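For illustration, here is a minimal usage sketch. The argument names `discretizer` and `trainable_theta` are assumptions based on this description, not necessarily the final API:

```python
import tensorflow as tf
import keras_lmu

# Hypothetical usage of the new options (argument names assumed):
lmu_layer = keras_lmu.LMU(
    memory_d=1,
    order=256,
    theta=784,  # initial theta; internally stored as theta_inv = 1 / theta
    hidden_cell=tf.keras.layers.SimpleRNNCell(212),
    discretizer="euler",   # "zoh" (the previous default) or "euler"
    trainable_theta=True,  # enable training of theta
)
```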
How is a trainable `theta` implemented?
The first change is that we now always work with `theta_inv = 1 / theta`, since that can result in better gradients when training `theta`. If training of `theta` is disabled, we still work with `theta_inv`, but it does not get updated.
Note that the user still specifies an initial `theta`, which is then internally inverted to `theta_inv`.
If training of `theta` is enabled, then `theta_inv` is added as a weight of the layer; if not, it is added as an attribute of the layer. This distinction is made so that this implementation stays compatible with models that were built with previous versions of keras-lmu (without trainable `theta`).
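A minimal sketch of what this could look like in the layer's `build` step; the attribute names `self.trainable_theta` and `self._init_theta` are illustrative, not the exact implementation:

```python
import tensorflow as tf

# Inside the layer's build(); names assumed for illustration.
if self.trainable_theta:
    # Trainable: registered as a weight, so the optimizer updates it
    # and it appears among the layer's trainable variables.
    self.theta_inv = self.add_weight(
        name="theta_inv",
        shape=(),
        initializer=tf.keras.initializers.Constant(1 / self._init_theta),
        trainable=True,
    )
else:
    # Fixed: a plain attribute, keeping the weight structure identical
    # to models built with previous versions of keras-lmu.
    self.theta_inv = tf.constant(1 / self._init_theta)
```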
How does training with Euler work?
Since `theta` can be decoupled from the `A` and `B` matrices when using Euler, `A` and `B` (weights of the layer) are set to `CONST_A` and `CONST_B` and never updated when training `theta`.
If not training `theta`, `A` and `B` are set to `CONST_A * theta_inv` and `CONST_B * theta_inv` respectively (they are, naturally, still not updated).
However, the `call` function implements the memory update as `m = m + theta_inv * (A*m + B*u)`, thus capturing the gradient of `theta_inv` and ensuring that it flows correctly through the update.
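As a standalone sketch of that update (shapes and function name are illustrative):

```python
import tensorflow as tf

def euler_memory_update(m, u, A, B, theta_inv):
    """One Euler step of the memory update described above.

    m: memory state, shape (batch, order)
    u: input to the memory, shape (batch, 1)
    A: (order, order), B: (1, order) -- CONST_A / CONST_B when training theta
    """
    # A*m and B*u are computed in batch-major form as m @ A and u @ B
    # (A stored transposed relative to the math notation). theta_inv
    # multiplies the whole increment, so its gradient is captured even
    # though A and B themselves are never updated.
    return m + theta_inv * (tf.matmul(m, A) + tf.matmul(u, B))
```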
How does training with Zero Order Hold (zoh) work?
Note that `theta` cannot be decoupled from the `A` and `B` matrices when using zoh. Thus, when training, new `A` and `B` matrices are generated inside the `call` function itself. This will be slower than discretizing with Euler.
Note that a custom `_cont2discrete` function for discretizing with zoh has been implemented instead of using the previously used implementation from `scipy.signal`. This is because `scipy.signal.cont2discrete` only accepts NumPy inputs and not `tf.Tensor`s, which would break the flow of gradients to `theta_inv`.
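For illustration, a sketch of what a TF-native zoh discretization could look like, mirroring `scipy.signal.cont2discrete(..., method="zoh")` via the matrix exponential; this is an assumption-based illustration, not the exact `_cont2discrete` in the PR:

```python
import tensorflow as tf

def _cont2discrete_zoh(A, B):
    """Zero-order-hold discretization with dt=1, built from TF ops so
    gradients can flow back to theta_inv (which scales A and B upstream)."""
    n = A.shape[0]
    m = B.shape[1]
    # Stack [[A, B], [0, 0]] and take its matrix exponential; the top
    # rows of the result hold the discretized A and B.
    em_upper = tf.concat([A, B], axis=1)
    em_lower = tf.zeros((m, n + m), dtype=A.dtype)
    em = tf.linalg.expm(tf.concat([em_upper, em_lower], axis=0))
    return em[:n, :n], em[:n, n:]
```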
Where to start the review?
You can start from the commit "Add trainable theta and discretization options" and then go to "Update and add new tests". These are the only two main commits. There is an additional commit, but that is just a bones update.
Any other remarks?
To my understanding, the examples CI run confirms that the new layer is compatible with previous models, but I will leave this question here since I am not 100% sure.
The `get_config` of each of `LMUCell`, `LMU`, and `LMUFFT` serialises the initial `theta` as the `theta` parameter, and not the final (possibly trained) value. Leaving this here to confirm it makes sense.
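For reference, a sketch of that behaviour; the internal attribute name `_init_theta` is an assumption:

```python
def get_config(self):
    config = super().get_config()
    # Serialize the user-specified initial theta, not the current
    # (possibly trained) 1 / theta_inv value.
    config.update(theta=self._init_theta)
    return config
```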
Finally, given that support for TF 2.1 has been dropped, does an update need to be made somewhere to the [docs / compatibility list / prerequisites] section?