What does this PR add?

This PR adds support for training `theta`, and allows the user to choose between two methods of discretization: 1. Zero Order Hold and 2. Euler (previously, discretization was always done with Zero Order Hold).
Existing tests have been updated, and new tests added, to cover these features.
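For illustration, here is a minimal usage sketch. The argument names `discretizer` and `trainable_theta` are assumptions based on this description, not necessarily the final API:

```python
import tensorflow as tf
import keras_lmu

# Hypothetical usage of the new options (argument names assumed):
lmu_layer = keras_lmu.LMU(
    memory_d=1,
    order=256,
    theta=784,  # initial theta; internally stored as theta_inv = 1 / theta
    hidden_cell=tf.keras.layers.SimpleRNNCell(212),
    discretizer="euler",   # "zoh" (the previous default) or "euler"
    trainable_theta=True,  # enable training of theta
)
```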
How is a trainable `theta` implemented?
The first change is that we now always work with `theta_inv = 1 / theta`, since that can result in better gradients when training `theta`. If training of `theta` is disabled, we still work with `theta_inv`, but it does not get updated.
Note that the user still specifies an initial `theta`, which is then internally inverted to `theta_inv`.
If training of `theta` is enabled, then `theta_inv` is added as a weight of the layer; if not, it is added as an attribute of the layer. This distinction is made so that this implementation stays compatible with models that were built with previous versions of keras-lmu (without trainable `theta`).
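A minimal sketch of what this could look like in the layer's `build` step; the attribute names `self.trainable_theta` and `self._init_theta` are illustrative, not the exact implementation:

```python
import tensorflow as tf

# Inside the layer's build(); names assumed for illustration.
if self.trainable_theta:
    # Trainable: registered as a weight, so the optimizer updates it
    # and it appears among the layer's trainable variables.
    self.theta_inv = self.add_weight(
        name="theta_inv",
        shape=(),
        initializer=tf.keras.initializers.Constant(1 / self._init_theta),
        trainable=True,
    )
else:
    # Fixed: a plain attribute, keeping the weight structure identical
    # to models built with previous versions of keras-lmu.
    self.theta_inv = tf.constant(1 / self._init_theta)
```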
How does training with Euler work?
Since `theta` can be decoupled from the `A` and `B` matrices when using Euler, `A` and `B` (weights of the layer) are set to `CONST_A` and `CONST_B` and never updated when training `theta`.
If not training `theta`, `A` and `B` are set to `CONST_A * theta_inv` and `CONST_B * theta_inv` respectively (they are, naturally, still not updated).
However, the `call` function implements the memory update as `m = m + theta_inv * (A*m + B*u)`, thus capturing the gradient of `theta_inv` and ensuring that it flows correctly through the update.
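As a standalone sketch of that update (shapes and function name are illustrative):

```python
import tensorflow as tf

def euler_memory_update(m, u, A, B, theta_inv):
    """One Euler step of the memory update described above.

    m: memory state, shape (batch, order)
    u: input to the memory, shape (batch, 1)
    A: (order, order), B: (1, order) -- CONST_A / CONST_B when training theta
    """
    # A*m and B*u are computed in batch-major form as m @ A and u @ B
    # (A stored transposed relative to the math notation). theta_inv
    # multiplies the whole increment, so its gradient is captured even
    # though A and B themselves are never updated.
    return m + theta_inv * (tf.matmul(m, A) + tf.matmul(u, B))
```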
How does training with Zero Order Hold (zoh) work?
Note that `theta` cannot be decoupled from the `A` and `B` matrices when using zoh. Thus, when training, new `A` and `B` matrices are generated inside the `call` function itself. This will be slower than discretizing with Euler.
Note that a custom `_cont2discrete` function for discretizing with zoh has been implemented instead of using the previously used implementation from `scipy.signal`. This is because `scipy.signal.cont2discrete` only accepts NumPy inputs and not `tf.Tensor`s, which would break the flow of gradients to `theta_inv`.
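For illustration, a sketch of what a TF-native zoh discretization could look like, mirroring `scipy.signal.cont2discrete(..., method="zoh")` via the matrix exponential; this is an assumption-based illustration, not the exact `_cont2discrete` in the PR:

```python
import tensorflow as tf

def _cont2discrete_zoh(A, B):
    """Zero-order-hold discretization with dt=1, built from TF ops so
    gradients can flow back to theta_inv (which scales A and B upstream)."""
    n = A.shape[0]
    m = B.shape[1]
    # Stack [[A, B], [0, 0]] and take its matrix exponential; the top
    # rows of the result hold the discretized A and B.
    em_upper = tf.concat([A, B], axis=1)
    em_lower = tf.zeros((m, n + m), dtype=A.dtype)
    em = tf.linalg.expm(tf.concat([em_upper, em_lower], axis=0))
    return em[:n, :n], em[:n, n:]
```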
Where to start the review?
You can start from the commit "Add trainable theta and discretization options" and then go to "Update and add new tests". These are the only two main commits. There is an additional commit, but that is just a bones update.
Any other remarks?
To my understanding, the examples CI run confirms that the new layer is compatible with previous models, but I will leave this question here since I am not 100% sure.
The `get_config` of each of `LMUCell`, `LMU`, and `LMUFFT` serialises the initial `theta` as the `theta` parameter, and not the final (possibly trained) value. Leaving this here to confirm it makes sense.
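For reference, a sketch of that behaviour; the internal attribute name `_init_theta` is an assumption:

```python
def get_config(self):
    config = super().get_config()
    # Serialize the user-specified initial theta, not the current
    # (possibly trained) 1 / theta_inv value.
    config.update(theta=self._init_theta)
    return config
```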
Finally, given that support for TF 2.1 has been dropped, does an update need to be made somewhere to the [docs / compatibility list / prerequisites] section?