htjb / margarine

Code to replicate posterior probability distributions with bijectors/KDEs and perform marginal KL/bayesian dimensionality calculations.

Loss Function and learning rate #14

Closed · htjb closed this 1 year ago

htjb commented 1 year ago

It would be useful for users to be able to pass a learning rate schedule in place of a fixed learning rate, since we find that the loss is not always smooth.

For example, when training on the Gaussian distribution shown in blue in the figures below for 500 epochs and then sampling the autoregressive flow (shown in orange), the loss function is not smooth.

We find that using a learning rate schedule can help alleviate this. Since the particular choice of learning rate or learning rate schedule is problem-specific, I would suggest relaxing the following type check on the learning_rate kwarg in maf.py from

if type(self.learning_rate) not in [int, float]:
    raise TypeError("'learning_rate', must be a float.")

to

if not isinstance(self.learning_rate,
                  (int, float,
                   keras.optimizers.schedules.LearningRateSchedule)):
    raise TypeError("'learning_rate' must be an integer, float or "
                    "a keras learning rate schedule.")
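Note that tf.keras optimizers (e.g. Adam) already accept either a float or a LearningRateSchedule instance for their learning_rate argument, so the rest of the training code should not need to change. A minimal sketch illustrating this (the names here are illustrative, not the maf.py internals):

from tensorflow import keras

# A fixed learning rate and a schedule can both be passed straight
# to the optimizer's `learning_rate` argument.
fixed_opt = keras.optimizers.Adam(learning_rate=1e-3)

schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=25, decay_rate=0.9)
scheduled_opt = keras.optimizers.Adam(learning_rate=schedule)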

Training on the same distribution with the following learning rate schedule

from tensorflow import keras

# Decay the learning rate by a factor of 0.9 every 25 steps,
# starting from 1e-3.
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,
    decay_steps=25,
    decay_rate=0.9)

gives the results shown in the figures below, with a smoother loss history.
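For completeness, here is a sketch of how this would look from the user's side once the check is relaxed. This assumes margarine's MAF constructor takes the samples and weights plus a learning_rate kwarg, and that training and sampling go through train() and sample() as at present; the toy Gaussian samples simply stand in for the posterior plotted above.

import numpy as np
from tensorflow import keras
from margarine.maf import MAF

# Toy 2D Gaussian samples with uniform weights, standing in for the
# posterior samples in the figures above.
samples = np.random.multivariate_normal([0, 0], np.eye(2), 5000)
weights = np.ones(len(samples))

# Exponentially decaying learning rate, as above.
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,
    decay_steps=25,
    decay_rate=0.9)

# Pass the schedule in place of a fixed float learning rate.
bij = MAF(samples, weights, learning_rate=lr_schedule)
bij.train(500)
flow_samples = bij.sample(5000)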

htjb commented 1 year ago

Closed by #15