htjb / margarine

Code to replicate posterior probability distributions with bijectors/KDEs and perform marginal KL/bayesian dimensionality calculations.

Loss Function and learning rate #14

Closed · htjb closed this 1 year ago

htjb commented 1 year ago

It would be useful for users to be able to pass a learning rate schedule in place of a fixed learning rate, since we find that the loss is not always smooth.

For example, when training on the Gaussian distribution shown in blue in the figures below for 500 epochs and then sampling the autoregressive flow (shown in orange), the loss function is not smooth.

We find that using a learning rate schedule can help alleviate this. Since the particular choice of learning rate or learning rate schedule is problem-specific, I would suggest relaxing the following type check on the learning_rate kwarg in maf.py from

if type(self.learning_rate) not in [int, float]:
    raise TypeError("'learning_rate', must be a float.")

to

if not isinstance(self.learning_rate,
                  (int, float,
                   keras.optimizers.schedules.LearningRateSchedule)):
    raise TypeError("'learning_rate' must be an integer, float or "
                    "a keras learning rate schedule.")
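Note that tf.keras optimizers (e.g. Adam) already accept either a float or a LearningRateSchedule instance for their learning_rate argument, so the rest of the training code should not need to change. A minimal sketch illustrating this (the names here are illustrative, not the maf.py internals):

from tensorflow import keras

# A fixed learning rate and a schedule can both be passed straight
# to the optimizer's `learning_rate` argument.
fixed_opt = keras.optimizers.Adam(learning_rate=1e-3)

schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=25, decay_rate=0.9)
scheduled_opt = keras.optimizers.Adam(learning_rate=schedule)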

Training on the same distribution with the following learning rate schedule

from tensorflow import keras

# Decay the learning rate by a factor of 0.9 every 25 steps,
# starting from 1e-3.
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,
    decay_steps=25,
    decay_rate=0.9)

gives the results shown in the figures below, with a smoother loss history.
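For completeness, here is a sketch of how this would look from the user's side once the check is relaxed. This assumes margarine's MAF constructor takes the samples and weights plus a learning_rate kwarg, and that training and sampling go through train() and sample() as at present; the toy Gaussian samples simply stand in for the posterior plotted above.

import numpy as np
from tensorflow import keras
from margarine.maf import MAF

# Toy 2D Gaussian samples with uniform weights, standing in for the
# posterior samples in the figures above.
samples = np.random.multivariate_normal([0, 0], np.eye(2), 5000)
weights = np.ones(len(samples))

# Exponentially decaying learning rate, as above.
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,
    decay_steps=25,
    decay_rate=0.9)

# Pass the schedule in place of a fixed float learning rate.
bij = MAF(samples, weights, learning_rate=lr_schedule)
bij.train(500)
flow_samples = bij.sample(5000)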

htjb commented 1 year ago

Closed by #15