marcoppasini / musika

Fast Infinite Waveform Music Generation

Inefficiency warning when restarting from previous epoch #17

Closed · pbakaus closed 2 years ago

pbakaus commented 2 years ago

Heya,

this might be nothing, but when restarting training from a previous checkpoint/epoch, I get the following warnings:

Using GPU with mixed precision enabled...

Calculating total number of samples in data folder...
Found 396 total samples
Dataset is ready!
Checking if models are already available...
Models are available!
/usr/local/lib/python3.9/site-packages/keras/initializers/initializers_v2.py:120: UserWarning: The initializer HeUniform is unseeded and being called multiple times, which will return identical values each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initalizer instance more than once.
  warnings.warn(
WARNING:tensorflow:You forgot to call LossScaleOptimizer.get_scaled_loss() and LossScaleOptimizer.get_unscaled_gradients() before calling LossScaleOptimizer.apply_gradients(). This will likely result in worse model quality, so please call them in the correct places! For example:
    with tf.GradientTape() as tape:
      loss = loss_fn()
      scaled_loss = opt.get_scaled_loss(loss)
    scaled_grads = tape.gradient(scaled_loss, vars)
    grads = opt.get_unscaled_gradients(scaled_grads)
    opt.apply_gradients([(grads, var)])
For more information, see https://www.tensorflow.org/api_docs/python/tf/keras/mixed_precision/LossScaleOptimizer
WARNING:tensorflow:You forgot to call LossScaleOptimizer.get_scaled_loss() and LossScaleOptimizer.get_unscaled_gradients() before calling LossScaleOptimizer.apply_gradients(). This will likely result in worse model quality, so please call them in the correct places! For example:
    with tf.GradientTape() as tape:
      loss = loss_fn()
      scaled_loss = opt.get_scaled_loss(loss)
    scaled_grads = tape.gradient(scaled_loss, vars)
    grads = opt.get_unscaled_gradients(scaled_grads)
    opt.apply_gradients([(grads, var)])

Thought I'd report it out of caution, but please close if this is not actionable!
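
For reference, the pattern the warning asks for looks roughly like the sketch below; the model, optimizer, and loss here are placeholders for illustration, not Musika's actual training code.

```python
import tensorflow as tf

# Placeholder model/optimizer/loss purely for illustration.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
opt = tf.keras.mixed_precision.LossScaleOptimizer(tf.keras.optimizers.Adam())

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        pred = model(x, training=True)
        loss = tf.reduce_mean(tf.square(y - pred))
        # Scale the loss so small float16 gradients do not underflow.
        scaled_loss = opt.get_scaled_loss(loss)
    scaled_grads = tape.gradient(scaled_loss, model.trainable_variables)
    # Undo the scaling before applying the update.
    grads = opt.get_unscaled_gradients(scaled_grads)
    opt.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```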

marcoppasini commented 2 years ago

Thank you for reporting! I am aware of the warnings; they should originate from the gradient computation for the gradient penalty term, which does not use loss scaling for mixed-precision training.
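
For anyone hitting this later: a gradient penalty of this kind is usually computed with its own inner GradientTape over interpolated samples, roughly as in the generic WGAN-GP-style sketch below (placeholder critic and illustrative tensor shapes, not the actual Musika code). Per the explanation above, this inner gradient is taken without the LossScaleOptimizer scaling calls.

```python
import tensorflow as tf

def gradient_penalty(critic, real, fake):
    """Generic WGAN-GP-style gradient penalty (illustrative only)."""
    # Illustrative shapes: waveforms as [batch, time, channels].
    batch = tf.shape(real)[0]
    eps = tf.cast(tf.random.uniform([batch, 1, 1], 0.0, 1.0), real.dtype)
    interp = real + eps * (fake - real)
    with tf.GradientTape() as gp_tape:
        gp_tape.watch(interp)
        scores = critic(interp, training=True)
    # Gradient of the critic output w.r.t. the interpolated inputs; per the
    # comment above, this is computed directly, without loss scaling.
    grads = gp_tape.gradient(scores, interp)
    norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2]) + 1e-8)
    return tf.reduce_mean(tf.square(norm - 1.0))
```

Routing the total critic loss through opt.get_scaled_loss() / opt.get_unscaled_gradients() before apply_gradients(), as in the earlier sketch, is the pattern the TensorFlow warning is pointing at.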