sohinimallick opened this issue 3 years ago (status: Open)
Seems like whatever lr modifications you're doing are at fault. I would remove them and see whether the problem persists; if it remains, that would point to a data problem instead.
I have the same issue after updating to TensorFlow 2.4: each subsequent training stage shows an increase in train and val loss without any other changes. Do you have a solution?
I am trying to train the model with a schedule where the lr is reduced by a factor of 10 every few epochs: for example, the heads for the first 10 epochs at lr, layers '4+' for the next 10 epochs at lr/10, and all layers for another 10 epochs at lr/100 (30 epochs total). However, the loss increases after every training stage rather than going down, i.e. if the first 10 epochs finish at a loss of 1.6, the next stage starts at a loss of 3.4. Does anyone know the reason behind this?
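One plausible explanation (a sketch, not confirmed in this thread): in the Matterport Mask R-CNN implementation, the loss reported during training includes L2 weight-decay terms computed over the *currently trainable* weights. Unfreezing more layers between stages ('heads' → '4+' → 'all') therefore adds new regularization terms, so the total loss can jump at a stage boundary even when the prediction losses are unchanged. The numpy demo below illustrates only the arithmetic; the layer names, weight sizes, and the 0.0001 weight decay are illustrative assumptions (0.0001 matches Mask R-CNN's default `WEIGHT_DECAY`).

```python
import numpy as np

rng = np.random.default_rng(0)
WEIGHT_DECAY = 0.0001  # Mask R-CNN's default config.WEIGHT_DECAY

# Hypothetical per-layer weight tensors (names and sizes are made up).
layer_weights = {name: rng.normal(size=10000)
                 for name in ["heads", "stage5", "stage4", "backbone"]}

def reported_loss(prediction_loss, trainable_layers):
    """Total loss = prediction loss + L2 penalty over trainable weights only."""
    l2 = sum(WEIGHT_DECAY * np.sum(layer_weights[name] ** 2)
             for name in trainable_layers)
    return prediction_loss + l2

pred = 1.6  # same prediction loss before and after unfreezing

# The 'all' stage reports a strictly larger loss than the 'heads' stage,
# even though nothing about the model's predictions changed.
print(reported_loss(pred, ["heads"]))
print(reported_loss(pred, ["heads", "stage5", "stage4", "backbone"]))
```

If this is the cause, the jump is a bookkeeping artifact rather than a regression; comparing the individual loss components (e.g. `mrcnn_class_loss`, `mrcnn_bbox_loss`) across stages would confirm it.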
`config.LEARNING_RATE = 0.001`

Training schedule:

```python
model.train(train_set, val_set, learning_rate=config.LEARNING_RATE/10,
            epochs=10, augmentation=augmentation, layers='heads')
model.train(train_set, val_set, learning_rate=config.LEARNING_RATE/100,
            epochs=20, augmentation=augmentation, layers='4+')
model.train(train_set, val_set, learning_rate=config.LEARNING_RATE/1000,
            epochs=30, augmentation=augmentation, layers='all')
```
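For reference, a quick sketch of the effective learning rate per stage implied by the calls above, assuming the 0.001 base rate from the question. Note that the code actually starts the heads stage at `LEARNING_RATE/10`, one step lower than the "heads at lr" described in the prose, and that `epochs` in Matterport's `model.train()` is the cumulative target epoch, so each call trains 10 additional epochs:

```python
LEARNING_RATE = 0.001  # config.LEARNING_RATE from the question

# (layers, learning rate, epoch span) for each of the three stages.
schedule = [
    ("heads", LEARNING_RATE / 10,   "epochs 1-10"),
    ("4+",    LEARNING_RATE / 100,  "epochs 11-20"),
    ("all",   LEARNING_RATE / 1000, "epochs 21-30"),
]

for layers, lr, span in schedule:
    print(f"{layers:>5}: lr={lr:g}  ({span})")
```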
Configuration:
```python
class LIDARConfig(Config):
    """Configuration for training on the LIDAR dataset.
    Derives from the base Config class and overrides values
    specific to the LIDAR dataset.
    """

    # Give the configuration a recognizable name
```