Model weights not saving

altear commented 3 years ago

I'm trying to train my network on my own images, but after ~16h (after training was complete) I realized that not a single model was saved to the weights directory.

This is the code I'm running (basically just the getting-started code):

from ISR.models import RRDN
from ISR.models import Discriminator
from ISR.models import Cut_VGG19
from ISR.train import Trainer

lr_train_patch_size = 50 # This needs to be size 100
layers_to_extract = [5, 9]
scale = 4
hr_train_patch_size = lr_train_patch_size * scale

rrdn  = RRDN(arch_params={'C':4, 'D':3, 'G':64, 'G0':64, 'T':10, 'x':scale}, patch_size=lr_train_patch_size)
f_ext = Cut_VGG19(patch_size=hr_train_patch_size, layers_to_extract=layers_to_extract)
discr = Discriminator(patch_size=hr_train_patch_size, kernel_size=3)

loss_weights = {
  'generator': 0.0,
  'feature_extractor': 0.0833,
  'discriminator': 0.01
}
losses = {
  'generator': 'mae',
  'feature_extractor': 'mse',
  'discriminator': 'binary_crossentropy'
}

log_dirs = {'logs': './logs', 'weights': './weights'}

learning_rate = {'initial_value': 0.0004, 'decay_factor': 0.5, 'decay_frequency': 30}

flatness = {'min': 0.0, 'max': 0.15, 'increase': 0.01, 'increase_frequency': 5}

trainer = Trainer(
    generator=rrdn,
    discriminator=discr,
    feature_extractor=f_ext,
    lr_train_dir='/home/ubuntu/comp5900f/data/centered_scale-1-over-4/train',
    hr_train_dir='/home/ubuntu/comp5900f/data/centered/train',
    lr_valid_dir='/home/ubuntu/comp5900f/data/centered_scale-1-over-4/validate',
    hr_valid_dir='/home/ubuntu/comp5900f/data/centered/validate',
    loss_weights=loss_weights,
    learning_rate=learning_rate,
    flatness=flatness,
    dataname='image_dataset',
    log_dirs=log_dirs,
    weights_generator=None,
    weights_discriminator=None,
    n_validation=40,
)

trainer.train(
    epochs=80,
    steps_per_epoch=200,
    batch_size=16,
    monitored_metrics={'val_PSNR_Y': 'max'}
)

The only thing that ever appears in the weights folder is session_config.yaml.

Manopphysics commented 3 years ago

I have the same problem: the weights are not being saved. Only session_config.yaml is written; the model itself is not.

dpincic commented 3 years ago

I'm not sure, but it seems the problem could be in the monitored_metrics part. For example, you could try renaming the metric "val_PSNR_Y" to "val_generator_PSNR_Y" and see whether that helps and whether it is the metric you want.

The problem appears to be that "val_PSNR_Y" is in neither the training nor the validation losses reported while training the model, so it is removed from monitored_metrics. As a result no metrics are monitored, and that's why no models are being saved.
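To make that mechanism concrete, here is a minimal sketch in plain Python. It is not ISR's actual code: the helper names `filter_monitored_metrics` and `should_save`, the set of logged metric names, and the numeric values are all hypothetical, chosen only to illustrate how a metric name that matches nothing can leave the monitored set empty, so that a checkpoint is never triggered.

```python
def filter_monitored_metrics(monitored, logged_names):
    """Keep only the monitored metrics whose names are actually logged."""
    return {name: mode for name, mode in monitored.items() if name in logged_names}


def should_save(monitored, epoch_logs, best):
    """Return True if any monitored metric improved; update `best` in place."""
    improved = False
    for name, mode in monitored.items():
        value = epoch_logs[name]
        previous = best.get(name)
        if (
            previous is None
            or (mode == 'max' and value > previous)
            or (mode == 'min' and value < previous)
        ):
            best[name] = value
            improved = True
    return improved


# Hypothetical metric names and values for one epoch (illustration only).
epoch_logs = {'val_loss': 0.42, 'val_generator_loss': 0.40, 'val_generator_PSNR_Y': 27.3}
logged = set(epoch_logs)

# The name used in the original post matches nothing, so it is dropped.
readme_metrics = filter_monitored_metrics({'val_PSNR_Y': 'max'}, logged)
# The renamed metric matches a logged name and is kept.
fixed_metrics = filter_monitored_metrics({'val_generator_PSNR_Y': 'max'}, logged)

print(readme_metrics)                               # {}
print(should_save(readme_metrics, epoch_logs, {}))  # False: nothing to monitor, nothing saved
print(should_save(fixed_metrics, epoch_logs, {}))   # True: first value counts as an improvement
```

With an empty monitored set the loop body never runs, so the save condition can never become true, which would match the symptom of a weights folder containing only session_config.yaml.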

altear commented 3 years ago

Thanks

Just a heads-up for anyone maintaining the repo: the code was mostly copy-pasted from the Training section of the README. If it's a configuration issue, it may keep affecting people who use the demo code.

dpincic commented 3 years ago

Just a note: the code from notebooks/ISR_Traininig_Tutorial.ipynb should work for training, since the metric there is actually 'val_generator_PSNR_Y', as it should be.

cfrancesco commented 3 years ago

thanks @altear, there is a discrepancy in the readme. I will probably just remove this option or make it default if there is no match. @dpincic is correct, thanks for helping!

dokluch commented 3 years ago

> thanks @altear, there is a discrepancy in the readme. I will probably just remove this option or make it default if there is no match. @dpincic is correct, thanks for helping!

Is it possible to manually extract weights from the trainer object to save as h5?

diesazul96 commented 3 years ago

Are the weights stored for each epoch?

galacticue06 commented 3 years ago

The trained net is stored in .keras\datasets\

idealo / image-super-resolution

Model weights not saving #162