qjadud1994 / CRNN-Keras

CRNN (CNN+RNN) for OCR using Keras / License Plate Recognition
MIT License

KeyError: 'val_loss' when trying to train... #55

Open pendex900x opened 4 years ago

pendex900x commented 4 years ago

I get the error below when I run python training.py with 43 images for training and 33 for testing.

This is the output:

(crnn-keras) C:\Users\X\Desktop\CRNN-Keras-master\CRNN-Keras-master>python training.py
Using TensorFlow backend.
2020-06-01 00:52:20.748876: I C:\tf_jenkins\workspace\rel-win\M\windows\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
...New weight data...
33  Image Loading start...
True
33  Image Loading finish...
43  Image Loading start...
True
43  Image Loading finish...
Epoch 1/30
Traceback (most recent call last):
  File "training.py", line 41, in <module>
    validation_steps=int(tiger_val.n / val_batch_size))
  File "C:\Users\X\Anaconda3\envs\crnn-keras\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\X\Anaconda3\envs\crnn-keras\lib\site-packages\keras\engine\training.py", line 2213, in fit_generator
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "C:\Users\X\Anaconda3\envs\crnn-keras\lib\site-packages\keras\callbacks.py", line 76, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "C:\Users\X\Anaconda3\envs\crnn-keras\lib\site-packages\keras\callbacks.py", line 401, in on_epoch_end
    filepath = self.filepath.format(epoch=epoch + 1, **logs)
KeyError: 'val_loss'

Why does this happen?

Mohit-robo commented 2 years ago

The error was the same for me: when using val_loss in the checkpoint file name, I would get the following error: KeyError: 'val_loss'. My checkpointer was also monitoring this field, so even if I took the field out of the file name, I would still get this warning from the checkpointer: WARNING:tensorflow:Can save best model only with val_loss available, skipping.
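For anyone wondering where the KeyError itself comes from: the last frame of the traceback above calls self.filepath.format(epoch=epoch + 1, **logs), so any placeholder in the checkpoint file name that is missing from logs raises it. Below is a minimal sketch in plain Python, with no Keras involved. The template string, the logs values, and the missing_fields helper are made up for illustration and are not taken from training.py.

```python
from string import Formatter

# Hypothetical checkpoint file name template and epoch-end logs.
template = "weights-{epoch:02d}-{val_loss:.3f}.hdf5"
logs = {"loss": 0.73}  # no 'val_loss' entry: validation produced no result


def missing_fields(template, available):
    """Return the template placeholders that are absent from `available`."""
    fields = {name for _, name, _, _ in Formatter().parse(template) if name}
    return sorted(fields - set(available))


values = {"epoch": 1, **logs}
print(missing_fields(template, values))  # ['val_loss']

try:
    template.format(**values)
except KeyError as err:
    # Same failure mode as the last frame of the traceback.
    print("KeyError:", err)  # KeyError: 'val_loss'
```

Printing the keys of logs at the point of failure, or running a check like the one above, shows whether validation metrics were actually computed before the checkpoint tried to build its file name.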

In my case, the issue was that I was upgrading from using Keras and TensorFlow 1 separately to using the Keras that ships with TensorFlow 2. The period param for ModelCheckpoint had been replaced with save_freq. I erroneously assumed that save_freq behaved the same way, so I set save_freq=1, thinking this would save every epoch. However, the docs state:

save_freq: 'epoch' or integer. When using 'epoch', the callback saves the model after each epoch. When using integer, the callback saves the model at end of a batch at which this many samples have been seen since last saving. Note that if the saving isn't aligned to epochs, the monitored metric may potentially be less reliable (it could reflect as little as 1 batch, since the metrics get reset every epoch). Defaults to 'epoch'

Setting save_freq='epoch' solved the issue for me.
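To illustrate why the two settings differ, here is a simplified model of the behaviour in plain Python. It is not Keras code. It assumes that the logs available at the end of a batch contain only training metrics, while the logs at the end of an epoch also contain val_loss. The function name, file name template, and metric values are all hypothetical.

```python
# Simplified stand-in for a checkpoint callback; NOT the Keras implementation.
FILEPATH = "weights-{epoch:02d}-{val_loss:.3f}.hdf5"  # hypothetical template


def run_epoch(save_freq, batches=3):
    """Return the file names a checkpoint would build during one epoch."""
    saved = []
    for batch in range(batches):
        # Assumption: batch-level logs hold training metrics only.
        batch_logs = {"loss": 1.0 / (batch + 1)}
        if save_freq != "epoch" and (batch + 1) % save_freq == 0:
            saved.append(FILEPATH.format(epoch=1, **batch_logs))
    # Assumption: validation has run by the time the epoch ends.
    epoch_logs = {"loss": 0.4, "val_loss": 0.5}
    if save_freq == "epoch":
        saved.append(FILEPATH.format(epoch=1, **epoch_logs))
    return saved


print(run_epoch("epoch"))  # ['weights-01-0.500.hdf5']

try:
    run_epoch(1)  # integer frequency: tries to save after the first batch
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'val_loss'
```

With an integer save_freq the save is attempted mid-epoch, before any validation metric exists, which matches both the KeyError and the "Can save best model only with val_loss available" warning described above.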