qqwweee / keras-yolo3

A Keras implementation of YOLOv3 (Tensorflow backend)
MIT License
7.14k stars, 3.44k forks

High training and validation loss, why? #457

Open dmadhitama opened 5 years ago

dmadhitama commented 5 years ago

I'm trying to train on my own dataset (3 classes) with my own custom anchor sizes, using both YOLOv3 and Tiny YOLO. Unfortunately, the loss is high for both. For YOLOv3, the training log looks like this:

Epoch 00069: val_loss did not improve from 12.04695
Epoch 70/200
596/596 [==============================] - 380s 637ms/step - loss: 12.2843 - val_loss: 12.3525

Epoch 00070: val_loss did not improve from 12.04695
Epoch 71/200
596/596 [==============================] - 380s 638ms/step - loss: 12.2777 - val_loss: 12.3595

Epoch 00071: val_loss did not improve from 12.04695
Epoch 72/200
596/596 [==============================] - 383s 643ms/step - loss: 12.3075 - val_loss: 12.4914

Epoch 00072: val_loss did not improve from 12.04695
Epoch 73/200
596/596 [==============================] - 383s 642ms/step - loss: 12.2098 - val_loss: 12.1770

Epoch 00073: val_loss did not improve from 12.04695
Epoch 74/200
596/596 [==============================] - 385s 646ms/step - loss: 12.1704 - val_loss: 12.0114

Epoch 00074: val_loss improved from 12.04695 to 12.01140, saving model to logs/001/yolov3_best_weights.h5
Epoch 75/200
596/596 [==============================] - 382s 641ms/step - loss: 12.1503 - val_loss: 12.5979

Epoch 00075: val_loss did not improve from 12.01140
Epoch 76/200
596/596 [==============================] - 385s 645ms/step - loss: 12.3297 - val_loss: 12.7824

Epoch 00076: val_loss did not improve from 12.01140
Epoch 77/200
119/596 [====>.........................] - ETA: 4:57 - loss: 12.2284

For Tiny YOLO, the loss is around 6-7. That is quite high for both models, but when I tested them on new data, the YOLOv3 model gave quite good results: it detects all of the classes with confidence scores around 0.6-0.9. Tiny YOLO also detects them, but with very low confidence scores, around 0.1-0.4. Both models localize the detected objects quite well, but YOLOv3 produces better bounding box sizes than Tiny YOLO.

My question is: what is happening with these models and the training process? Why can they still localize objects when the loss is so high (nowhere near 0-1)? And why has the loss stopped decreasing, stuck at around 12 for YOLOv3 and 6-7 for Tiny YOLO, even though I have trained them for days?

Note: I trained these models on a laptop with a GTX 1050 Ti.

coolbreeze2 commented 5 years ago

Hello, may I ask how many boxes there are per image in your dataset?

I used this model to train on my dataset, but when an image has more than four boxes, it raises an error like this:

File "C:\py-learning\keras-yolo3\train.py", line 202, in <module>
    _main()
File "C:\py-learning\keras-yolo3\train.py", line 65, in _main
    callbacks=[logging, checkpoint])
File "C:\Users\frank\Anaconda3\envs\tf\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
File "C:\Users\frank\Anaconda3\envs\tf\lib\site-packages\keras\engine\training.py", line 2192, in fit_generator
    generator_output = next(output_generator)
File "C:\Users\frank\Anaconda3\envs\tf\lib\site-packages\keras\utils\data_utils.py", line 793, in get
    six.reraise(value.__class__, value, value.__traceback__)
File "C:\Users\frank\Anaconda3\envs\tf\lib\site-packages\six.py", line 693, in reraise
    raise value
File "C:\Users\frank\Anaconda3\envs\tf\lib\site-packages\keras\utils\data_utils.py", line 658, in _data_generator_task
    generator_output = next(self._generator)
File "C:\py-learning\keras-yolo3\train.py", line 184, in data_generator
    image, box = get_random_data(annotation_lines[i], input_shape, random=True)
File "C:\py-learning\keras-yolo3\yolo3\utils.py", line 120, in get_random_data
    box[:, [0, 2]] = box[:, [0, 2]] * nw / iw + dx
IndexError: index 2 is out of bounds for axis 1 with size 2

Many Thanks in advance.

dmadhitama commented 5 years ago

My dataset has at most 7 boxes per image, and it runs fine.

I don't quite understand your error, though. Did you follow the annotation format for the .txt file described in the README? Like this:

path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3
path/to/img2.jpg 120,300,250,600,2
...
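
A malformed annotation line is one possible source of an `IndexError` like the one reported above, so it can help to validate the file before training. The sketch below is only an illustration, not code from the repository: the helper name `parse_annotation_line` is hypothetical, and it assumes each box is five comma-separated integers (`x_min,y_min,x_max,y_max,class_id`) with no spaces inside a box and no spaces in the image path.

```python
def parse_annotation_line(line):
    """Split one annotation line into (image_path, list of 5-int boxes).

    Hypothetical helper: assumes 'path box box ...' where each box is
    'x_min,y_min,x_max,y_max,class_id' with no spaces inside the box.
    """
    parts = line.split()
    if not parts:
        raise ValueError("empty annotation line")
    path, box_fields = parts[0], parts[1:]
    boxes = []
    for field in box_fields:
        values = field.split(",")
        # A stray space after a comma splits one box into two short fields,
        # which is caught here instead of failing later during training.
        if len(values) != 5:
            raise ValueError(
                f"expected 5 comma-separated values, got {field!r} for {path}"
            )
        boxes.append(tuple(int(v) for v in values))
    return path, boxes


path, boxes = parse_annotation_line(
    "path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3"
)
print(path, len(boxes), boxes)
```

Running this over every line of the annotation file before training should point to the first line that does not match the expected layout.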

sxyxf66 commented 5 years ago

@dmadhitama Same problem. Why is the loss so high? Is that normal? I am using the VOC2007 dataset. And what was your final loss?

Do you know how to use our own data?

sxyxf66 commented 5 years ago

@dmadhitama @coolbreeze2 @qqwweee Do you know how to train on my own dataset? Also, when I train on the VOC2007 dataset, the val_loss is even bigger (4889.5837). It is strange, and it is hard to get it any lower. Do you know why? Thank you.

eain3314 commented 5 years ago

What should I do about the very high loss?

eain3314 commented 5 years ago

Has this been solved?

eain3314 commented 5 years ago

My loss is around 40.

dodogoffy commented 5 years ago

Me too

aaronll94 commented 4 years ago

Me too. I am training on 35 classes (110k images) and the losses (training and validation) are stuck around 35 after 10+ epochs.

robisen1 commented 4 years ago

10 epochs is very few. Try something like 100. Also, have you looked at your data? Does it represent what you want to detect well? What I mean is: do the images in your dataset include examples from different angles, distances, levels of occlusion, lighting conditions, and the like?
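
One quick way to start looking at the data is to count how many boxes each class has and how many boxes the busiest image contains. The following is a rough sketch, not part of the repository. It assumes the annotation format quoted earlier in this thread (an image path followed by boxes whose last comma-separated value is the class id), and the function name `summarize_annotations` is made up for illustration.

```python
from collections import Counter


def summarize_annotations(lines):
    """Return (boxes per class id, largest number of boxes in one image).

    Hypothetical helper: assumes 'path box box ...' lines where the last
    comma-separated value of each box is the integer class id.
    """
    per_class = Counter()
    max_boxes = 0
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        box_fields = parts[1:]
        max_boxes = max(max_boxes, len(box_fields))
        for field in box_fields:
            class_id = int(field.split(",")[-1])
            per_class[class_id] += 1
    return per_class, max_boxes


sample = [
    "path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3",
    "path/to/img2.jpg 120,300,250,600,2",
]
per_class, max_boxes = summarize_annotations(sample)
print(dict(per_class), max_boxes)
```

In practice the lines would be read from the training annotation file. A heavily skewed count would suggest that some classes are under-represented, though it says nothing about variety in angle, distance, or lighting, which still has to be checked by eye.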