longcw / yolo2-pytorch

YOLOv2 in PyTorch

IOU Loss #66

Open Erotemic opened 6 years ago

Erotemic commented 6 years ago

I'm attempting to train the YOLOv2 model and I'm running into problems with mAP.

I think it might have to do with the computation of the iou loss. When I plot the losses on a test set as a function of the epoch, the bbox loss and cls loss go down, but the iou loss goes down at first and then spikes up around epoch 10. Note that the plots are smoothed, so the actual behavior is a bit more erratic.

This only happens on the test set. The training iou loss goes down as normal.
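For reference, the kind of smoothing mentioned above can be sketched as follows. The post does not say which smoothing was applied to the plots, so the exponential moving average here, the name `ema_smooth`, the `alpha` value, and the sample numbers are all assumptions for illustration only.

```python
def ema_smooth(values, alpha=0.6):
    """Exponential moving average (assumed smoothing, not confirmed by the post).

    alpha is the weight kept from the previous smoothed value, so a larger
    alpha gives a smoother curve that hides more of the raw fluctuation.
    """
    smoothed = []
    prev = None
    for v in values:
        # The first point is passed through unchanged; later points are blended.
        prev = v if prev is None else alpha * prev + (1 - alpha) * v
        smoothed.append(prev)
    return smoothed


# Illustrative per-epoch loss values (placeholders, not measurements).
raw = [2.0, 1.0, 3.0, 1.0]
print(ema_smooth(raw, alpha=0.5))
```

Plotting the raw and smoothed series together makes it easier to tell a genuine spike from noise that the smoothing has blurred.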

Test losses: [image]

Train losses: [image]

Test / train mAP: [image]

I did modify your code to get it to work with my training harness, which supports multiple GPUs, so it's possible that I introduced a bug.

However, using a minimally modified version of your code, I get the following losses on the training set, which show similar behavior:

At epoch 110: loss bbox=0.213, iou=1.771, cls=0.305 = 2.289

At epoch 100: loss bbox=0.196, iou=1.286, cls=0.261 = 1.743

At epoch 90: loss bbox=0.205, iou=1.479, cls=0.265 = 1.949

At epoch 80: loss bbox=0.208, iou=1.780, cls=0.295 = 2.282

At epoch 70: loss bbox=0.217, iou=2.142, cls=0.277 = 2.636

At epoch 60: loss bbox=0.204, iou=1.291, cls=0.300 = 1.795

At epoch 50: loss bbox=0.226, iou=2.185, cls=0.351 = 2.763

At epoch 40: loss bbox=0.209, iou=1.730, cls=0.322 = 2.260

At epoch 30: loss bbox=0.212, iou=1.157, cls=0.378 = 1.747

At epoch 20: loss bbox=0.252, iou=2.523, cls=0.477 = 3.252

At epoch 10: loss bbox=0.250, iou=1.701, cls=0.576 = 2.528
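A quick way to compare the three components is to parse the log lines above and look at the range of each one. This is only a sketch: the regular expression and the helper name `parse_log` are assumptions written for this exact log format and are not part of the repo.

```python
import re

# Log lines copied from the post (newest epoch first, as logged).
LOG = """\
At epoch 110: loss bbox=0.213, iou=1.771, cls=0.305 = 2.289
At epoch 100: loss bbox=0.196, iou=1.286, cls=0.261 = 1.743
At epoch 90: loss bbox=0.205, iou=1.479, cls=0.265 = 1.949
At epoch 80: loss bbox=0.208, iou=1.780, cls=0.295 = 2.282
At epoch 70: loss bbox=0.217, iou=2.142, cls=0.277 = 2.636
At epoch 60: loss bbox=0.204, iou=1.291, cls=0.300 = 1.795
At epoch 50: loss bbox=0.226, iou=2.185, cls=0.351 = 2.763
At epoch 40: loss bbox=0.209, iou=1.730, cls=0.322 = 2.260
At epoch 30: loss bbox=0.212, iou=1.157, cls=0.378 = 1.747
At epoch 20: loss bbox=0.252, iou=2.523, cls=0.477 = 3.252
At epoch 10: loss bbox=0.250, iou=1.701, cls=0.576 = 2.528
"""

# Hypothetical pattern matching only the line format shown above.
PATTERN = re.compile(
    r"At epoch (\d+): loss bbox=([\d.]+), iou=([\d.]+), cls=([\d.]+) = ([\d.]+)"
)


def parse_log(text):
    """Return one dict per log line, sorted by ascending epoch."""
    rows = []
    for m in PATTERN.finditer(text):
        bbox, iou, cls, total = (float(g) for g in m.groups()[1:])
        rows.append({"epoch": int(m.group(1)), "bbox": bbox,
                     "iou": iou, "cls": cls, "total": total})
    return sorted(rows, key=lambda r: r["epoch"])


rows = parse_log(LOG)
for name in ("bbox", "iou", "cls"):
    vals = [r[name] for r in rows]
    print(f"{name}: min={min(vals):.3f} max={max(vals):.3f}")
```

In these numbers the iou term makes up most of the total at every logged epoch, and it moves up and down between checkpoints rather than settling, while bbox and cls end lower than they started.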

So, I think something may be wrong in this repo.

pipick commented 4 years ago

I'm having trouble recording the test loss. If you still have the code that records the test loss, could you show me how you did it?

Thank you.