I'm attempting to train the yolo-v2 model and I'm running into mAP issues.
I think it might have to do with the computation of the iou loss. When I plot the losses on a test set as a function of the epoch, the bbox loss and cls loss go down, but the iou loss goes down at the beginning and then spikes up around epoch 10. Note that the plots are smoothed, so the actual behavior is a bit more erratic.
This only happens on the test set; the training iou loss goes down as normal.
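For reference, by the iou loss I mean the term built on a standard box IoU; I'm assuming it's computed roughly along these lines (a minimal sketch with corner-format boxes and hypothetical names, not the repo's exact code):

```python
import torch

def box_iou(pred, target, eps=1e-7):
    """IoU between matched pairs of boxes in (x1, y1, x2, y2) format."""
    # Intersection rectangle
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    # Union = area(pred) + area(target) - intersection
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter

    return inter / (union + eps)
```

If the iou loss is something like 1 - IoU (or an MSE against an objectness target) averaged over positive anchors, a spike on the test set would suggest the predicted boxes are drifting away from the ground truth even while bbox and cls keep improving.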
[Plot: test losses]
[Plot: train losses]
[Plot: test / train mAP]
I did modify your code to get it to work with my training harness, which supports multiple GPUs, so it's possible that I introduced a bug.
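The modification is roughly a standard PyTorch data-parallel wrap (a simplified sketch of my harness with placeholder names, not the repo's actual code); averaging the per-GPU losses is one place a bug could have crept in:

```python
import torch
import torch.nn as nn

# 'build_yolov2' and 'loader' are placeholders for the repo's model
# constructor and my data loader, not real names from the repo.
model = build_yolov2().cuda()
if torch.cuda.device_count() > 1:
    # Replicates the forward pass (and the loss computation, since the
    # model returns its losses) across the visible GPUs.
    model = nn.DataParallel(model)

for images, targets in loader:
    loss_bbox, loss_iou, loss_cls = model(images.cuda(), targets)
    # With DataParallel each loss term comes back with one entry per GPU,
    # so I reduce with .mean() before summing and backpropagating.
    loss = loss_bbox.mean() + loss_iou.mean() + loss_cls.mean()
    loss.backward()
```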
However, using a minimally modified version of your code I get these losses on the training set, which show a similar behavior.
At epoch 110: loss bbox=0.213, iou=1.771, cls=0.305 = 2.289
At epoch 100: loss bbox=0.196, iou=1.286, cls=0.261 = 1.743
At epoch 90: loss bbox=0.205, iou=1.479, cls=0.265 = 1.949
At epoch 80: loss bbox=0.208, iou=1.780, cls=0.295 = 2.282
At epoch 70: loss bbox=0.217, iou=2.142, cls=0.277 = 2.636
At epoch 60: loss bbox=0.204, iou=1.291, cls=0.300 = 1.795
At epoch 50: loss bbox=0.226, iou=2.185, cls=0.351 = 2.763
At epoch 40: loss bbox=0.209, iou=1.730, cls=0.322 = 2.260
At epoch 30: loss bbox=0.212, iou=1.157, cls=0.378 = 1.747
At epoch 20: loss bbox=0.252, iou=2.523, cls=0.477 = 3.252
At epoch 10: loss bbox=0.250, iou=1.701, cls=0.576 = 2.528
So, I think something may be wrong in this repo.