Closed lyx190 closed 5 years ago
@lyx190 ah this is very interesting, I had not heard about GIoU before. I read the paper, and it seems like you may need to do some regularization of the new GIoU loss to balance it's magnitude with the existing losses (conf and class).
Independently of this, I'd advise you to test the default repo first so you have a proper way to test the difference, otherwise your metrics mean nothing by themselves.
@glenn-jocher thank you for your reply. I have trained on my own dataset with the default repo before. The result looks good, with 88% recall and 78% precision.
What's more, I am trying to finetune the values of 'cls', 'conf', 'iou_t' on the Hyperparameters.
As for my problem with the high recall(90) and low precision(20), my idea is, that it perhaps caused by the maldistribution of True and False samples during training phase. How do you think of it?
Edited: According to my thought above, I have changed the 'iou_t' from 0.24(before) to 0.54 now, it is training. But the result seems not good enough, to epoch 32, recall: 30%, precision: 11.3%.
Besides, because of the range of the GIoU loss value is among 0 ~ 2, so I have also normalized it into 0 to 1 to fit the cls_loss and conf_loss value.
@lyx190 ah ok. Hmm, those are very high numbers indeed you got before, and now the precision has dropped a lot. hyp['iou_t']
in particular affects the precision-recall balance. If you increase this hyperparamter your precision will increase (at the expense of recall). The new GIoU loss needs a hyperparameter to balance it against conf.
class loss in single-class datasets is always zero (if you've configured your *.cfg for 1 class), so in your case hyp['cls']
should not have any effect. Maybe I can try to implement a GIoU loss as well, it seems simple enough if I understand the paper correctly.
@glenn-jocher ah I forgot to tell before that hyp['giou'] and hyp['conf'] I set was 2.4 and 4.3. But I am confused that would these two parameters affect the recall and precision so hard?
And also in the training phase, if I wanna map the prediction values to the size of feature map, is it correct that: for xy:
grid_xy = torch.cat((gi.view(-1, 1), gj.view(-1, 1)), 1).float()
pi_xy = torch.sigmoid(pi[..., :2]) + grid_xy
for wh:
pi_wh = torch.exp(pi[..., 2:4]) * (the size of corresponding anchors)
Oh, you should probably make hyp['giou']
much smaller, maybe in the 0.1-0.5 range, since the current box regression hyperparameters are 0.2 and 0.1 for xy and wh.
hyp = {'xy': 0.2, # xy loss gain
'wh': 0.1, # wh loss gain
'cls': 0.04, # cls loss gain
'conf': 4.5, # conf loss gain
'iou_t': 0.5, # iou target-anchor training threshold
'lr0': 0.001, # initial learning rate
'lrf': -4., # final learning rate = lr0 * (10 ** lrf)
'momentum': 0.90, # SGD momentum
'weight_decay': 0.0005} # optimizer weight decay
I think the IOU metric should not be affected by the units used, so the IOUs could be in grid units or pixels. Off the top of my head the equations seem right, but you should be careful because i and j may be swapped for x and y, i.e. its possible the xy gridpoints are ji and not ij.
@glenn-jocher thanks for your remind. I would check again the xy gridpoints. now I am gonna use the whole current hyperparameters to test again the giou_loss. If I have any result i will post here.
@lyx190 I started looking into a GIoU implementation. The code to export the bounding boxes is already present in models.py (for inference). This will create the boxes in units of pixels. If you want the output in grid units you simply skip the line 164 stride
multiplication.
https://github.com/ultralytics/yolov3/blob/9cf5ab0c9d41231148e8f6df23a4797ffa8e6d1a/models.py#L157-L167
Have you had any luck with your implementation?
@lyx190 BTW you should be extremely careful about building your own grid_xy
, it can be very complicated, particularly in the case of rectangular inference. Our validated code which does this in models.py
is:
https://github.com/ultralytics/yolov3/blob/9cf5ab0c9d41231148e8f6df23a4797ffa8e6d1a/models.py#L247-L250
Note that in grid_xy
the vertical dimension (y
) comes first, before the horizontal (x
) dimension. This is in maintaining with typical pytorch image dimension standards.
@lyx190 ok, this doesn't seem too hard. I think the fastest way to try is to output models.py raw values along with inference values. The target boxes are in xywh (normalized), while the inference outputs are in xywh (pixels). So we simply multiply the target boxes by the img_size, (target-anchor matches are already made so no new work needed there), and then simply pass the matched boxes to the existing bbox_iou()
function to get IOUs (it can handle xywh format). Then determine convex area and implement GIoU loss, maybe using about 0.3 as the start of a hyperparameter search for hyp['giou']
.
@lyx190 I implemented giou loss in a new giou branch: https://github.com/ultralytics/yolov3/tree/giou
I tuned the weighting hyperparameters manually a bit to hyp['giou'] = 0.02
. I compared two trainings on the coco_img64.data
file that comes with the repo. I was not able to exceed the default performance on this training set, but the loss seems to work well, and the branch seems to be a good starting point for experimentation. Note that in the branch, the xy
loss reported is the giou
loss.
@glenn-jocher ah it's nice. I am gonna check the new branch now. Yesterday I have print the values of
MSE(torch.sigmoid(pi[..., 0:2]), txy[i])
and GIoU(pi_xyxy, gt_xyxy)
.
The value of xy(MSE) is among 0.03--0.08, while GIoU() is among 0.3--0.9. So I set the hyp['giou_loss'] from 0.2 to 0.04 and trained the net, the result is so bad that recall and precision was only 30%.
@glenn-jocher BTW, I am gonna take a look the code you implemented. Your result looks much better than mine. I think maybe there is something wrong with my implementation. My modified with GIoU is below:
def compute_loss(p, targets, model, device): # predictions, targets, model
ft = torch.cuda.FloatTensor if p[0].is_cuda else torch.Tensor
lcls, lconf, lgiou = ft([0]), ft([0]), ft([0])
tcls, indices, xyOriginal, whOriginal, grids, anchorsFeatureMap = build_targets(model, targets)
# Define criteria
# MSE = nn.MSELoss()
CE = nn.CrossEntropyLoss() # (weight=model.class_weights)
BCE = nn.BCEWithLogitsLoss()
GIoU = GIoU_loss()
# Compute losses
h = model.hyp # hyperparameters
bs = p[0].shape[0] # batch size
k = bs # loss gain
for i, pi0 in enumerate(p): # layer i predictions, i
b, a, gj, gi = indices[i] # image, anchor, gridy, gridx
tconf = torch.zeros_like(pi0[..., 0]) # conf
# Compute losses
if len(b): # number of targets
pi = pi0[b, a, gj, gi] # predictions closest to anchors
tconf[b, a, gj, gi] = 1 # conf
# pi[..., 2:4] = torch.sigmoid(pi[..., 2:4]) # wh power loss (uncomment)
grid_xy = torch.cat((gi.view(-1, 1), gj.view(-1, 1)), 1).float().to(device)
pi_xy = (torch.sigmoid(pi[..., :2]) + grid_xy) / grids[i][0]
pi_wh = torch.exp(pi[..., 2:4]) * anchorsFeatureMap[i][a] / grids[i][0]
pi_xy_view = pi_xy.view(-1, 2)
pi_wh_view = pi_wh.view(-1, 2)
pi_xywh = torch.cat((pi_xy_view, pi_wh_view), 1)
pi_xyxy = xywh2xyxy(pi_xywh)
gt_xy = xyOriginal[i]
gt_wh = whOriginal[i]
gt_xywh = torch.cat((gt_xy, gt_wh), 1)
gt_xyxy = xywh2xyxy(gt_xywh)
# liou = (k * h['iou_loss']) * GIoU(pi_xyxy, gt_xyxy)[0]
lgiou = (k * h['giou_loss']) * (GIoU(pi_xyxy, gt_xyxy)[1] / 2)
# lxy += (k * h['xy']) * MSE(torch.sigmoid(pi[..., 0:2]), txy[i]) # xy loss
# lwh += (k * h['wh']) * MSE(pi[..., 2:4], twh[i]) # wh yolo loss
lcls += (k * h['cls']) * CE(pi[..., 5:], tcls[i]) # class_conf loss
# pos_weight = ft([gp[i] / min(gp) * 4.])
# BCE = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
lconf += (k * h['conf']) * BCE(pi0[..., 4], tconf) # obj_conf loss
loss = lgiou + lconf + lcls
return loss, torch.cat((lgiou, lconf, lcls, loss)).detach()
def build_targets(model, targets):
# targets = [image, class, x, y, w, h]
iou_thres = model.hyp['iou_t'] # hyperparameter
if type(model) in (nn.parallel.DataParallel, nn.parallel.DistributedDataParallel):
model = model.module
nt = len(targets) # number of bouding boxes
tcls, indices, original_xy, original_wh, grid_layers, anchor_vec = [], [], [], [], [], []
for i in model.yolo_layers:
layer = model.module_list[i][0]
# iou of targets-anchors
t, a = targets, []
gwh = targets[:, 4:6] * layer.ng
if nt:
iou = [wh_iou(x, gwh) for x in layer.anchor_vec] # anchor_vec: 映射到feature map的anchor boxes的大小
iou, a = torch.stack(iou, 0).max(0) # best iou and anchor
# reject below threshold ious (OPTIONAL, increases P, lowers R)
reject = True
if reject:
j = iou > iou_thres
t, a, gwh= targets[j], a[j], gwh[j]
# Indices
b, c = t[:, :2].long().t() # target image, class
gxy = t[:, 2:4] * layer.ng # grid x, y
gi, gj = gxy.long().t() # grid x, y indices # position of middle point of gound truth in feature map.
indices.append((b, a, gj, gi))
gxy_original = t[:, 2:4]
gwh_original = t[:, 4:6]
anchor_vec.append(layer.anchor_vec)
grid_layers.append(layer.ng)
original_xy.append(gxy_original)
original_wh.append(gwh_original)
# XY coordinates
# txy.append(gxy - gxy.floor()) # In the feature map, the offset of ground truth respect to the grid it belongs.
# Width and height
# twh.append(torch.log(gwh / layer.anchor_vec[a])) # wh yolo method
# twh.append((gwh / layer.anchor_vec[a]) ** (1 / 3) / 2)
# Class
tcls.append(c)
if c.shape[0]:
assert c.max() <= layer.nc, 'Target classes exceed model classes'
return tcls, indices, original_xy, original_wh, grid_layers, anchor_vec
@glenn-jocher Now I am using your code to train on my own dataset, but I have a question about the calculation of GIoU: https://github.com/ultralytics/yolov3/blob/243344af568e990fc78b1095e016585baaa6d484/utils/utils.py#L297-L298
These two lines here, the xy coordinates of ground truth are resized based on the size of feature maps, so it isn't necessary to map the torch.sigmoid([pi[..., :2]]) to the size of feature map? If so, I think we should also modify the values of xy coordinates (without adding to the self.grid_xy) during inference, right?
@lyx190 inference operates completely well, as you can see from mAP comparisons to published results in the README.
The predicted and target boxes will always have a common origin, adding the grid location is redundant since it will be removed in the IoU calculation. i.e. there's no point in adding the same constant to both xy values.
@glenn-jocher Yes you are right. But I am confused that why if I added the grid_xy to predicted xy coordinate, the result would go down.
Besides, I have trained on my own dataset with the new giou branch, below are the results between with giou and without giou:
Both results are trained on my own dataset. The blue line with default hyp[] setting. The orange line with hyp[] which you set on the 'giou' branch.
@lyx190 grid_xy is a constant offset. You can add it to both boxes you pass to giou, or add it to none, but you can not add it to only one. Your results look similar to mine above, the giou results are worse unfortunately. In my example the difference is minimal though, only a few percent lower.
Your second run looks a little more promising though (the area between 220 to 300 I assume is a new run).
@glenn-jocher I don't know what happened between epoch 220 to 300, all things went down automatically from 220. So I shut it down at 300 epoch. Do you think the way to improve the result is to adjust the weight of giou_loss?
I don't know. It's a very nice idea, its too bad we can't reproduce the paper results yet. I suppose the conclusions one might draw from these results are that:
@glenn-jocher thank you for your idea, I would keep working on it. If I got a better result would come back here again.
@its a very appealing idea to wrap all of the regression losses into one. Hopefully we can get it to work better. I'll leave the branch open, and if you make any discoveries let me know!
@glenn-jocher ok!
@glenn-jocher Hello, I am still struggling for the implementation now. I have found two things:
bbox_iou
should be set to False.@glenn-jocher Hello, I am still struggling for the implementation now. I have found two things:
- In the code in branch giou, when calculating the value of giou, I think the keyword x1y1x2y2 of function
bbox_iou
should be set to False.
Can you provide an exact line number of which True
should be set to False
?
- I found that the value of predicted weight and height would sometimes turned to very large or very small, like 2e+34 or -2e+34, which causes NaN. I don't know why, do you have any idea?
I haven't observed this when training COCO or any of the smaller COCO datasets like coco_64img.data
. What dataset are you using? Can you provide code for a minimum reproducible example?
@glenn-jocher
@lyx190 hey I think you are right!! That line should read like this instead, because both the targets and prediction boxes are in xywh
format, not xyxy
format. This is a huge correction, it should hopefully improve the results dramatically!! Have you tried it out with the correction?
giou = bbox_iou(pbox.t(), tbox[i], GIoU=True, x1y1x2y2=False)
Your #2 is a pretty well known issue with the YOLO layer, which is that wh
loss can diverge early on during training, becoming infinity or nan (search the issues for wh
divergence). This is the reason for the burnin
period in train.py when using the SGD optimizer. I had hoped that GIOU loss would fix this issue though. Are you saying that your wh
predictions are still divering to nan even with GIOU loss? Perhaps the error above is the cause. Can you test again with the correction? I'll submit a commit right now, then you just need to git pull
.
UPDATE1: I re-ran coco_64img.data to compare to my previous results. The updated results incorporating the x1y1x2y2=False
fix are below as results_giou_v2. There is only a slight improvement unfortunately :(
@glenn-jocher I have tried the new on my own dataset, it also, unfortunately, had a little improvement. I am trying now with the loss format: lgiou = -log((1 + giou) / 2). If I get better result, I would tell you again.
And about my second problem, the very high value of predicted wh
was only one bounding box in the array of prediction, that is, other predicted boxes looked very normal around 1.000, but only one was very large. And because we apply mean()
on the giou loss, so the whole loss turned into nan
.
@lyx190 we trained yolov3-spp fully with giou for 68 epochs and ended up at 0.464 mAP, about the same as regular xy and wh implementation. Any luck on your side?
@glenn-jocher hello, I have trained on my own dataset, got the same result of regular yolov3. On which dataset did you trained with giou? I am going to test it on the same dataset as yours.
@lyx190 I trained coco for one epoch. Also got same results as default. I've integrated giou into the main branch now, you can train with it using --giou
like this:
python3 train.py --data data/coco.data --img-size 320 --epochs 1 --giou
@glenn-jocher Okay, thank you. I would train it on coco 64 image dataset and then post the result here
@lyx190 GIoU is now integrated as the default regression loss for this repository, so simply running the default training command will use GIoU. Closing this issue as resolved.
@glenn-jocher So, your conclusion is that GIoU makes the model be trained well?
@developer0hye yes GIoU is a definite improvement over the individual xywh MSE losses. It stabilizes the wh loss, and since it merges 4 losses into 1 it is much easier to work with and tune hyperparameters with.
@glenn-jocher Thank you for the answer. When I applied that to my custom dataset and yolo implementation, Its performance is worse than the xywh loss. Okay, I will retry!
Ah, one more! have you tried optimize the model with DIoU or CIoU?
@developer0hye hmm, interesting. I suppose results will vary across datasets. The ultralytics results in https://github.com/ultralytics/yolov3#map were all trained from scratch using this repo with GIoU.
@glenn-jocher Thank you very much for your code. I am attempting to using GIoU loss with yolov3 on my own dataset with single class. I am confused about some relative problems. So I wanna ask you guys, if anyone achieved to replace lxy, lwy with GIoU loss to train the net?
I have mapped x, y, w, h of prediction and targets to the size of feature map and then calculated the GIoU loss value in the training phase. But after training, the recall can reach a high value like 90%, precision stay always very low at 25%. I don't know which part causes this problem.
Have anyone idea or managed to modified the GIoU loss on yolov3?