ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Rotated Bounding boxes training #345

Closed sawhney-medha closed 4 years ago

sawhney-medha commented 5 years ago

Hello. Thanks for this implementation. Can you please suggest how to train using x_center, y_center, width, height and theta (angle of rotation) in order to handle oriented bounding boxes during training?

Axis-aligned rectangles don't always help, because for rotated objects they cover extra area that isn't needed, so I want to implement training on oriented boxes, where the input is x, y, w, h and theta instead of just x, y, w, h. Thanks for your help in advance.

glenn-jocher commented 5 years ago

@sawhney-medha there has been much work done on this in the field of satellite imagery object detection (which also relates to the lack of rotation invariance that CNNs generally suffer from). You can always add an angle parameter (if your training data has it), though there will be a significant nonlinearity in this 5D space where the angle wraps, say from -180 to +180 deg, or 359 to 0 deg. You would simply resize the YOLO layers to the new shape, and run a sigmoid activation on the angle output so that it ranges from 0 to 1 (i.e. 0 to 360 deg), or a tanh activation so it ranges from -1 to 1 (i.e. -180 to 180).
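
For illustration only (a minimal sketch, not code from this repo), such a decoding could look like the following, assuming `raw` is the extra angle channel already sliced from the prediction tensor (a hypothetical name):

    import math
    import torch

    def decode_angle(raw, mode="tanh"):
        # Map a raw network output to an angle in radians.
        # "sigmoid": squashes to (0, 1), scaled to 0..2*pi  (0 to 360 deg)
        # "tanh":    squashes to (-1, 1), scaled to -pi..pi (-180 to 180 deg)
        # Either way the prediction still wraps at the interval boundary.
        if mode == "sigmoid":
            return torch.sigmoid(raw) * 2 * math.pi
        return torch.tanh(raw) * math.pi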

You might want to think about other ways to express your information though, such as regressing the eigenvectors and eigenvalues of the rotated bounding boxes rather than the box and the angle. Remember you are going to need some way to calculate IoU, or a comparable metric, on your results. If you make any interesting progress let me know!

HamsterHuey commented 5 years ago

I'd recommend using a cosine(theta) and sine(theta) representation for the angle, since that removes the wrap-around issue and makes it a bit easier for the network to regress. I have in the past modified the YOLO loss function to regress out the orientation of a vehicle using this approach (a bit different from what you are trying to do) and it worked fairly well. I didn't notice much difference between using a sigmoid scaled to -1 to +1 for the cosine/sine regression vs. trying to directly regress the values.
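
A rough sketch of that (cos θ, sin θ) encoding, purely illustrative and not taken from any particular repo: the head regresses two values per box, and the angle is recovered with atan2, which avoids the wrap-around discontinuity:

    import torch

    def angle_to_trig(theta):
        # Encode a ground-truth angle (radians) as two targets in [-1, 1]
        return torch.stack((torch.cos(theta), torch.sin(theta)), dim=-1)

    def trig_to_angle(pred):
        # pred[..., 0] ~ cos, pred[..., 1] ~ sin (optionally squashed with tanh first)
        cs = torch.tanh(pred)
        return torch.atan2(cs[..., 1], cs[..., 0])  # angle in (-pi, pi], no wrap discontinuity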

sawhney-medha commented 5 years ago

@glenn-jocher Thank you, I will try implementing something for my data. Sure, if I get to something, will surely let you know :)

sawhney-medha commented 5 years ago

@HamsterHuey Thanks, I will keep that in mind. :)

glenn-jocher commented 5 years ago

Great, good luck!

ming71 commented 4 years ago

> I'd recommend using a cosine(theta) and sine(theta) representation for the angle since that removes the wrap-around issue and makes it a bit easier for the network to regress. […]

Hi, emm, would you mind sharing your code for rotated bbox regression? I've made it work based on this repo, but not well enough (about mAP 0.23 for 416×416 inputs); I don't know exactly where the problem lies... Sorry to disturb you.

glenn-jocher commented 4 years ago

@HamsterHuey yes, can you share your inference output function? As far as I know, the only way to regress without wrapping issues would be to use a 2D unit vector, but this requires two variables rather than one.

I think any angle-only representation would always inherently wrap somewhere, typically either at 0/360 or +180/-180.

glenn-jocher commented 4 years ago

Perhaps scaling the wrapping to overlap by 10-20% might mitigate some of the wrapping issues, i.e. if your angle output was scaled from -200 to 200deg rather than -180 to 180, then there would be redundant room at the edges of the model angle output to wrap slightly without issue (though of course the training data would always be constrained from -180 to 180).
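
A minimal sketch of that idea, assuming a tanh-activated angle output and labels constrained to -180..180 deg (illustrative only, with made-up names):

    import math
    import torch

    MARGIN = 200.0 / 180.0  # stretch tanh's (-1, 1) to (-200, 200) deg instead of (-180, 180)

    def decode_angle_with_margin(raw):
        deg = torch.tanh(raw) * 180.0 * MARGIN  # predictions may spill slightly past +/-180 deg
        deg = (deg + 180.0) % 360.0 - 180.0     # wrap back into the label range
        return deg * math.pi / 180.0            # radians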

ming71 commented 4 years ago

> Perhaps scaling the wrapping to overlap by 10-20% might mitigate some of the wrapping issues […]

These are my changes in build_target():

        gxy -= gxy.floor()  # xy offset within the grid cell
        gwha[:, :2] = torch.log(gwha[:, :2] / anchor_vec[a][:, :2])  # wh target as log-ratio to anchor wh
        gwha[:, 2] = torch.tan(gwha[:, 2] - anchor_vec[a][:, 2])     # angle target as tan(offset from anchor angle)
        tbox.append(torch.cat((gxy, gwha), 1))  # regression targets: x, y, w, h, angle
        av.append(anchor_vec[a])  # anchor vec

and compute_loss:

        pxy = torch.sigmoid(ps[:, 0:2])  # pxy = pxy * s - (s - 1) / 2,  s = 1.5  (scale_xy)
        pbox = torch.cat((pxy, ps[:, 2:5]), 1)  # predicted x, y (sigmoid) with raw w, h, angle outputs

        lreg = lreg + SM(pbox, tbox[i])  # regression loss over the 5-parameter box

and the inference output in model.py:

        io[..., 0:2] = torch.sigmoid(io[..., 0:2]) + self.grid_xy           # xy
        io[..., 2:4] = torch.exp(io[..., 2:4]) * self.anchor_wh[..., :-1]   # wh
        io[..., 4]   = torch.atan(io[..., 4]) + self.anchor_wh[..., -1]     # angle

However, mAP suffers from low recall; even when overfitting on a single sample it's still hard to reach recall = 1 (BTW, I filter out anchors whose IoU with the ground truth is less than 0.2 or whose angle difference is > 0.5π).

Lowering iou_thres from 0.2 to 0.1 leads to a little boost, but it's still unsatisfactory.

I also tried raising it further to 0.5.

glenn-jocher commented 4 years ago

@sawhney-medha hmm, very interesting. How do you compute IoU on rotated bounding boxes?

An mAP of 0.875 like you have there is very good, BTW. COCO mAP is around 0.6 for comparison.

Also, remember the hyperparameters and training settings are set for COCO, you'll likely want to adjust them to your specific task.

ming71 commented 4 years ago

Skew-IoU computation is implemented with the Python package shapely. Though 0.875 would be pretty good for COCO, this is not COCO, it's just one single picture; a well-working algorithm ought to overfit it and reach mAP 1.0. When I train on the whole dataset the results are terrible: mAP is supposed to be above 0.8 on this dataset (as reported by another one-stage detector).
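
For reference, a minimal skew-IoU sketch using shapely, with boxes given as hypothetical (cx, cy, w, h, theta-in-radians) tuples (illustrative, not the exact code used here):

    import math
    from shapely.geometry import Polygon

    def rbox_to_polygon(cx, cy, w, h, theta):
        # Rotate the four corners of the axis-aligned box by theta around the box center
        c, s = math.cos(theta), math.sin(theta)
        corners = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
        return Polygon([(cx + x * c - y * s, cy + x * s + y * c) for x, y in corners])

    def skew_iou(box1, box2):
        p1, p2 = rbox_to_polygon(*box1), rbox_to_polygon(*box2)
        inter = p1.intersection(p2).area
        union = p1.area + p2.area - inter
        return inter / union if union > 0 else 0.0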

glenn-jocher commented 4 years ago

@ming71 remember training uses augmentation by default, so your 'one single picture' never appears more than once to the network during training, it's always different.

Your loss function doesn't look very healthy, you should probably analyze its components to see the cause.

ming71 commented 4 years ago

> @ming71 remember training uses augmentation by default, so your 'one single picture' never appears more than once to the network during training, it's always different.
>
> Your loss function doesn't look very healthy, you should probably analyze its components to see the cause.

I had already turned augmentation off; the ship-detection results shown above were obtained without augmentation. I think low recall is what hurts the performance, but I have no idea how to solve it.

glenn-jocher commented 4 years ago

@ming71 I've updated the single-image tutorial with the latest results: https://github.com/ultralytics/yolov3/wiki/Example:-Train-Single-Image

The single image gets to 0.67 mAP when training from scratch using all default settings, including augmentation (you may need to git pull to get the latest). These same default settings train COCO to about 0.60 mAP also. Anytime you create custom functionality you need to analyze the loss components, the parameter space, the loss functions, tune hyperparameters etc. in order to get the best results. The current repo is simply a starting point you can use.

We offer AI consulting in this area also, if you are interested we can send you a quote to complete your requirements. You can see some of our work on satellite imagery for example here: https://github.com/ultralytics/xview-yolov3

ming71 commented 4 years ago

> @ming71 I've updated the single-image tutorial with the latest results: https://github.com/ultralytics/yolov3/wiki/Example:-Train-Single-Image […]

Thanks for your suggestions; your implementation of YOLOv3 really helps. I'll continue with the work at hand.

Rhutus commented 4 years ago

> These are my changes in build_target():

@ming71 Can you please point me to where build_target() and compute_loss() are? And do we need an extra parameter in the training .txt label files, i.e. the orientation angle α of the bounding box?

ming71 commented 4 years ago

> Can you please point me to where build_target() and compute_loss() are? […]

I don't know where it is in this repo; I've been using my own modified version. An extra parameter is definitely required for rotated bbox regression with this repo; you need to add it yourself.

Rhutus commented 4 years ago

> I don't know where it is in this repo; I've been using my own modified version. […]

Can you please show me an example image with its annotation file? Thanks in advance.

SteelMinh commented 4 years ago

Hello, thanks for the implementation!

I trained my own dataset of Legos on a Jetson Nano and the detection works!

After only 10 epochs.

Now I want rotated bounding boxes, but I have no idea how to realize this. Thanks for your help!

glenn-jocher commented 4 years ago

@SteelMinh great, your results look good! I highly suggest you train longer than 10 epochs though. For smaller datasets you probably want at least 300 epochs, or up to 1000 if you don't see any overtraining signs at 300.

Rotated bounding boxes are not difficult to add to the loss function; the main problem is that you need data labelled with the rotation of each Lego.

SteelMinh commented 4 years ago

@glenn-jocher thank you for the quick response. In which format should the data be, XML or txt? This was only meant as a test; with the change for the rotated bbox I will train for longer than 10 epochs.

What do I have to add to the loss function after I've labelled the images?

glenn-jocher commented 4 years ago

@sawhney-medha the format is not important, though if you want to keep things consistent you can just keep using the same label format (txt file) with an added column for the rotation label. I don't have time to walk you through the process; I suggest you contact some of the other posters on this issue above.
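
For example, a hypothetical label line could look like this, where the first five columns are the usual YOLO txt format (class, then normalized x_center, y_center, width, height) and the last column is the added rotation; its unit and normalization are up to you, e.g. radians here:

    0 0.512 0.438 0.210 0.095 0.37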

mapplics commented 4 years ago

@SteelMinh how did you run this model on your Jetson? Did you use Docker? It was impossible for me to run the container. Thanks in advance!

MiXaiLL76 commented 4 years ago

Is there any progress on the rotation work?

FranciscoReveriano commented 4 years ago

I believe not. But you are welcome to contribute.

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 30 days with no activity. Remove Stale label or comment or this will be closed in 5 days.

ming71 commented 4 years ago

Hi, I released my ryolov3 code just now. To be honest, it's not good enough (in my opinion), but it works well in some fields; you can take it as a reference. @FranciscoReveriano @Rhutus

MiXaiLL76 commented 4 years ago

Great job! I'll try rewriting it in TF2.

ZeKunZhang1998 commented 4 years ago

> These are my changes in build_target(): […]

Hello, where can I see the format of the dataset's .txt label files?

umanniyaz commented 2 years ago

@glenn-jocher @ming71 Can anyone help here for YOLOv7? My model is well trained, with R & P over 90% for horizontal boxes, but I also want to handle edge cases like rotated bounding boxes in my model. Can you tell me how to include the theta angle in the annotations and what changes are required?

glenn-jocher commented 2 years ago

@umanniyaz the best rotated-box implementation is probably here: https://github.com/hukaixuan19970627/yolov5_obb

umanniyaz commented 2 years ago

@glenn-jocher Can you tell me the logic behind it, how to train YOLOv7, and what the annotation format should be apart from [x, y, w, h]? (I know the angle theta needs to be included.) Please tell me the format so that I can train a single model for both axis-aligned and rotated boxes.

glenn-jocher commented 2 years ago

@umanniyaz ask at https://github.com/hukaixuan19970627/yolov5_obb