AlexeyAB opened this issue 4 years ago (status: Open)
Hi @AlexeyAB ,
Thanks for your comments. I'm trying to improve its performance before writing up a comparison.
Actually, the IoU calculation for polygons is very expensive and differs from the IoU calculation for boxes in 2D images (like the COCO dataset), because we need to consider both the sizes and the rotations of the boxes. Hence, I haven't taken advantage of CIoU or GIoU loss for optimization. I'm trying to speed up the IoU calculation in this task.
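For readers following along, the rotated-box IoU can be computed by clipping one box's polygon against the other and taking the shoelace area. This is my own illustrative sketch in pure Python (not the repo's implementation); boxes are given as (cx, cy, w, l, yaw):

```python
import math

def box_corners(cx, cy, w, l, yaw):
    """Counter-clockwise corners of a rotated box (centre, width, length, heading)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c)
            for dx, dy in ((w / 2, l / 2), (-w / 2, l / 2),
                           (-w / 2, -l / 2), (w / 2, -l / 2))]

def polygon_area(pts):
    """Shoelace formula."""
    return 0.5 * abs(sum(x1 * y2 - x2 * y1
                         for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1])))

def clip(subject, a, b):
    """Sutherland-Hodgman step: keep the part of `subject` left of edge a->b."""
    def side(p):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    out = []
    for p, q in zip(subject, subject[1:] + subject[:1]):
        sp, sq = side(p), side(q)
        if sp >= 0:
            out.append(p)
        if sp * sq < 0:  # the edge p->q crosses the clip line
            t = sp / (sp - sq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out

def rotated_iou(box_a, box_b):
    """IoU of two rotated boxes given as (cx, cy, w, l, yaw)."""
    pa, pb = box_corners(*box_a), box_corners(*box_b)
    inter = pa
    for a, b in zip(pb, pb[1:] + pb[:1]):
        if not inter:
            break
        inter = clip(inter, a, b)
    inter_area = polygon_area(inter) if len(inter) >= 3 else 0.0
    union = polygon_area(pa) + polygon_area(pb) - inter_area
    return inter_area / union if union > 0 else 0.0
```

Running this per box pair in pure Python is exactly why the step is expensive; production code usually moves the clipping to C/CUDA.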
P.S. It's great to be talking with the author of YOLOv4 :) Thanks for your great publication.
@maudzung
Hence, I haven't taken advantage of CIoU or GIoU loss for optimization.
Did you try CIoU/GIoU for training with 3D-bboxes and it didn't increase accuracy?
I'm trying to speed up the IoU calculation in this task.
Are you trying to accelerate the IoU calculation, or trying to improve accuracy?
P.S. It's great to be talking with the author of YOLOv4 :) Thanks for your great publication.
Thanks!
Thank you @AlexeyAB
I haven't used CIoU or GIoU loss yet; I'm trying to apply them to the loss function. I'm also trying to speed up the non-max-suppression step in the inference phase. So far, I haven't been able to vectorize the IoU calculation in that step, so if there are many high-confidence boxes, the post-processing is slow.
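For context, the hard-to-vectorize part is the pairwise IoU inside greedy NMS. A generic sketch of that loop (my own illustration, not the repo's code; `interval_iou` is a toy 1-D stand-in for the rotated-box IoU):

```python
def greedy_nms(boxes, scores, iou_fn, thresh=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop every
    remaining box overlapping it by more than `thresh`. The pairwise
    iou_fn calls are the bottleneck when iou_fn is a rotated-box IoU."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou_fn(boxes[i], boxes[j]) < thresh]
    return keep

def interval_iou(a, b):
    """Toy 1-D IoU used only to demo the loop; a rotated polygon IoU plugs in here."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0
```

With many high-confidence detections the inner list comprehension runs O(n^2) IoU calls, which matches the slowdown described above.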
I tried to detect rotated faces and came across the same problem of rotated-bounding-box intersection-over-union calculation. I think it will be very difficult to get a derivative of this very complex function. Instead, I tried to distill the idea from GIoU and predicted the size and angle instead of the width and height, and got better results than traditional bounding-box prediction. Maybe this could be worth a try for you too: https://www.researchgate.net/publication/335538424_Detecting_Arbitrarily_Rotated_Faces_for_Face_Analysis
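One possible realization of this size-plus-angle idea (my own sketch; the paper's exact parameterization may differ) is to predict a scale, a log aspect ratio, and the angle as a (sin, cos) pair:

```python
import math

def encode_box(x, y, w, h, yaw):
    """Re-parameterize (w, h, yaw) as (size, log-aspect, sin, cos)."""
    return (x, y, math.sqrt(w * h), math.log(w / h),
            math.sin(yaw), math.cos(yaw))

def decode_box(x, y, size, log_aspect, s, c):
    aspect = math.exp(log_aspect)   # w / h
    w = size * math.sqrt(aspect)
    h = size / math.sqrt(aspect)
    yaw = math.atan2(s, c)          # recover the angle from the unit vector
    return (x, y, w, h, yaw)
```

The (sin, cos) pair avoids the wrap-around discontinuity of regressing the angle directly.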
Thank you so much @fsaxen
Hi @AlexeyAB
I have added the implementation of GIoU loss for rotated boxes. I'm running experiments on it to test its performance.
Can you please share with me the weights of different components of the total_loss in your implementation?
At this time, I have set lgiou_scale = lobj_scale = lcls_scale = 1.
total_loss = loss_giou * lgiou_scale + loss_obj * lobj_scale + loss_cls * lcls_scale
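For reference, the GIoU term being weighted here generalizes the axis-aligned formula GIoU = IoU - (|C| - |A ∪ B|) / |C|, where C is the smallest enclosing box. An axis-aligned sketch of that formula (illustrative only; the rotated-box version needs polygon clipping and a convex enclosing region):

```python
def giou_axis_aligned(a, b):
    """GIoU for boxes given as (x1, y1, x2, y2) with x2 > x1 and y2 > y1."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # smallest axis-aligned box enclosing both
    c = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (c - union) / c   # loss_giou = 1 - GIoU
```

Unlike plain IoU, the enclosing-box penalty keeps the gradient informative even when the boxes do not overlap.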
Thank you so much!
@maudzung Hi,
I use:
lgiou_scale = 0.07
lobj_scale = 1.0
lcls_scale = 1.0
Also you can try
lgiou_scale = 0.05
lobj_scale = 1.0
lcls_scale = 0.6
Thank you @AlexeyAB for your quick response.
I have one more question. Did you apply the weights noobj_scale and obj_scale for the loss_obj as in YOLOv3? Your answer could save me a ton of time that I would otherwise spend not only reading your code but also running experiments.
I'm looking forward to hearing from you.
Thank you once again!
What do you mean? I use:
if (truth) {  // for object
    delta_bbox[i] = giou_delta[i] * lgiou_scale;
    delta_objectness = (1 - output[obj_index]) * lobj_scale;
    for (int k = 0; k < classes; ++k) {
        if (k == truth.class_id) delta_class_probability[k] = (1 - output[cls_index + k]) * lcls_scale;
        else delta_class_probability[k] = (0 - output[cls_index + k]) * lcls_scale;
    }
}
else {  // for no object
    delta_objectness = (0 - output[obj_index]) * lobj_scale;
}
@maudzung Hi, did you get any results, or are you training it on KITTI?
I ran the experiments on 6k samples with MSE loss and evaluated on 1.4k samples. The mAPs for Complex-YOLOv3 and Complex-YOLOv4 are 0.90 and 0.89, respectively. I visualized the predictions on each sample and compared the two models, and observed that the v4 model works better than the v3 model at detecting small objects.
Complex-YOLO detects 5 degrees of freedom (x, y, width, length, and yaw) of objects. Recently, I have extended the work to a 7-DOF model. My implementation is here: YOLO3D-YOLOv4.
I plan to train the network on Waymo Open Dataset. This can help me avoid the overfitting problem.
I ran the experiments on 6k samples with MSE loss and evaluated on 1.4k samples. The mAPs for Complex-YOLOv3 and Complex-YOLOv4 are 0.90 and 0.89, respectively. I visualized the predictions on each sample and compared the two models, and observed that the v4 model works better than the v3 model at detecting small objects.
Why is the mAP for Complex-YOLOv3 higher than for Complex-YOLOv4? Do you use mAP@0.5 or mAP@0.5...0.95? What pre-trained weights do you use for training?
Complex-YOLO detects 5 degrees of freedom (x, y, width, length, and yaw) of objects. Recently, I have extended the work to a 7-DOF model. My implementation is here: YOLO3D-YOLOv4.
I plan to train the network on Waymo Open Dataset. This can help me avoid the overfitting problem.
Great! Is YOLO3D better than Complex-YOLOv3 in terms of accuracy, or only 7-DOF vs 5-DOF?
Also what do you think about CenterNet3D: An Anchor free Object Detector for Autonomous Driving https://arxiv.org/abs/2007.07214 ?
Hi @AlexeyAB
Do you use mAP@0.5 or mAP@0.5...0.95?
I evaluated with mAP@0.5. I'll also use mAP@0.5...0.95 to evaluate the models.
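For clarity, mAP@0.5...0.95 is the COCO-style average of AP over ten IoU thresholds. A minimal sketch (`ap_at` is a hypothetical stand-in for the usual per-threshold AP computation):

```python
def map_coco(ap_at):
    """Average AP over IoU thresholds 0.50, 0.55, ..., 0.95 (COCO convention).
    ap_at(t) is assumed to return the AP evaluated at IoU threshold t."""
    thresholds = [0.5 + 0.05 * i for i in range(10)]
    return sum(ap_at(t) for t in thresholds) / len(thresholds)
```

Because it averages over stricter thresholds, this metric penalizes loose localization far more than mAP@0.5 does.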
Why is the mAP for Complex-YOLOv3 higher than for Complex-YOLOv4? What pre-trained weights do you use for training?
I didn't use transfer learning for either model; I trained both from scratch. That's why I plan to train the networks on a bigger dataset.
Also what do you think about CenterNet3D: An Anchor free Object Detector for Autonomous Driving https://arxiv.org/abs/2007.07214 ?
Thank you so much for your suggestion. I'll read the paper.
Hi @AlexeyAB
I read the paper that you suggested and tried to implement it, but it could not run in real time, and the method was proposed only for car detection. Hence, I'm now waiting for the official code from the author.
Based on the CenterNet ideas, I have developed a new repo here. Amazingly, the model works well on pedestrian and cyclist detection, as well as on cars.
Thank you once again for your great paper, your answers, and your suggestion. I have learned a lot from your YOLOv4 paper 💯
@maudzung Hi,
I read the paper that you suggested and tried to implement it, but it could not run in real time, and the method was proposed only for car detection. Hence, I'm now waiting for the official code from the author.
Based on the CenterNet ideas, I have developed a new repo here. Amazingly, the model works well on pedestrian and cyclist detection, as well as on cars.
Great!
Do you mean that Voxelization -> 3d convolution ndchw -> Conv2d is very slow (only ~25 FPS), so you replaced it with a small resnet18 + FPN, and it works very fast (~95 FPS, ~4x faster), and at first glance the accuracy did not drop much?
Did you try to use Joint Detection and Tracking / Embeddings? https://github.com/ifzhang/FairMOT and https://paperswithcode.com/sota/multi-object-tracking-on-mot16 If you replace CenterNet in FairMOT with YOLOv4, it will be Top1.
Do you mean that Voxelization -> 3d convolution ndchw -> Conv2d is very slow (only ~25 FPS), so you replaced it with small resnet18 + FPN and it works very fast ~95 FPS (~4x faster), and at first glance, the accuracy did not drop much?
Yes. Although I used the spconv library to implement the voxelization step and build the model, the speed was very slow, around 7 FPS for the forward pass only.
Did you try to use Joint Detection and Tracking / Embeddings? https://github.com/ifzhang/FairMOT and https://paperswithcode.com/sota/multi-object-tracking-on-mot16 If you replace CenterNet in FairMOT with YOLOv4, it will be Top1.
I tested the FairMOT implementation; it's also great, but I didn't try jointly detecting and tracking objects. Thanks for the suggestions, I'll investigate it.
@maudzung Hi, nice work! Did you compare the speed and accuracy of Complex-YOLOv4-Pytorch vs other algorithms on the KITTI dataset? Is it still better in accuracy and speed than the other competitors?
Also, here are some references with implementations of CIoU.
Examples:
C: https://github.com/AlexeyAB/darknet/blob/a71b9a6e9a009cf94900c53deb344c5204835700/src/box.c#L233-L256
Matlab: https://github.com/Zzh-tju/DIoU/blob/master/simulation%20experiment/dCIOU.m
Python: https://github.com/VCasecnikovs/Yet-Another-YOLOv4-Pytorch/blob/2e18612e1852abbf35b4dac55a00f2a3b2d814ed/model.py#L527-L561
Python: https://github.com/ultralytics/yolov3/blob/eca5b9c1d36e4f73bf2f94e141d864f1c2739e23/utils/utils.py#L262-L282
Description: https://medium.com/@jonathan_hui/yolov4-c9901eaa8e61