maudzung / Complex-YOLOv4-Pytorch

The PyTorch Implementation based on YOLOv4 of the paper: "Complex-YOLO: Real-time 3D Object Detection on Point Clouds"
https://arxiv.org/pdf/1803.06199.pdf
GNU General Public License v3.0

Did you compare speed and accuracy of Complex-YOLOv4 vs other algorithms on Kitti dataset? #1

Open AlexeyAB opened 4 years ago

AlexeyAB commented 4 years ago

@maudzung Hi, Nice work! Did you compare speed and accuracy of Complex-YOLOv4-Pytorch vs other algorithms on Kitti dataset? Is it still better in accuracy and speed than other competitors?


Also, here are some references with implementations of CIoU.

Examples:

Description: https://medium.com/@jonathan_hui/yolov4-c9901eaa8e61


maudzung commented 4 years ago

Hi @AlexeyAB ,

Thanks for your comments. I'm trying to improve its performance before writing up a comparison.

Actually, the IoU calculation for polygons is very expensive and different from the IoU calculation for boxes in 2D images (like the COCO dataset) because we need to consider both sizes and rotations of boxes. Hence, I haven't taken advantage of CIoU or GIoU loss for optimization. I'm trying to speed up the IoU calculation in this task.
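
For reference, here is a minimal sketch of the polygon-based IoU for rotated BEV boxes using shapely; the (cx, cy, w, l, yaw) box layout and the helper names are illustrative assumptions, not the repo's actual code:

import numpy as np
from shapely.geometry import Polygon

def rotated_box_to_polygon(cx, cy, w, l, yaw):
    # Corner offsets in the box's own frame, then rotate by yaw and translate.
    corners = np.array([[ w / 2,  l / 2],
                        [ w / 2, -l / 2],
                        [-w / 2, -l / 2],
                        [-w / 2,  l / 2]])
    rot = np.array([[np.cos(yaw), -np.sin(yaw)],
                    [np.sin(yaw),  np.cos(yaw)]])
    return Polygon(corners @ rot.T + np.array([cx, cy]))

def rotated_iou(box_a, box_b):
    # Exact IoU of two rotated boxes via polygon intersection (accurate but slow).
    poly_a, poly_b = rotated_box_to_polygon(*box_a), rotated_box_to_polygon(*box_b)
    inter = poly_a.intersection(poly_b).area
    union = poly_a.area + poly_b.area - inter
    return inter / union if union > 0 else 0.0

print(rotated_iou((0.0, 0.0, 2.0, 4.0, 0.0), (0.5, 0.5, 2.0, 4.0, np.pi / 6)))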

P.S.: It's great to talk with the author of YOLOv4 :) Thanks for your great publication.

AlexeyAB commented 4 years ago

@maudzung

Hence, I haven't taken advantage of CIoU or GIoU loss for optimization.

Did you try CIoU/GIoU for training with 3D bboxes and find that it didn't increase accuracy?

I'm trying to speed up the IoU calculation in this task.

Are you trying to accelerate the IoU calculation, or to improve accuracy?

P.S.: It's great to talk with the author of YOLOv4 :) Thanks for your great publication.

Thanks!

maudzung commented 4 years ago

Thank you @AlexeyAB

I haven't used CIoU or GIoU loss yet; I'm trying to apply them to the loss function. I'm also trying to speed up the non-max-suppression step in the inference phase. At the moment, I can't vectorize the IoU calculation in that step, so if there are many boxes with high confidence, the post-processing is slow.
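
A minimal sketch of the greedy rotated-box NMS described above; the pairwise polygon IoU inside the inner loop is the part that is hard to vectorize, which is why post-processing slows down when many boxes pass the confidence threshold. Box layout and names are assumptions:

import numpy as np
from shapely.geometry import Polygon

def box_polygon(box):
    # box = (cx, cy, w, l, yaw) -> shapely Polygon of the rotated BEV box.
    cx, cy, w, l, yaw = box
    c, s = np.cos(yaw), np.sin(yaw)
    offsets = np.array([[w / 2, l / 2], [w / 2, -l / 2], [-w / 2, -l / 2], [-w / 2, l / 2]])
    return Polygon(offsets @ np.array([[c, -s], [s, c]]).T + np.array([cx, cy]))

def rotated_nms(boxes, scores, iou_threshold=0.5):
    # Greedy NMS: keep the highest-scoring box, suppress overlapping ones, repeat.
    order = np.argsort(scores)[::-1]
    polys = [box_polygon(boxes[i]) for i in order]
    suppressed = np.zeros(len(order), dtype=bool)
    keep = []
    for i in range(len(order)):
        if suppressed[i]:
            continue
        keep.append(int(order[i]))
        for j in range(i + 1, len(order)):
            if suppressed[j]:
                continue
            inter = polys[i].intersection(polys[j]).area
            union = polys[i].area + polys[j].area - inter
            if union > 0 and inter / union > iou_threshold:
                suppressed[j] = True
    return keep

boxes = [(0.0, 0.0, 2.0, 4.0, 0.0), (0.2, 0.1, 2.0, 4.0, 0.05), (10.0, 10.0, 2.0, 4.0, 1.0)]
print(rotated_nms(boxes, np.array([0.9, 0.8, 0.7])))  # -> [0, 2]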

fsaxen commented 4 years ago

I tried to detect rotated faces and came across the same problem of computing the intersection over union of rotated bounding boxes. I think it will be very difficult to get a derivative of this very complex function. Instead, I distilled the idea from GIoU and predicted the size and angle instead of the width and height, and got better results than traditional bounding box prediction. Maybe this could be worth a try for you too: https://www.researchgate.net/publication/335538424_Detecting_Arbitrarily_Rotated_Faces_for_Face_Analysis
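
For context, this is close to how the Complex-YOLO paper itself handles the yaw angle: it regresses the real and imaginary parts of the angle and recovers it with arctan2, which avoids the wrap-around discontinuity of regressing the angle directly. A tiny illustration (tensor names are made up):

import torch

def decode_yaw(t_im, t_re):
    # Recover the yaw angle from its two regressed components.
    return torch.atan2(t_im, t_re)

yaw_gt = torch.tensor(2.5)                                   # ground-truth yaw in radians
target_im, target_re = torch.sin(yaw_gt), torch.cos(yaw_gt)  # training targets for the two components
print(decode_yaw(target_im, target_re))                      # ~2.5, no discontinuity at +/- pi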

maudzung commented 4 years ago

Thank you so much @fsaxen

maudzung commented 4 years ago

Hi @AlexeyAB

I have added an implementation of GIoU loss for rotated boxes and I'm running experiments to test its performance. Could you please share the weights of the different components of the total_loss in your implementation? For now, I have set lgiou_scale = lobj_scale = lcls_scale = 1.

total_loss = loss_giou * lgiou_scale + loss_obj * lobj_scale + loss_cls * lcls_scale

Thank you so much!
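
For illustration, a sketch of one way GIoU can be generalized to rotated boxes, using the convex hull of the two polygons as the enclosing region; whether the repo uses the convex hull or an enclosing box is an assumption here:

import numpy as np
from shapely.geometry import Polygon

def box_polygon(cx, cy, w, l, yaw):
    c, s = np.cos(yaw), np.sin(yaw)
    offsets = np.array([[w / 2, l / 2], [w / 2, -l / 2], [-w / 2, -l / 2], [-w / 2, l / 2]])
    return Polygon(offsets @ np.array([[c, -s], [s, c]]).T + np.array([cx, cy]))

def rotated_giou(box_a, box_b):
    # GIoU = IoU - (area(enclosing) - area(union)) / area(enclosing)
    pa, pb = box_polygon(*box_a), box_polygon(*box_b)
    inter = pa.intersection(pb).area
    union = pa.area + pb.area - inter
    iou = inter / union if union > 0 else 0.0
    enclosing = pa.union(pb).convex_hull.area   # smallest enclosing convex region
    return iou - (enclosing - union) / enclosing

# The GIoU loss for a matched prediction/target pair is then 1 - rotated_giou(pred, target).
print(rotated_giou((0.0, 0.0, 2.0, 4.0, 0.0), (0.5, 0.5, 2.0, 4.0, np.pi / 6)))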

AlexeyAB commented 4 years ago

@maudzung Hi,

I use:

lgiou_scale = 0.07
lobj_scale = 1.0
lcls_scale = 1.0

Also you can try

lgiou_scale = 0.05
lobj_scale = 1.0
lcls_scale = 0.6

maudzung commented 4 years ago

Thank you @AlexeyAB for your quick response. I have one more question: did you apply the noobj_scale and obj_scale weights to loss_obj, as in YOLOv3? Your answer will save me a lot of time that I would otherwise spend reading your code and running experiments. I'm looking forward to hearing from you. Thank you once again!

AlexeyAB commented 4 years ago

What do you mean? I use:

if (truth) { // for object
  delta_bbox[i] = giou_delta[i] * lgiou_scale;
  delta_objectness = (1 - output[obj_index]) * lobj_scale;

  for(int k = 0; k < classes; ++k) {
    if(k == truth.class_id)   delta_class_probability[k] = (1 - output[cls_index + k]) * lcls_scale;
    else   delta_class_probability[k] = (0 - output[cls_index + k]) * lcls_scale;
  }
} 
else { // for no object
  delta_objectness = (0 - output[obj_index]) * lobj_scale;
}
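
Translated loosely into PyTorch terms, the scales plug into the total loss roughly as in the sketch below; the choice of BCE for the objectness and class terms and (1 - GIoU) for the box term is an assumption, not a statement about either codebase:

import torch
import torch.nn.functional as F

def total_loss(giou, pred_obj, pred_cls, obj_mask, target_cls,
               lgiou_scale=0.07, lobj_scale=1.0, lcls_scale=1.0):
    # giou:       (N,) GIoU between each assigned prediction and its target box
    # pred_obj:   (M,) objectness after sigmoid for every anchor cell
    # pred_cls:   (M, C) class probabilities after sigmoid
    # obj_mask:   (M,) bool, True for cells assigned to a ground-truth object
    # target_cls: (M, C) one-hot class targets
    loss_giou = (1.0 - giou).mean()                                # box term, assigned cells only
    loss_obj = F.binary_cross_entropy(pred_obj, obj_mask.float())  # target 1 for object, 0 for background
    loss_cls = F.binary_cross_entropy(pred_cls[obj_mask], target_cls[obj_mask])
    return loss_giou * lgiou_scale + loss_obj * lobj_scale + loss_cls * lcls_scale

N, M, C = 4, 32, 3
print(total_loss(torch.rand(N), torch.rand(M), torch.rand(M, C),
                 torch.arange(M) < N, F.one_hot(torch.randint(C, (M,)), C).float()))
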
AlexeyAB commented 4 years ago

@maudzung Hi, did you get any results, or are you still training it on Kitti?

maudzung commented 4 years ago

I ran the experiments on 6k samples with MSE loss and evaluated on 1.4k samples. The mAP for Complex-YOLOv3 and Complex-YOLOv4 is 0.90 and 0.89, respectively. I visualized the predictions for each sample and compared the two models; I observed that the v4 model works better than the v3 model at detecting small objects.

Complex-YOLO detects 5 degrees of freedom (x, y, width, length, and yaw) of objects. Recently, I have extended the work to a 7-DOF model. My implementation is here: YOLO3D-YOLOv4.

I plan to train the network on the Waymo Open Dataset. This should help me avoid the overfitting problem.

AlexeyAB commented 4 years ago

I ran the experiments on 6k samples with MSE loss and evaluated on 1.4k samples. The mAP for Complex-YOLOv3 and Complex-YOLOv4 is 0.90 and 0.89, respectively. I visualized the predictions for each sample and compared the two models; I observed that the v4 model works better than the v3 model at detecting small objects.

Why is the mAP for Complex-YOLOv3 higher than for Complex-YOLOv4? Do you use mAP@0.5 or mAP@0.5...0.95? What pre-trained weights do you use for training?

Complex-YOLO detects 5 degrees of freedom (x, y, width, length, and yaw) of objects. Recently, I have extended the work to a 7-DOF model. My implementation is here: YOLO3D-YOLOv4.

I plan to train the network on the Waymo Open Dataset. This should help me avoid the overfitting problem.

Great! Is YOLO3D better than Complex-YOLOv3 in terms of accuracy, or only 7-DOF vs 5-DOF?

Also what do you think about CenterNet3D: An Anchor free Object Detector for Autonomous Driving https://arxiv.org/abs/2007.07214 ?

maudzung commented 4 years ago

Hi @AlexeyAB

Do you use mAP@0.5 or mAP@0.5...0.95?

I evaluated with mAP@0.5. I'll also evaluate the models with mAP@0.5...0.95.

Why is the mAP for Complex-YOLOv3 higher than for Complex-YOLOv4? What pre-trained weights do you use for training?

I didn't use transfer learning for either model; I trained both from scratch. That's why I plan to train the networks on a bigger dataset.

Also what do you think about CenterNet3D: An Anchor free Object Detector for Autonomous Driving https://arxiv.org/abs/2007.07214 ?

Thank you so much for your suggestion. I'll read the paper.

maudzung commented 4 years ago

Hi @AlexeyAB

I read the paper you suggested and tried to implement it, but it could not run in real time, and the method was proposed only for car detection. Hence, I'm now waiting for the official code from the author.

Based on the CenterNet ideas, I have developed a new repo here. Amazingly, the model works well for pedestrian and cyclist detection, as well as for cars.

Thank you once again for your great paper, your answers, and your suggestion. I have learned a lot from your YOLOv4 paper 💯

AlexeyAB commented 4 years ago

@maudzung Hi,

I read the paper you suggested and tried to implement it, but it could not run in real time, and the method was proposed only for car detection. Hence, I'm now waiting for the official code from the author.

Based on the CenterNet ideas, I have developed a new repo here. Amazingly, the model works well for pedestrian and cyclist detection, as well as for cars.

Great!

Do you mean that Voxelization -> 3d convolution ndchw -> Conv2d is very slow (only ~25 FPS), so you replaced it with small resnet18 + FPN and it works very fast ~95 FPS (~4x faster), and at first glance, the accuracy did not drop much?

Did you try to use Joint Detection and Tracking / Embeddings? https://github.com/ifzhang/FairMOT and https://paperswithcode.com/sota/multi-object-tracking-on-mot16 If you replace CenterNet in FairMOT with YOLOv4, it will be Top1.

maudzung commented 4 years ago

Do you mean that Voxelization -> 3d convolution ndchw -> Conv2d is very slow (only ~25 FPS), so you replaced it with small resnet18 + FPN and it works very fast ~95 FPS (~4x faster), and at first glance, the accuracy did not drop much?

Yes. Although I used the spconv lib to implement the voxelization step and build the model, the speed was very slow, around 7 FPS for the forward pass alone.
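
For illustration, a rough sketch of the kind of 2D replacement being discussed: treat the BEV map as an ordinary image and feed it to an off-the-shelf ResNet-18 with a small top-down neck. The class name, channel counts, and input size are assumptions (requires a recent torchvision), not the actual repo code:

import torch
import torch.nn as nn
import torchvision

class BEVResNet18(nn.Module):
    # 2D backbone over the BEV pseudo-image instead of voxelization + sparse 3D convs.
    def __init__(self, in_channels=3, out_channels=64):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        backbone.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool)
        self.layer1, self.layer2 = backbone.layer1, backbone.layer2
        self.layer3, self.layer4 = backbone.layer3, backbone.layer4
        # Minimal FPN-style top-down path, fused at 1/8 resolution.
        self.lateral2 = nn.Conv2d(128, out_channels, 1)
        self.lateral3 = nn.Conv2d(256, out_channels, 1)
        self.lateral4 = nn.Conv2d(512, out_channels, 1)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, bev):
        c1 = self.layer1(self.stem(bev))   # 1/4 resolution
        c2 = self.layer2(c1)               # 1/8
        c3 = self.layer3(c2)               # 1/16
        c4 = self.layer4(c3)               # 1/32
        p4 = self.lateral4(c4)
        p3 = self.lateral3(c3) + self.up(p4)
        p2 = self.lateral2(c2) + self.up(p3)
        return p2                          # fused BEV feature map for the detection heads

print(BEVResNet18()(torch.randn(1, 3, 608, 608)).shape)  # torch.Size([1, 64, 76, 76])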

Did you try to use Joint Detection and Tracking / Embeddings? https://github.com/ifzhang/FairMOT and https://paperswithcode.com/sota/multi-object-tracking-on-mot16 If you replace CenterNet in FairMOT with YOLOv4, it will be Top1.

I tested the FairMOT implementation; it's also great, but I haven't tried to jointly detect and track objects. Thanks for the suggestions, I'll investigate them.