xingyizhou / CenterNet

Object detection, 3D detection, and pose estimation using center point detection:
MIT License
7.2k stars · 1.92k forks

CenterNet-HarDNet85 with 43.6mAP(test-dev) / 45fps(1080ti) #802

Open PingoLH opened 3 years ago

PingoLH commented 3 years ago

Hi Mr. Zhou, Thank you for your well-organized code. I recently forked this CenterNet repo and integrated it with our HarDNet as a backbone. The result is not bad, so we wanted to share it with you: CenterNet-HarDNet85, which achieves 43.6 bbox mAP (.5-.95) at 45 fps on a 1080ti with the plain PyTorch framework (slightly faster than YOLOv4). We would love to merge it back into your main branch, which would include the new model and changes to the dataset sampler, get_affine, dense_reg, and some data-augmentation-related code. Please let us know if you think this is a good idea. Thank you very much.

xingyizhou commented 3 years ago

Hi Ping, That's amazing! We definitely welcome such a strong contribution. Please feel free to create a pull request. I am happy to add your results to my branch (with reviews and consistency checks). Thanks!

Best, Xingyi

Dantju commented 3 years ago

@PingoLH have you converted HarDNet85 from PyTorch to a caffemodel?

Dantju commented 3 years ago

@PingoLH this backbone is faster than DLA-34, but why is the training time so much longer than DLA-34's?

PingoLH commented 3 years ago

@xingyizhou Thank you very much for the quick reply. Sorry, I spent some time fine-tuning the code, so I didn't prepare the pull request immediately. About this merge, I think there are some issues that need to be discussed first:

  1. The biggest problem is that in the ctdet hardnet model, we merge the "wh" and "reg" heads into one head so that the two 256x2 convolutions can be merged into one 256x4 convolution (we also reduced 256 to 128), which slightly speeds up network inference. So this change is not compatible with the other pretrained models such as ctdet-dla34. There are two possible solutions: (1) create another task name, e.g. ctdet2, for this change to differentiate it from the original ctdet; (2) in the detector and trainer, check the wh head's dimension size and branch between the original and new process, although that might make the code ugly.
  2. We made a lot of changes to the input transformation and augmentation, which are very different from the main branch. The resize range is wider and there is a distortion of the aspect ratio. The change might also improve mAP for DLA-34 and other existing models, but that still needs to be verified through training experiments, which is a problem since we don't have enough resources to do so. So I just wanted to ask whether you think it is OK to make this change anyway, or whether I should put the change in a new CTDetDataset object, or in another transform method in CTDetDataset selectable via Opt, so that the original augmentation is preserved.
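To illustrate point 1, here is a minimal PyTorch sketch of the merged-head idea (class name, channel counts, and input size are hypothetical, not the repo's actual code): instead of two separate 2-channel heads, a single head emits 4 channels that are split into wh and reg at decode time.

```python
import torch
import torch.nn as nn

class MergedCtdetHead(nn.Module):
    """Hypothetical sketch: one head producing wh (2ch) + reg (2ch) together.

    The two separate 256->2 convolutions described above become a single
    128->4 convolution (with the hidden width reduced from 256 to 128).
    """
    def __init__(self, in_ch=48, mid_ch=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, 4, 1),  # 4 = 2 (wh) + 2 (reg) in one conv
        )

    def forward(self, x):
        out = self.conv(x)
        # Split the merged output back into the two logical heads at decode time.
        wh, reg = out[:, :2], out[:, 2:]
        return wh, reg

feat = torch.randn(1, 48, 128, 128)   # hypothetical backbone feature map
wh, reg = MergedCtdetHead()(feat)     # each is (1, 2, 128, 128)
```

Solution (2) above would amount to checking this head's output channel count (4 vs. 2) in the detector/trainer and branching accordingly.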

Please let me know what you think about these two questions. Thank you very much!
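For point 2, exposing both pipelines behind an Opt flag could look something like the following sketch (the flag name and scale ranges are hypothetical, chosen only to show the shape of the branch):

```python
import random

def sample_resize(wide_aug: bool, base: int = 512, rng=random):
    """Pick a training resize; hypothetical sketch of an opt-selected branch.

    With the (hypothetical) wide_aug flag off, use a narrow scale range and a
    fixed aspect ratio, matching the original augmentation; with it on, use a
    wider range plus independent x/y scaling (aspect-ratio distortion).
    """
    if wide_aug:
        sx = rng.uniform(0.5, 1.5)
        sy = sx * rng.uniform(0.8, 1.25)  # aspect-ratio distortion
    else:
        sx = sy = rng.uniform(0.6, 1.4)   # original style: no distortion
    return base * sx, base * sy

w0, h0 = sample_resize(False)  # always square (w0 == h0)
w1, h1 = sample_resize(True)   # may be non-square
```

Keeping the branch in one dataset class, selected by Opt, avoids duplicating the rest of CTDetDataset.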

PingoLH commented 3 years ago

@Dantju, Thank you for the feedback. DLA-34 should be faster than HarDNet-85, and roughly the same speed as HarDNet-68. The training took much longer because we use more epochs and run on only two GPUs. If you're talking about your own experiment, please provide your environment settings and measurements. Thanks.

xingyizhou commented 3 years ago

Hi @PingoLH, Thanks for your detailed response, and sorry for the delayed reply. I have read most of your code and played with it a bit; thanks for the excellent modification!

  1. Both work for me. I slightly prefer (1) for clarity.
  2. For now, it would be great if you could preserve my original augmentation; changing the aspect ratio looks a bit strange to me. Adding an option to select either your augmentation (in both training and testing) or mine would be great and seems reasonably feasible.

Thanks, Xingyi

Dantju commented 3 years ago

@PingoLH I added dla34 to your codebase and trained it, but got a worse result. I'd like to know the reason. Have you tried dla34?