open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0

ImVoxelNet #620

Closed. filaPro closed this issue 3 years ago.

filaPro commented 3 years ago

Hi, @Tai-Wang and mmdetection3d team,

This issue continues the discussion in https://github.com/saic-vul/imvoxelnet/issues/7. If you are sure it would be great to have ImVoxelNet in this codebase, I can make a PR. I would like to clarify some points first.

  1. SUN RGB-D. Unfortunately, there are two monocular benchmarks: PerspectiveNet with 30 classes, and Total3DUnderstanding with 37 classes plus camera pose estimation and 3D room layout estimation. In our paper we also report metrics on your 10-class VoteNet benchmark. Do you think this VoteNet option is enough for a start? One more question here: for SUN RGB-D we use a differentiable rotated IoU from here (see the first sketch after this list). Do you want to adapt it to mmdetection3d?

  2. ScanNet. The current ScanNetDataset doesn't deal with RGB images. Do we need a separate MultiViewScanNet dataset and a MultiViewPipeline that handles random image sampling during training (see the second sketch after this list)? The dataset preparation instructions also need to be updated.

  3. KITTI. No questions here.

  4. nuScenes. Following OFTNet and MonoDIS, we compare only on the car category. I don't think this is interesting enough for the community, so we don't need a separate benchmark here.
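
On the rotated IoU from point 1: the differentiable implementation lives in an external repo, but as a minimal non-differentiable reference, the quantity it computes for bird's-eye-view boxes can be sketched with shapely (the (x, y, w, l, yaw) parametrization here is an assumption):

```python
import numpy as np
from shapely.geometry import Polygon

def bev_corners(x, y, w, l, yaw):
    """Four corners of a rotated bird's-eye-view box centered at (x, y)."""
    local = np.array([[-w, -l], [-w, l], [w, l], [w, -l]]) / 2.0
    rot = np.array([[np.cos(yaw), -np.sin(yaw)],
                    [np.sin(yaw),  np.cos(yaw)]])
    return local @ rot.T + np.array([x, y])

def rotated_iou_bev(box_a, box_b):
    """IoU of two rotated BEV boxes, each given as (x, y, w, l, yaw)."""
    pa, pb = Polygon(bev_corners(*box_a)), Polygon(bev_corners(*box_b))
    inter = pa.intersection(pb).area
    return inter / (pa.area + pb.area - inter)
```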
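
On point 2, the random image sampling could be a simple pipeline step along these lines (the class and result keys are hypothetical, not existing mmdet3d API):

```python
import numpy as np

class SampleMultiViewImages:
    """Hypothetical transform: keep a random subset of a scene's
    RGB frames (and their poses) during training."""

    def __init__(self, n_views=50):
        self.n_views = n_views

    def __call__(self, results):
        n_total = len(results['img_paths'])
        keep = np.random.choice(
            n_total, size=min(self.n_views, n_total), replace=False)
        for key in ('img_paths', 'poses'):  # assumed key names
            results[key] = [results[key][i] for i in keep]
        return results
```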

Tai-Wang commented 3 years ago

Hi @filaPro ,

Thanks again for your willingness to contribute. For these questions, in my opinion:

  1. I prefer beginning with our 10-class setting, which provides an intuitive comparison with methods based on other modalities. It would also be easier and clearer in terms of data preparation. As for the differentiable rotated IoU, I think we can first reproduce the performance, then take a closer look at this module (maybe survey its advantages and its connection to/difference from the one in our repo) and decide how to deal with it.
  2. If we need new preprocessing scripts and loading pipelines, maybe you can refer to how we added image processing to KITTI and nuScenes. I guess new data converter functions and classes will similarly be needed.
  3. Yes, I agree with you. Maybe we can try to extend your method to multi-class detection in the future.

Generally speaking, I would recommend splitting the modules into several small PRs, so that reviewing and discussing each one is not much of a burden for us. You can also refer to how I pushed FCOS3D to this repo. For the rest, you can basically follow your own ideas, since you are definitely more familiar with your code, and we can discuss them once the PRs are created.

filaPro commented 3 years ago

Hi @Tai-Wang ,

I made an initial pull request, #627. It almost matches the metrics from our paper for monocular car detection on KITTI. The code is much simpler now, as I utilized your point_fusion instead of the projection from Atlas.
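
For reference, the idea behind point_fusion-style feature sampling can be sketched as follows (an illustration, not mmdet3d's actual point_fusion code; names and signatures are assumptions):

```python
import torch
import torch.nn.functional as F

def sample_img_feats(img_feats, points, lidar2img, img_shape):
    """Project 3D points into the image and bilinearly sample features.

    img_feats: (C, H, W) feature map, points: (N, 3) coordinates,
    lidar2img: (4, 4) projection matrix, img_shape: (h, w).
    """
    n = points.shape[0]
    pts = torch.cat([points, points.new_ones(n, 1)], dim=1)  # homogeneous (N, 4)
    proj = pts @ lidar2img.T
    depth = proj[:, 2:3].clamp(min=1e-5)
    uv = proj[:, :2] / depth  # pixel coordinates
    h, w = img_shape
    grid = torch.stack([uv[:, 0] / (w - 1), uv[:, 1] / (h - 1)], dim=1)
    grid = (grid * 2 - 1).view(1, 1, n, 2)  # grid_sample expects [-1, 1]
    feats = F.grid_sample(img_feats[None], grid, align_corners=True)
    return feats.view(img_feats.shape[0], n).T  # (N, C) sampled features
```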

Tai-Wang commented 3 years ago

> Hi @Tai-Wang ,
>
> I made an initial pull request, #627. It almost matches the metrics from our paper for monocular car detection on KITTI. The code is much simpler now, as I utilized your point_fusion instead of the projection from Atlas.

Great, it looks good. We will have a brief review ASAP.

gujiaqivadin commented 3 years ago

Great work!

filaPro commented 3 years ago

Hi @Tai-Wang ,

PR #627 for KITTI is (almost?) ready to merge. I can move forward with multi-view RGB detection on ScanNet. However, I think we first need a separate PR to manage RGB images with poses for this dataset. Here I have a question.

I understand that there is probably no good way to unify extrinsic and intrinsic matrices across all indoor and outdoor datasets of mmdetection3d. As we developed a general-purpose detector, I unified these matrices for all 4 datasets: scannet, kitti, nuscenes, sunrgbd. You are unlikely to be ready to make such changes in this repo. However, your implementation looks a little messy: img_meta['cam_intrinsic'] for nuscenes_mono, img_meta['lidar2img'] for kitti, nothing for scannet, and calib for sunrgbd. The simple question is: why is calib outside of img_meta? The more complicated question is: how should a general-purpose detector manage these options? Currently, data_info doesn't contain the name of the dataset, so code that takes camera poses into account has to rely on key differences in the metadata.
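
To make the inconsistency concrete, a general-purpose detector today ends up with a dispatcher along these lines (illustration only; the sunrgbd calib field names 'K' and 'Rt' are assumptions):

```python
import numpy as np

def get_projection(img_meta, calib=None):
    """Pick the camera geometry out of whichever key the dataset uses."""
    if 'lidar2img' in img_meta:        # kitti
        return np.asarray(img_meta['lidar2img'])
    if 'cam_intrinsic' in img_meta:    # nuscenes_mono
        return np.asarray(img_meta['cam_intrinsic'])
    if calib is not None:              # sunrgbd: calib lives outside img_meta
        return np.asarray(calib['K']) @ np.asarray(calib['Rt'])
    raise KeyError('no camera geometry available (e.g. scannet)')
```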

What do you think about this? Of course, I can simply update ScanNet with calib similar to SUN RGB-D. But we can try to unify something first.

Tai-Wang commented 3 years ago

Yes, after merging #627, you can create a README page for imvoxelnet in the configs and add the corresponding bib/benchmark with checkpoints in a new PR.

As for the different representations/names of poses recorded in the infos of different datasets, I think it would be better to unify their names, and we prefer giving the outdoor datasets (nuScenes and KITTI) priority, i.e., if there is any conflict, we'd better modify the code for scannet & sunrgbd (I guess that would be more relaxed, since fewer models need to be adjusted).

My own implementation of monocular methods on KITTI and nuScenes is based on coco-style jsons, due to their convenience for reusing mmdet modules, but your way of directly using mmdet3d infos is also elegant, so you can basically keep doing it that way.

filaPro commented 3 years ago

My proposal for unifying intrinsic and extrinsic matrices across all datasets.

Recap of what we have now:

  * img_meta['cam_intrinsic'] for nuscenes_mono
  * img_meta['lidar2img'] for kitti
  * nothing for scannet
  * calib (outside of img_meta) for sunrgbd

First, I think there is no reason to keep calib outside of img_meta. Moreover, this calib is only used in VoteFusion and in visualization, and in both places the points are multiplied by Rt and then by K, so there is almost no need to keep the two matrices separate. Following the same logic as how lidar2img is used in PointFusion, we can aggregate Rt and K into a single depth2img. Finally, if we need cam_intrinsic for monocular datasets, we can rename it to cam2img. So instead of multiple img_meta and calib keys we keep only img_meta['lidar2img'], img_meta['depth2img'] and img_meta['cam2img']; each dataset may have from 0 to 3 of them. For multi-view datasets (ScanNet (~1000 RGB frames) and nuScenes (6 RGB cameras)), these keys will contain a list of matrices. A short sketch of the composition follows.
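
A minimal sketch of the proposed composition (assuming Rt maps depth coordinates to the camera frame and K is the pinhole intrinsic; the helper name is mine):

```python
import numpy as np

def compose_depth2img(K, Rt):
    """Pad a 3x3 intrinsic K to 4x4 and compose it with the 4x4
    extrinsic Rt into a single depth2img matrix, mirroring how
    lidar2img is consumed by PointFusion."""
    K4 = np.eye(4)
    K4[:3, :3] = np.asarray(K)[:3, :3]
    return K4 @ np.asarray(Rt)

# After unification, each dataset carries 0 to 3 of these keys:
#   img_meta['lidar2img'], img_meta['depth2img'], img_meta['cam2img']
# and for multi-view datasets each key holds a list of matrices.
```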

I can make a PR with all of this if you (and the mmdetection3d team) agree. cc @Tai-Wang

Tai-Wang commented 3 years ago

@filaPro Yeah, I agree. This unification is fine. Feel free to make a PR to adjust them for convenience. The only thing is that we need to guarantee that all related operations are changed accordingly. Maybe we can help you double-check this when reviewing.

filaPro commented 3 years ago

Looks like there should be K[1, 1] here instead of K[0, 0]. The two are not exactly the same, even for some images of SUN RGB-D. I've refactored the intrinsics here, and the validation mAP even grew from 64.06 to 64.61. Do you agree with this? cc @THU17cyz @Tai-Wang
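
For clarity: fx = K[0, 0] and fy = K[1, 1] are distinct entries of the intrinsic matrix, so reusing K[0, 0] for both silently assumes square pixels. A minimal illustration (the helper name is mine):

```python
import numpy as np

def make_intrinsic(fx, fy, cx, cy):
    """fx and fy can differ (they do for some SUN RGB-D images),
    so the intrinsic matrix must keep both."""
    return np.array([[fx, 0., cx],
                     [0., fy, cy],
                     [0., 0., 1.]])
```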

gujiaqivadin commented 3 years ago

Hello, filaPro! Thanks for your great work on ImVoxelNet. I wonder whether you will support a nuScenes version of ImVoxelNet in mmdetection3d. I'd appreciate it a lot!

filaPro commented 3 years ago

> Hello, filaPro! Thanks for your great work on ImVoxelNet. I wonder whether you will support a nuScenes version of ImVoxelNet in mmdetection3d. I'd appreciate it a lot!

Hi @gujiaqivadin , for now our nuScenes model is not ready for mmdetection3d: the benchmark here covers all categories, but our model supports only the car category. You can check the imvoxelnet_nuscenes.py config in the original ImVoxelNet repo.