Hi @filaPro ,
Thanks again for your willingness to contribute. For these questions, in my opinion:
Generally speaking, I would recommend you split the modules into several small PRs so that there is not much burden for us to review and discuss each one. You can also refer to how I pushed FCOS3D to this repo. For the others, you can basically follow your own ideas because you are definitely more familiar with your code, and we can discuss them after the PRs are created.
Hi @Tai-Wang ,
I made an initial pull request #627. It almost achieves our metrics for monocular car detection on KITTI. The code is much simpler now, as I utilized your `point_fusion` instead of the projection from Atlas.
Great, it looks good. We will have a brief review ASAP.
Great work!
Hi @Tai-Wang ,
PR #627 for KITTI is (almost?) ready to merge. I can move forward to multi-view RGB detection on ScanNet. However, I think we first need a separate PR to manage RGB images with poses for this dataset. Here I have a question.
I understand that there is probably no good way to unify extrinsic and intrinsic matrices across all indoor and outdoor datasets of mmdetection3d. As we developed a general-purpose detector, I unified these matrices for all 4 datasets: ScanNet, KITTI, nuScenes and SUN RGB-D. It is unlikely that you are ready to make such changes in this repo. However, your implementation looks a little messy: `img_meta['cam_intrinsic']` for nuscenes_mono, `img_meta['lidar2img']` for KITTI, nothing for ScanNet, and `calib` for SUN RGB-D. The simple question is: why is `calib` outside of `img_meta`? The more complicated question is: how should a general-purpose detector manage these options? Currently, `data_info` doesn't contain the name of the dataset, so code that takes camera poses into account has to rely on key differences in the metadata. What do you think about this? Of course, I could simply update ScanNet with `calib` similar to SUN RGB-D, but we could try to unify something first.
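
To make the inconsistency concrete, here is a minimal sketch (the helper is hypothetical, not actual mmdetection3d code) of the branching a general-purpose detector currently needs, using exactly the key names listed above:

```python
# Hypothetical illustration of the current per-dataset branching.
def get_projection(img_meta, calib=None):
    if 'cam_intrinsic' in img_meta:   # nuscenes_mono
        return img_meta['cam_intrinsic']
    if 'lidar2img' in img_meta:       # KITTI
        return img_meta['lidar2img']
    if calib is not None:             # SUN RGB-D: calib lives outside img_meta
        return calib['K'], calib['Rt']
    # ScanNet: no camera pose information is stored at all
    raise KeyError('no projection matrix available')
```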
Yes, after merging #627, you can create a README page for imvoxelnet in the configs and add the corresponding bib/benchmark with checkpoints in a new PR.
As for the different representations/names of poses recorded in the infos of different datasets, I think it would be better to unify their names, and we prefer to give outdoor datasets (nuScenes and KITTI) priority, i.e., if there is any conflict, we'd better modify the code for ScanNet & SUN RGB-D (I guess this would be easier since fewer models need to be adjusted).
My own implementation of monocular methods on KITTI and nuScenes is based on coco-style JSONs due to their convenience considering the usage of mmdet modules, but I think your way of directly using mmdet3d infos is also elegant, so you can basically keep doing it that way.
My proposal for unifying intrinsic and extrinsic matrices across all datasets.

A recap of what we have now:

- `img_meta['lidar2img']` for outdoor point cloud datasets
- `img_meta['cam_intrinsic']` for monocular outdoor datasets
- `calib['Rt']` and `calib['K']` for SUN RGB-D

First, I think there is no reason to keep `calib` outside of `img_meta`. Also, this `calib` is only used in VoteFusion and visualization, and in both places points are multiplied by `Rt` and then by `K`, so there is almost no need to keep them separate. Following the same logic of how `lidar2img` is used in `PointFusion`, we can aggregate `Rt` and `K` into `depth2img`. Finally, if we need `cam_intrinsic` for monocular datasets, we can rename it to `cam2img`. So instead of multiple `img_meta` and `calib` keys, we keep only `img_meta['lidar2img']`, `img_meta['depth2img']` and `img_meta['cam2img']`. Each dataset may have from 0 to 3 of them, and for multi-view datasets (ScanNet with ~1000 RGB images, nuScenes with 6) these keys will contain a list of matrices.
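
As a rough illustration of the proposal, here is a minimal sketch assuming 3x3 `K` and `Rt` matrices (the helper name `depth2img_from_calib` and the placeholder values are hypothetical):

```python
import numpy as np

# Since points are always multiplied by Rt first and then by K, the two
# SUN RGB-D matrices can be folded into one, mirroring how lidar2img is
# consumed by PointFusion.
def depth2img_from_calib(K, Rt):
    # K: (3, 3) camera intrinsics, Rt: (3, 3) extrinsic rotation.
    return K @ Rt

# With the unified scheme a dataset fills only the keys it actually has;
# identity matrices below are placeholders for illustration.
img_meta = dict(
    lidar2img=np.eye(4),     # outdoor point cloud datasets
    depth2img=depth2img_from_calib(np.eye(3), np.eye(3)),  # e.g. SUN RGB-D
    cam2img=np.eye(3),       # monocular datasets, renamed from cam_intrinsic
)
```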
I can make a PR with all of this if you (and the mmdetection3d team) agree. cc @Tai-Wang
@filaPro Yeah, I agree. This unification is fine. Feel free to make a PR to adjust them for convenience. The only thing is that we need to guarantee all the related operations will be changed accordingly. Maybe we can help you double check this issue when reviewing.
Looks like it should be `K[1, 1]` here instead of `K[0, 0]`. They are not exactly the same even for some images of SUN RGB-D. I've refactored the intrinsics here and the validation mAP even grew from 64.06 to 64.61. Do you agree with this? cc @THU17cyz @Tai-Wang
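
For context, a minimal sketch of why the two diagonal entries can differ (the numbers below are illustrative, not taken from SUN RGB-D):

```python
import numpy as np

# In a pinhole intrinsic matrix the two diagonal entries are independent
# focal lengths (in pixels) along x and y, and for some cameras fx != fy.
fx, fy, cx, cy = 529.5, 545.0, 365.0, 265.0  # illustrative values only
K = np.array([[fx, 0., cx],
              [0., fy, cy],
              [0., 0., 1.]])
assert K[0, 0] != K[1, 1]  # so the y axis must use K[1, 1], not K[0, 0]
```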
Hello, filaPro! Thanks for your great work on ImVoxelNet. I wonder whether you will support a nuScenes version of ImVoxelNet in mmdetection3d. I'd appreciate it a lot!
Hi @gujiaqivadin ,
For now, our nuScenes model is not ready for mmdetection3d, as the benchmark here supports all categories while our model supports only `car`. You can check the model `imvoxelnet_nuscenes.py` in the original ImVoxelNet repo.
Hi, @Tai-Wang and mmdetection3d team,
This issue continues the discussion in https://github.com/saic-vul/imvoxelnet/issues/7. If you are sure it would be great to have ImVoxelNet in this codebase, I can make a PR. I would like to clarify some points first.
SUN RGB-D. Unfortunately, there are 2 monocular benchmarks: PerspectiveNet with 30 classes, and Total3DUnderstanding with 37 classes plus camera pose estimation and 3D room layout estimation. In our paper we also present metrics for your VoteNet benchmark with 10 classes. Do you think this VoteNet option is enough for the beginning? One more question here: for SUN RGB-D we use the differentiable rotated IoU from here. Do you want to adapt it to mmdetection3d?
ScanNet. The current ScanNetDataset doesn't deal with RGB images. Do we need a separate MultiViewScanNet dataset and a MultiViewPipeline handling random image sampling during training (a rough sketch is at the end of this message)? Also, the dataset preparation instructions need to be updated.
KITTI. No questions here.
nuScenes. Following OFTNet and MonoDIS, we compare only on the `car` category. I think this is not interesting enough for the community, so we don't need a separate benchmark here.
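
For the ScanNet question above, a minimal sketch of what such random image sampling could look like (the transform and the `img_paths`/`depth2img` keys are assumptions for illustration, not existing mmdetection3d code):

```python
import numpy as np

# Hypothetical pipeline transform: at training time, keep only a fixed
# number of the ~1000 posed RGB frames available per ScanNet scene.
class MultiViewSample:
    def __init__(self, n_images=50):
        self.n_images = n_images

    def __call__(self, results):
        n_total = len(results['img_paths'])
        ids = np.random.choice(
            n_total, min(self.n_images, n_total), replace=False)
        results['img_paths'] = [results['img_paths'][i] for i in ids]
        results['depth2img'] = [results['depth2img'][i] for i in ids]
        return results
```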