pangsu0613 / CLOCs

CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection
MIT License

More questions for nuScenes #44

Open JessieW0806 opened 3 years ago

JessieW0806 commented 3 years ago

Hello @JessieW0806.

  1. We have tested CLOCs on nuScenes, but we cannot share it with you for now.
  2. nuScenes has 10 classes. CLOCs is a very small network (similar in size to the detection head of other detectors), so we simply use 10 separate CLOCs networks for the 10 classes. If there are too many detection candidates for a class, you can apply simple score thresholding to reduce their number; in our experience, this works well.
  3. I think it is feasible. Currently, CLOCs takes 3D bboxes (from a 3D detector, which could be LiDAR-based or based on another sensor) and 2D bboxes (from a 2D detector, which could be camera-based or based on another sensor). I guess millimeter-wave radar can only provide top-down view (BEV) measurements, such as object center points or BEV bounding boxes. You would need to make some modifications to project these radar detections into the image and to design a data association metric (for CLOCs, we use IoU) to associate the projected radar detections with the other 2D detections. I think this depends on what detection format you have.

Originally posted by @pangsu0613 in https://github.com/pangsu0613/CLOCs/issues/40#issuecomment-882081390
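The per-class thresholding mentioned in point 2 might look like the following minimal Python sketch. The candidate format `(class_name, score, box)`, the threshold values, and the `filter_candidates` helper are hypothetical and not taken from the CLOCs code; each class's surviving list would then be fed to that class's own CLOCs network.

```python
from collections import defaultdict


def filter_candidates(candidates, thresholds, default_threshold=0.05, max_per_class=None):
    """Group detection candidates by class and drop low-score ones.

    candidates: iterable of (class_name, score, box) tuples (hypothetical format).
    thresholds: dict mapping class name -> minimum score (assumed values).
    max_per_class: optionally keep only the top-k highest-scoring candidates per class.
    """
    per_class = defaultdict(list)
    for cls, score, box in candidates:
        # Classes without an explicit threshold fall back to default_threshold.
        if score >= thresholds.get(cls, default_threshold):
            per_class[cls].append((score, box))
    for cls in per_class:
        # Highest score first, so truncation keeps the strongest candidates.
        per_class[cls].sort(key=lambda item: item[0], reverse=True)
        if max_per_class is not None:
            per_class[cls] = per_class[cls][:max_per_class]
    return dict(per_class)
```

A fixed score threshold is the simplest option; a per-class top-k cap (`max_per_class`) additionally bounds the input size of each class network.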
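For the radar idea in point 3, one possible approach is sketched below under several assumptions that the thread does not state: the radar BEV box has already been transformed into the camera frame (x right, y down, z forward), the radar provides no height so a fixed camera height and object height are assumed, and the camera is a pinhole with intrinsics `(fx, fy, cx, cy)`. The helper names are hypothetical. The projected 2D box is then compared with the 2D detections using IoU, the association metric CLOCs uses.

```python
import math


def bev_box_to_image(center_x, center_z, length, width, yaw, K,
                     cam_height=1.5, obj_height=1.5):
    """Project a BEV box (camera frame, x right / z forward) to a 2D image box.

    ASSUMPTIONS: y points down, the ground is cam_height below the camera,
    the object spans obj_height above the ground, K = (fx, fy, cx, cy).
    Returns (u_min, v_min, u_max, v_max), or None if a corner is behind the camera.
    """
    fx, fy, cx, cy = K
    dx, dz = math.cos(yaw), math.sin(yaw)   # heading direction in the x-z plane
    px, pz = -math.sin(yaw), math.cos(yaw)  # perpendicular direction
    us, vs = [], []
    for sl in (-0.5, 0.5):
        for sw in (-0.5, 0.5):
            x = center_x + sl * length * dx + sw * width * px
            z = center_z + sl * length * dz + sw * width * pz
            if z <= 0.0:
                return None  # corner behind the image plane
            # Top and bottom of the assumed vertical extent.
            for y in (cam_height - obj_height, cam_height):
                us.append(fx * x / z + cx)
                vs.append(fy * y / z + cy)
    return (min(us), min(vs), max(us), max(vs))


def iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_matrix(projected, boxes_2d):
    """Rows: projected radar boxes (None -> all zeros); columns: 2D detections."""
    return [[0.0 if p is None else iou_2d(p, b) for b in boxes_2d] for p in projected]
```

Because radar gives no vertical extent, the assumed height range directly affects the projected box and therefore the IoU values; a center-point-only radar would need a different association metric altogether.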

JessieW0806 commented 3 years ago

Sorry to bother you again! I have some questions about transferring CLOCs to the nuScenes dataset.

1) Did you implement CLOCs in SECOND 1.6, or is it feasible?
2) You said "we simply use 10 separate CLOCs networks for 10 different classes." Do you mean that each network only focuses on one category, and the other nine are set to "Don't Care"?
3) For nuScenes, there are six cameras and a 360-degree LiDAR. Does that mean six sets of 2D detections correspond to one set of 3D detections? Could you please explain in more detail how the correspondence is set up here, since it is not quite the same as KITTI?

pangsu0613 commented 3 years ago

Hello @JessieW0806, please don't say sorry; I'll help you as much as I can.

  1. I have not implemented CLOCs in SECOND 1.6, but I would say it is feasible. I used SECOND 1.5 only because SECOND 1.6 had not been released yet when I developed CLOCs around 2 or 3 years ago. Just be careful with the 3D bounding box representation in SECOND 1.6: check the order of the 7 parameters (x, y, z, h, w, l, r), and check whether the center location (x, y, z) is at the center of the bounding box or at its bottom.
  2. Currently, one CLOCs network can only perform fusion for one class. Therefore, for 10 classes, we use 10 CLOCs networks.
  3. Yes, the LiDAR has a 360-degree field of view. The nuScenes dataset has 6 cameras facing six different directions, and each covers a field of view of around 90 degrees (I am not sure about the exact number, but I am sure each camera only covers a limited field of view). Some objects may be visible from two cameras; in that case, I simply pick the one with the higher score. For each camera, we only consider the LiDAR detections within that camera's field of view.
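A hypothetical helper for the box-convention check in point 1 is sketched below. It only reorders parameters and shifts z between bottom-center and geometric center, assuming z points up and h is the box extent along z. The actual parameter order and center convention of SECOND 1.6 (or any other codebase) must be verified in its source; the `"xyzwlhr"` layout in the usage example is just an assumption.

```python
def convert_box(box, src_order, dst_order="xyzhwlr",
                src_z_at_bottom=False, dst_z_at_bottom=False):
    """Reorder a 7-parameter 3D box and optionally move z between bottom and center.

    src_order / dst_order: strings of single-letter parameter names, e.g. "xyzwlhr".
    ASSUMPTION: z axis points up and "h" is the extent along z.
    """
    if len(box) != len(src_order) or sorted(src_order) != sorted(dst_order):
        raise ValueError("source and destination layouts must contain the same parameters")
    params = dict(zip(src_order, box))
    if src_z_at_bottom and not dst_z_at_bottom:
        params["z"] += params["h"] / 2.0  # bottom center -> geometric center
    elif dst_z_at_bottom and not src_z_at_bottom:
        params["z"] -= params["h"] / 2.0  # geometric center -> bottom center
    return [params[k] for k in dst_order]


# Usage (assumed layout): bottom-centered (x, y, z, w, l, h, r) -> centered (x, y, z, h, w, l, r).
# convert_box([1.0, 2.0, 0.0, 1.8, 4.5, 1.6, 0.3], "xyzwlhr", src_z_at_bottom=True)
```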
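The camera handling in point 3 could be sketched as follows. The camera yaw and FOV values, the BEV azimuth test, and the per-camera score dictionary are illustrative assumptions, not the nuScenes calibration or the authors' code: a LiDAR detection is assigned to every camera whose horizontal FOV contains it, and when more than one camera sees it, the camera with the higher score is kept.

```python
import math


def cameras_seeing(x, y, cameras):
    """Names of cameras whose horizontal FOV contains the BEV point (x, y).

    cameras: dict name -> (yaw, fov) in radians, in the LiDAR frame (assumed values).
    """
    azimuth = math.atan2(y, x)
    names = []
    for name, (yaw, fov) in cameras.items():
        # Angular difference wrapped to [-pi, pi).
        diff = (azimuth - yaw + math.pi) % (2 * math.pi) - math.pi
        if abs(diff) <= fov / 2:
            names.append(name)
    return names


def pick_best_camera(det_xy, per_camera_scores, cameras):
    """Among cameras that see the detection, keep the one with the highest score.

    per_camera_scores: dict camera name -> score for this detection
    (hypothetical, e.g. the fused output computed with that camera's 2D detections).
    Returns (camera_name, score), or None if no camera sees the detection.
    """
    candidates = [c for c in cameras_seeing(*det_xy, cameras) if c in per_camera_scores]
    if not candidates:
        return None
    best = max(candidates, key=lambda c: per_camera_scores[c])
    return best, per_camera_scores[best]
```

Testing only the azimuth is a simplification; projecting the 3D box with each camera's calibration and checking that it lands inside the image would be more precise.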
JessieW0806 commented 3 years ago

Thanks a lot!
1) So you adapted CLOCs (SECOND 1.5) to get the nuScenes results, right?
2) Just to be sure, are the 10 CLOCs networks trained individually, just like for the KITTI dataset?

JessieW0806 commented 3 years ago

3) Could you share which code you ran to get the 2D detection results? :)

pangsu0613 commented 3 years ago

Hi @JessieW0806,
(1) I used CenterPoint as the codebase for nuScenes because SECOND has limited performance. You could use SECOND if you prefer, but I would recommend CenterPoint because it is better tuned for the nuScenes dataset.
(2) The 10 CLOCs networks can be trained individually or jointly. To train them individually, just follow the KITTI dataset setup. To train them jointly, simply build 10 loss functions, one for each class, and you are good to go.
(3) I am sorry, but the 2D detector we use for nuScenes is a custom-designed network that is under review; we will release it in the future. I think there are other 2D detection networks available, though.
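Joint training as described in (2) amounts to summing one loss per class. The sketch below shows only that structure in plain Python. Binary cross-entropy on logits is a stand-in chosen here for illustration, not necessarily the loss CLOCs uses, and `joint_loss` and the optional class weights are hypothetical. In a framework such as PyTorch, the same sum would be computed on tensors and backpropagated through all 10 networks at once.

```python
import math


def bce(logits, targets):
    """Mean binary cross-entropy on raw logits (numerically stable form)."""
    total = 0.0
    for z, t in zip(logits, targets):
        # Equivalent to -[t*log(sigmoid(z)) + (1 - t)*log(1 - sigmoid(z))].
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / max(len(logits), 1)


def joint_loss(outputs, targets, class_weights=None):
    """Sum one loss per class.

    outputs / targets: dict class name -> list of logits / 0-1 labels
    (one entry per class network). class_weights is an optional, assumed extra.
    Returns (total_loss, per_class_losses).
    """
    total = 0.0
    per_class = {}
    for cls in outputs:
        loss = bce(outputs[cls], targets[cls])
        weight = 1.0 if class_weights is None else class_weights.get(cls, 1.0)
        per_class[cls] = loss
        total += weight * loss
    return total, per_class
```

Keeping the per-class losses in the returned dictionary makes it easy to log each class separately, which helps spot a class network that is not converging.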