Closed: Son-Goku-gpu closed this issue 3 years ago
two_stage basically doesn't work for nuScenes. This is mentioned in our paper:
Two-stage refinement does not bring an improvement over the single-stage CenterPoint model on nuScenes in our experiments. We think the reason is that the nuScenes dataset uses a 32-lane Lidar, which produces about 30k Lidar points per frame, about 1/6 of the number of points in the Waymo dataset, which limits the potential improvements of two-stage refinement. Similar results have been observed in previous two-stage methods like PointRCNN [45] and PV-RCNN [44].
I don't know the exact reason, but to my knowledge I haven't seen any papers that achieve improvements with sampling-based two-stage refinement on nuScenes.
We mainly evaluate two-stage approaches on Waymo. Are you interested in this part?
@tianweiy As I find the second stage is similar to another work (BorderDet, ECCV 2020, Megvii), I'm not sure, but I guess BorderDet may inspire further improvement. Also, since the second stage doesn't work on nuScenes: (1) do you think it is related to the complex backbone network? I saw you used the ResNet-like SparseConvNet structure and a very deep neck network compared with SECOND in Det3D, so it may be hard to extract low-level geometric features for the refinement. Besides, (2) how about the results with the SparseConvNet and the neck of SECOND? (3) May I ask how much time and how many GPUs you spend on a complete training of CenterPoint on the train split? (4) With so many samples, do you build a subset (like PV-RCNN subsampling 20% of the data on Waymo) to speed up validating an idea? How well does that work? Thanks!
do you think it is related to the complex backbone network? I saw you used the ResNet-like SparseConvNet structure and a very deep neck network compared with SECOND in Det3D, so it may be hard to extract low-level geometric features for the refinement
Yeah, but the second stage also doesn't work on nuScenes with PointPillars, which is a super simple pipeline, so...
how about the results with the SparseConvNet and the neck of SECOND
We just use CBGS's backbone. I haven't tried SECOND's backbone, but according to CBGS's paper, it is a few points lower.
may I ask how much time and how many GPUs you spend on a complete training of CenterPoint on the train split
nuScenes or Waymo? For nuScenes it is 4 GPUs: one day for PP, two days for VoxelNet. For Waymo it is quite long, maybe 3 days for PP and 4 days or so for VoxelNet. We only tried full-dataset training before the submission. As PP on Waymo is drastically worse than VoxelNet, I mainly use VoxelNet after the submission.
With so many samples, do you build a subset (like PV-RCNN subsampling 20% of the data on Waymo) to speed up validating an idea?
As I said, I only tried full-dataset training before the submission. After the deadline, I have tried fewer training epochs: https://github.com/tianweiy/CenterPoint/tree/master/configs/waymo#ablations-for-training-schedule
It takes 28 hours for 12 epochs and is 0.8 mAPH worse than the single-stage baseline with 36 epochs. For the second stage, it is currently quite slow. But I guess if you precompute and save those features and then only train those few FC layers, it will be an hour or so. I will try to support this in the near future.
Someone told me that training on a 1/5 subsample works better. I will explore more training-schedule details after the ICCV submission. We have some plans to improve the codebase, figure out good schedules, and make Waymo doable for the general public.
the second stage is similar to another work (BorderDet, ECCV 2020, Megvii), I'm not sure, but I guess BorderDet may inspire further improvement
I know this paper. But I think the 3D setting is a bit different. The message we want to convey is that features are actually not that important (dense sampling, BorderDet, PV-RCNN, etc.). It seems that, at least on Waymo, a few BEV features (like 5 points in our case) can already bring significant improvements, and extra features don't help much. The main improvement comes from positive / negative sampling in two-stage training and the IoU prediction branch.
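A minimal sketch of the 5-point BEV feature extraction described above, under simplifying assumptions: a dense numpy BEV feature map indexed in feature-map coordinates, and `box_sample_points` / `bilinear_sample` as hypothetical helpers rather than the repo's API. The 5 points are the box center plus the centers of the four side faces, rotated by the box heading:

```python
import numpy as np


def box_sample_points(cx, cy, dx, dy, yaw):
    """Return the 5 BEV sample points of a box: its center plus the
    centers of its four side faces, rotated by the heading angle."""
    local = np.array([[0.0, 0.0],
                      [dx / 2, 0.0], [-dx / 2, 0.0],
                      [0.0, dy / 2], [0.0, -dy / 2]])
    rot = np.array([[np.cos(yaw), -np.sin(yaw)],
                    [np.sin(yaw),  np.cos(yaw)]])
    return local @ rot.T + np.array([cx, cy])


def bilinear_sample(feat, pts):
    """Bilinearly interpolate a (H, W, C) feature map at float (x, y) points."""
    H, W, C = feat.shape
    out = np.zeros((len(pts), C))
    for i, (x, y) in enumerate(pts):
        x0, y0 = int(np.floor(x)), int(np.floor(y))
        x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
        wx, wy = x - x0, y - y0
        out[i] = (feat[y0, x0] * (1 - wx) * (1 - wy)
                  + feat[y0, x1] * wx * (1 - wy)
                  + feat[y1, x0] * (1 - wx) * wy
                  + feat[y1, x1] * wx * wy)
    return out
```

The concatenated 5 x C features would then feed the small refinement MLP; the real implementation works on torch tensors and batches of proposals.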
@tianweiy That's great! Hope to discuss more after ICCV deadline. Thanks for your time and detailed explanation!
close for now. Feel free to create another issue or send me an email for discussion about the second stage.
the second stage is similar to another work (BorderDet, ECCV 2020, Megvii), I'm not sure, but I guess BorderDet may inspire further improvement
I know this paper. But I think the 3D setting is a bit different. The message we want to convey is that features are actually not that important (dense sampling, BorderDet, PV-RCNN, etc.). It seems that, at least on Waymo, a few BEV features (like 5 points in our case) can already bring significant improvements, and extra features don't help much. The main improvement comes from positive / negative sampling in two-stage training and the IoU prediction branch.
Hi @tianweiy As you said the main improvement comes from pos/neg sampling and the IoU prediction branch, have you tried strategies other than randomly sampling 128 boxes with a 1:1 ratio? Will more complicated label assignment strategies such as ATSS or AutoAssign help?
Actually, I changed my mind a little bit. I recently played around with TuSimple's Lidar R-CNN paper and it can still give 1 mAP on top of my two-stage result. So both the features and what I mentioned above matter. I have not tried other assignment schedules (to my knowledge, ATSS / AutoAssign mainly deal with first-stage assignment?)
Thanks a lot, I will try Lidar R-CNN later. I think I mixed up the label assignment and the sampling strategy for your second stage, since they look similar (choosing pos and neg IoU thresholds to define positive and negative proposals). Such manually chosen IoU thresholds are not needed in the one-stage CenterNet or CenterPoint, which is also one of CenterPoint's advantages. So why did you decide to find the pos/neg proposals by IoU instead of using a simpler strategy, such as deciding by the heatmap of the first stage? Is the 1:1 ratio of pos and neg proposals necessary?
I just follow PV-RCNN's strategy and code for this part. I haven't had time to tune it yet.
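For reference, the IoU-threshold subsampling being discussed can be sketched roughly like this. This is a simplified, hypothetical version of the PV-RCNN-style scheme: the real code uses per-class thresholds and more careful handling of hard negatives, and falls back to padding when there are too few positives:

```python
import numpy as np


def sample_proposals(ious, num_samples=128, fg_fraction=0.5,
                     fg_thresh=0.55, bg_thresh=0.55, seed=0):
    """Pick a mix of positive (high-IoU) and negative (low-IoU) proposals
    for second-stage training. `ious` holds each proposal's max IoU with
    any ground-truth box; fg_fraction=0.5 gives the 1:1 ratio."""
    rng = np.random.default_rng(seed)
    fg = np.flatnonzero(ious >= fg_thresh)   # positives
    bg = np.flatnonzero(ious < bg_thresh)    # negatives
    n_fg = min(int(num_samples * fg_fraction), len(fg))
    n_bg = min(num_samples - n_fg, len(bg))
    fg_idx = rng.choice(fg, size=n_fg, replace=False)
    bg_idx = rng.choice(bg, size=n_bg, replace=False)
    return fg_idx, bg_idx
```

The heatmap-based alternative raised above would replace the `ious` ranking with first-stage center scores; nothing in this sketch depends on which scalar is used.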
Thanks a lot for the discussion!
We mainly evaluate two-stage approaches on Waymo. Are you interested in this part?
yes!!
I was trying to train the 2-stage model with python -m torch.distributed.launch --nproc_per_node=8 configs/waymo/voxelnet/two_stage/waymo_centerpoint_voxelnet_two_sweep_two_stage_bev_5point_ft_6epoch_freeze_with_vel.py
then nothing happens but
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
and then the program is over?
"then nothing happens" <- not sure what this is; I have never met this.
Please paste the full log from start to finish.
Actually, I changed my mind a little bit. I recently played around with TuSimple's Lidar R-CNN paper and it can still give 1 mAP on top of my two-stage result. So both the features and what I mentioned above matter. I have not tried other assignment schedules (to my knowledge, ATSS / AutoAssign mainly deal with first-stage assignment?)
Hi @tianweiy, I'm wondering what setting you are using from Lidar R-CNN to get the 1 mAP gain. Thanks a lot!
Actually, I changed my mind a little bit. I recently played around with TuSimple's Lidar R-CNN paper and it can still give 1 mAP on top of my two-stage result. So both the features and what I mentioned above matter. I have not tried other assignment schedules (to my knowledge, ATSS / AutoAssign mainly deal with first-stage assignment?)
Hi @tianweiy, I'm also wondering: do you use Lidar R-CNN to regress the attributes of the box, such as velocity, or simply to refine the box coordinates?
Hi @tianweiy, great work! For nuScenes, I found the two_stage.py file for the second-stage backbone, but didn't find the corresponding network config file, data processing, evaluation code, or calling code, nor an introduction in the README. Could you show how to use second-stage training and evaluation with your codebase? Thanks!