Open Anirbanbhk88 opened 2 years ago
Hi @pangsu0613, congratulations on the great work. We also spoke over email before. I am doing my master's thesis on 3D object detection using sensor fusion. Do you have any thoughts/suggestions regarding this? Since the KITTI object tracking dataset (http://www.cvlibs.net/datasets/kitti/eval_tracking.php) provides frames from previous time steps as sequences, I was thinking of training the 2D detector (Cascade R-CNN) and the 3D detector (SECOND) with the KITTI object tracking dataset. Then I would put the generated 2D detections in the CLOCs source code's '/d2_detection_data' path, add an LSTM layer to the CLOCs network, and train CLOCs on the same dataset (while keeping the SECOND model in eval mode). I plan to do this as part of my thesis. Sorry for my ignorance, but could you give your feedback on whether this can be done?
Hello @Anirbanbhk88, I think using multiple frames to augment the detections is a promising direction. For KITTI, the object detection dataset is not organized as continuous sequences; every frame is relatively independent. KITTI does provide the 3 preceding frames for each frame in the object detection dataset, but those preceding frames are not labeled. Using the KITTI tracking dataset is a good option; the only issue is that, as I remember, the KITTI tracking dataset is much smaller than the detection dataset. You could also have a look at the nuScenes, Argoverse and Waymo datasets; they are organized as data sequences (around 15-20 seconds each), and they are all well labeled. Regarding your implementation idea, I think it is good. For simplicity, maybe you could start from the pretrained models instead of training on the tracking dataset.
Hi @pangsu0613, thanks for your feedback. As per my observations, the KITTI tracking dataset has 21 sequences in the training set and 28 sequences in the test set, and each sequence has more than 100 frames. So overall there are around 8,008 frames in the training set (including all the sequences). Once I get some results with a public dataset, I plan to try a private, company dataset. I have another question based on your suggestion: 1) If I understood correctly, should I take a pretrained Cascade R-CNN model, fine-tune it with the image data in the tracking dataset, and then generate the 2D detections (which will be used during CLOCs training)? And should I likewise take a pretrained SECOND model, fine-tune it with the point cloud data in the tracking dataset, and then use it during CLOCs detection?
2) Which pretrained models for Cascade R-CNN and SECOND should I take, since I need to fine-tune them further?
Hi @pangsu0613, sorry for asking again... could you please answer my previous questions?
Hello @Anirbanbhk88 , sorry for the late response.
@pangsu0613 Hi, I did not quite get what you meant by 'But noted that there is a potential issue, there are some overlaps between detection dataset and tracking dataset.' What are the consequences of this? Also, referring to your answer 2: you have provided only the 2D sigmoid detections from the Cascade R-CNN 2D model (not the pretrained model itself) and the pretrained models for SECOND: second_cyclist_model.zip, second_model.zip, second_pedestrian_model.zip. I could not find any pretrained Cascade R-CNN 2D model... am I missing it?
Regarding the first point: the authors of KITTI collected a lot of data, which I'll just refer to as 'raw data', but they only labeled part of it for the detection dataset and the tracking dataset. Both the detection dataset and the tracking dataset are subsets of the 'raw data', so some frames are identical in the detection and tracking datasets. In other words, compared to the detection dataset, the tracking dataset is not a 'brand new' dataset. The potential issue is that, if there are too many overlaps, it could result in overfitting. For the second point: sorry about that, I use this repo for cascade-rcnn: https://github.com/zhaoweicai/mscnn, you could find the pre-trained weights there, or you could train it yourself.
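Since both labeled subsets are cut from the same raw drives, the overlap can be estimated with a simple set intersection over raw-data identifiers. The sketch below assumes you have already extracted one `(drive, frame)` pair per sample for each dataset (e.g. via the KITTI devkit mapping files); the file format and names are assumptions, not part of the devkit.

```python
# Sketch: estimate how many frames the KITTI detection and tracking subsets
# share, given sets of (drive_id, frame_id) pairs for each. The input file
# layout ("drive_id frame_id" per line) is a hypothetical convention.

def load_raw_ids(path):
    """Return the set of (drive, frame) pairs listed in `path`."""
    ids = set()
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                ids.add((parts[0], int(parts[1])))
    return ids

def overlap_stats(det_ids, trk_ids):
    """Count shared frames and the fraction of the tracking set they cover."""
    shared = det_ids & trk_ids
    return len(shared), len(shared) / max(len(trk_ids), 1)

# Toy example:
det = {("2011_09_26_drive_0001", 0), ("2011_09_26_drive_0001", 1)}
trk = {("2011_09_26_drive_0001", 1), ("2011_09_26_drive_0002", 5)}
print(overlap_stats(det, trk))  # -> (1, 0.5)
```

If the overlap fraction is high, you could simply drop the shared tracking frames from your fine-tuning split to limit the overfitting risk mentioned above.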
Hi @pangsu0613, thanks for your earlier explanations. Since I am using the KITTI object tracking dataset to feed CLOCs with sequential data, I have generated 2D detections (in KITTI format) from the 2D detector Cascade R-CNN. I have 2 doubts:
Hi @pangsu0613, thanks for the quick reply. Regarding Question 1: which areas of the CLOCs code do I need to change to support another class (like Van)? Also, as per my understanding, the point cloud files in the velodyne_reduced folder of the default CLOCs implementation are the point clouds of the KITTI object detection dataset. Now that I am training with the KITTI tracking dataset, I have to replace those with the point clouds of the tracking dataset... am I right?
Hi @pangsu0613 I have one question
Hello @Anirbanbhk88 ,
Thanks @pangsu0613 for the info
Hi @pangsu0613, during training of SECOND for cyclist and pedestrian, which config files did you use? I see from the SECOND code (https://github.com/traveller59/second.pytorch/tree/v1.5.1) that they have a config for car and all.fhd.config (the config file for multiclass classification). I tried to build a config file for cyclist by referring to them, and my evaluation results came out quite low.
Hello @Anirbanbhk88, we provide config files for pedestrian and cyclist (pedestrian.fhd.config and cyclist.fhd.config) under CLOCs/second/configs. I would recommend referencing them for training SECOND.
Hello @Anirbanbhk88 ,
- I followed the original SECOND setup and trained it for 50 epochs.
- For 2D detections, you don't need to generate them separately for each class. Have a look at https://github.com/pangsu0613/CLOCs#pedestrian-and-cyclist-detection for a detailed explanation.
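If you do end up with one combined KITTI-format detection file per image (as produced below by MMDetection) and want per-class directories like the released archives, a small splitter is enough. This is a sketch; the directory layout and class grouping are assumptions, and the KITTI field order (class name first, score last) is the only thing relied on.

```python
# Sketch: split a combined KITTI-format 2D detection file into per-class
# files, one output directory per class. File/dir names are hypothetical.
import os

def split_detection_file(src_path, out_dirs):
    """`out_dirs` maps a class name (first KITTI field) to an output dir."""
    per_class = {cls: [] for cls in out_dirs}
    with open(src_path) as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] in per_class:
                per_class[fields[0]].append(line)
    name = os.path.basename(src_path)
    for cls, lines in per_class.items():
        os.makedirs(out_dirs[cls], exist_ok=True)
        # Write a file even when empty, so every frame has a detection file.
        with open(os.path.join(out_dirs[cls], name), "w") as f:
            f.writelines(lines)
```

Run it over every frame's file so each class directory keeps a file per frame, matching how CLOCs expects the `/d2_detection_data` folder to be populated.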
Hi @pangsu0613, since I am trying CLOCs on the KITTI tracking dataset to check how it performs with previous-time-sequence data, I initially trained the 2D Cascade R-CNN detector (I trained it from the MMDetection code). But for each image, I got the 2D detections for Car, Pedestrian and Cyclist in a single detection file. I see the 2D detections you provided (cascade_rcnn_sigmoid_data.zip and mscnn_ped_cyc_trainval_sigmoid_data_scale_1000.zip) had separate detection files for the Car and the Pedestrian/Cyclist classes. I need some help regarding these doubts.
Hello @Anirbanbhk88
@pangsu0613 Thanks for the clarifications. However, some 2D detection files contain only Cars. Now, when I am trying to train CLOCs for the Pedestrian class (by modifying line 393 in voxelnet.py), no labels are read from such detection files with only Car detections. So there is no IoU match, and the fusion network cannot give any valid output (it gives a tensor somewhat like [[-9999999, -999999] ... ]). So I am getting no Avg Precision values at the end. I noticed this while debugging the code. What should I do in this situation?
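One pragmatic workaround for frames with no 2D boxes of the target class is to skip the fusion for that frame and keep the raw 3D detector scores, instead of pushing the all -999999 padding through the network. This is a sketch, not the CLOCs implementation; the function and variable names are hypothetical, and plain lists stand in for tensors.

```python
# Sketch of a guard for frames whose 2D detection file has no boxes of the
# class being trained, so no IoU pairs can be formed. Fall back to the raw
# 3D scores rather than the degenerate fused output.

def fuse_or_fallback(fused_scores, raw_3d_scores, num_2d_boxes):
    """Use fused scores only when the frame had usable 2D detections."""
    if num_2d_boxes == 0:
        # No 2D evidence for this class: keep the 3D detector's scores.
        return raw_3d_scores
    return fused_scores

print(fuse_or_fallback([-999999, -999999], [0.7, 0.2], num_2d_boxes=0))
# -> [0.7, 0.2]
```

With a guard like this, frames without pedestrian 2D detections still contribute valid outputs to the evaluation instead of producing empty AP results.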
Hi @pangsu0613, one small question. Is the CLOCs code set up to train with just batch_size=1? Because even if I increase the batch_size to 8, it returns just 1 IoU sparse tensor from the voxelnet.py class as per the code.
It seems the code base only supports batch_size == 1, because in the voxelnet forward function, only one image's detection result is considered.
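Given that limitation, one simple way to keep batch_size > 1 in the dataloader is to loop over the batch and run the existing single-example fusion path on each element, then collect the results. A minimal sketch, where `fuse_single` is a hypothetical stand-in for the current per-example path in voxelnet.py:

```python
# Sketch: wrap the single-example fusion in a per-batch loop.

def fuse_batch(batch_examples, fuse_single):
    """Apply the single-example fusion path to every element of a batch."""
    return [fuse_single(example) for example in batch_examples]

# Toy usage: pretend the fusion just doubles each score.
print(fuse_batch([[1, 2], [3]], lambda ex: [2 * s for s in ex]))
# -> [[2, 4], [6]]
```

This does not give the throughput benefit of true batched sparse-tensor construction, but it keeps the rest of the training loop unchanged.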
Hi @pangsu0613, I was trying an experiment to pass image and point cloud data from previous time frames to CLOCs_SecCas, and modifying CLOCs by adding some recurrent layers. I want to check whether this leads to improved detection accuracy. Will this be possible in the current architecture, and do you have any suggestions about which dataset I could use for this kind of experiment?
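For the recurrent-layer idea, one possible shape for the experiment is to run the per-frame fused feature vector through a small LSTM before the final scoring layer, so detections from earlier frames can influence the current one. The sketch below is an assumption-laden illustration, not the CLOCs architecture: the feature dimension, hidden size, and the idea of tracking a detection's features across frames are all hypothetical.

```python
# Sketch: a temporal head that scores a detection from a short history of
# its fused per-frame features. Sizes are placeholders, not CLOCs values.
import torch
import torch.nn as nn

class TemporalFusionHead(nn.Module):
    def __init__(self, feat_dim=18, hidden_dim=32):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.score = nn.Linear(hidden_dim, 1)  # per-detection fused score

    def forward(self, feats):
        # feats: (num_detections, num_frames, feat_dim)
        out, _ = self.lstm(feats)
        return self.score(out[:, -1])  # score from the latest frame

head = TemporalFusionHead()
x = torch.randn(4, 5, 18)  # 4 detections tracked over 5 frames
print(head(x).shape)  # torch.Size([4, 1])
```

Note this presumes you can associate detections across frames (e.g. via the tracking-dataset IDs); without that association, a per-frame recurrent state over the whole scene would be needed instead.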