Adding temporal data to clocs_SecCas

Anirbanbhk88 commented 2 years ago

Hi @pangsu0613, I am experimenting with passing image and point cloud data from previous time frames to clocs_SecCas, and with modifying CLOCs by adding some recurrent layers. I want to check whether this improves detection accuracy. Is this possible in the current architecture, and do you have any suggestions about which dataset I can use for this kind of experiment?

Anirbanbhk88 commented 2 years ago

Hi @pangsu0613, congratulations on the good work. We also spoke over email before. I am doing my master's thesis on 3D object detection using sensor fusion. Do you have any thoughts or suggestions regarding this? The KITTI object tracking dataset (http://www.cvlibs.net/datasets/kitti/eval_tracking.php) provides frames from previous time steps as sequences, so I was thinking of training the 2D detector (Cascade-RCNN) and the 3D detector (SECOND) on the KITTI object tracking dataset. I would then put the generated 2D detections in the CLOCs source code '/d2_detection_data' path, add an LSTM layer to the CLOCs network, and train CLOCs on the same dataset (while keeping the SECOND model in eval mode). I plan to do this as part of my thesis. Sorry for my ignorance, but could you give your feedback on whether this can be done?

pangsu0613 commented 2 years ago

Hello @Anirbanbhk88, I think using multiple frames to augment the detections is a promising direction. The KITTI object detection dataset is not organized as continuous data sequences; every frame is relatively independent. KITTI does provide the preceding 3 frames for each frame in the object detection dataset, but these preceding frames are not labeled. Using the KITTI tracking dataset is a good option; the only issue is that, as far as I remember, the KITTI tracking dataset is much smaller than the detection dataset. You could also have a look at the nuScenes, Argoverse and Waymo datasets, which are organized as data sequences (around 15-20 seconds each) and are all well labeled. Regarding your implementation idea, I think it is good. For simplicity, maybe you could start from the pretrained models instead of training on the tracking dataset.
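As a rough illustration of how sequence-organized data could be turned into inputs for a recurrent layer, the sketch below builds fixed-length windows of frame indices per sequence. The function name `build_temporal_windows`, the `history` parameter and the padding rule (repeating frame 0 at the start of a sequence) are assumptions made for this example; none of them come from the CLOCs code.

```python
from typing import Dict, List, Tuple


def build_temporal_windows(
    frames_per_sequence: Dict[str, int], history: int
) -> List[Tuple[str, List[int]]]:
    """Return (sequence_id, [t-history, ..., t]) for every frame t.

    Windows never cross a sequence boundary. Frames near the start of a
    sequence repeat frame 0 so that every window has the same length
    (an assumed padding rule, chosen only to keep the example simple).
    """
    windows: List[Tuple[str, List[int]]] = []
    for seq_id in sorted(frames_per_sequence):
        n_frames = frames_per_sequence[seq_id]
        for t in range(n_frames):
            # Oldest frame first, current frame last.
            window = [max(t - k, 0) for k in range(history, -1, -1)]
            windows.append((seq_id, window))
    return windows
```

Each window could then index the per-frame 2D and 3D detections that are fed to the fusion network in temporal order.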

Anirbanbhk88 commented 2 years ago

Hi @pangsu0613, thanks for your feedback. From what I observed, the KITTI tracking dataset has 21 sequences in the training set and 28 sequences in the test set, and each sequence has more than 100 frames. In total there are around 8008 frames in the training set (including all the sequences). Once I get some results with a public dataset, I plan to try a private company dataset. I have two more questions based on your suggestion:

1) If I understood correctly, should I take a pretrained Cascade-RCNN model, fine-tune it on the image data in the tracking dataset, and then generate the 2D detections (which will be used during CLOCs training)? And should I also take a pretrained SECOND model, fine-tune it on the point cloud data in the tracking dataset, and then use it during CLOCs detection?

2) Which pretrained models for Cascade-RCNN and SECOND should I take, since I need to fine-tune them further?
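The sequence and frame counts mentioned above can be checked with a short script. This is a minimal sketch that assumes one sub-directory per sequence with one image file per frame (for example `training/image_02/0000/000000.png`); the helper name `count_frames` and that directory layout are assumptions, so adjust them to the actual download.

```python
import os
from typing import Dict


def count_frames(image_root: str, ext: str = ".png") -> Dict[str, int]:
    """Count image files in every sequence sub-directory of image_root."""
    counts: Dict[str, int] = {}
    for seq in sorted(os.listdir(image_root)):
        seq_dir = os.path.join(image_root, seq)
        if os.path.isdir(seq_dir):
            counts[seq] = sum(
                1 for name in os.listdir(seq_dir) if name.endswith(ext)
            )
    return counts
```

Summing the returned values gives the total number of frames, and the number of keys gives the number of sequences.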

Anirbanbhk88 commented 2 years ago

Hi @pangsu0613, sorry for asking again. Could you please answer my previous questions?

pangsu0613 commented 2 years ago

Hello @Anirbanbhk88 , sorry for the late response.

1) Yes, you could use pre-trained models for Cascade-RCNN and SECOND and fine-tune them on the tracking training set. But note that there is a potential issue: there are some overlaps between the detection dataset and the tracking dataset.
2) I think you could use the ones that I provided, which are trained on the KITTI detection training set (around 3700 frames).

Anirbanbhk88 commented 2 years ago

@pangsu0613 Hi, I did not quite get what you meant by the potential issue of overlaps between the detection dataset and the tracking dataset. What are the consequences of this? Also, referring to your answer 2: you have provided only the 2D sigmoid detections from the Cascade RCNN 2D model (not the pretrained model) and the pretrained models for SECOND: second_cyclist_model.zip, second_model.zip, second_pedestrian_model.zip. I could not find any pretrained Cascade RCNN 2D model. Am I missing it?

pangsu0613 commented 2 years ago

Regarding the first point: the authors of KITTI collected a lot of data, which I will refer to as the 'raw data', but they only labeled part of it for the detection dataset and the tracking dataset. Both the detection dataset and the tracking dataset are subsets of the 'raw data', so some frames are identical in the detection and tracking datasets. Compared to the detection dataset, the tracking dataset is therefore not a 'brand new' dataset. The potential issue is that, if there are too many overlaps, it could result in overfitting. For the second point: sorry about that, I use this repo for Cascade-RCNN: https://github.com/zhaoweicai/mscnn. You can find the pre-trained weights there, or you could train it yourself.
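If both datasets can be mapped back to the raw recordings, the amount of overlap can be measured directly. The sketch below assumes each frame has already been resolved to a (raw drive name, frame index) key; how that mapping is obtained is left open, and the names `FrameKey` and `overlap_report` are hypothetical.

```python
from typing import Iterable, Set, Tuple

# (raw drive name, frame index within that drive); an assumed key format.
FrameKey = Tuple[str, int]


def overlap_report(
    detection_frames: Iterable[FrameKey],
    tracking_frames: Iterable[FrameKey],
) -> Tuple[Set[FrameKey], float]:
    """Return the shared frames and the fraction of tracking frames shared."""
    det = set(detection_frames)
    trk = set(tracking_frames)
    shared = det & trk
    fraction = len(shared) / len(trk) if trk else 0.0
    return shared, fraction
```

A high fraction would suggest that a model pre-trained on the detection set has already seen much of the tracking data, so results on those frames should be read with care.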

Anirbanbhk88 commented 2 years ago

Hi @pangsu0613, thanks for your earlier explanations. I am using the KITTI object tracking dataset to feed CLOCs with sequential data, and I have generated 2D detections (in KITTI format) from the 2D detector Cascade RCNN. I have 3 doubts:

1) Apart from cars, cyclists and pedestrians, can I also detect the Van class using CLOCs?
2) Now that I have the 2D detections, as I understand it, I can use the pretrained SECOND 3D model checkpoint and pass the sequential data from the KITTI tracking dataset to train CLOCs. For this step, am I supposed to send the 3D point cloud data (of the KITTI tracking dataset) to get predictions from the SECOND model and then train CLOCs? And does this 3D point cloud data need to be placed in the 'velodyne_reduced' directory?
3) What is the use of the pickle files in the KITTI detection dataset (KITTI_DATASET_ROOT): kitti_dbinfos_train.pkl, kitti_infos_train.pkl, kitti_infos_test.pkl, kitti_infos_val.pkl, kitti_infos_trainval.pkl?

pangsu0613 commented 2 years ago

Hello @Anirbanbhk88
1) CLOCs can be applied to other classes; you only need 3D and 2D detections of those classes.
2) Yes, you are supposed to have the 3D point clouds from the tracking dataset. I would suggest having a look at https://github.com/traveller59/second.pytorch/tree/v1.5.1#prepare-dataset for how to prepare the dataset. It explains what should be in the 'velodyne_reduced' directory, and it also explains why we need the pickle files you mentioned in the third question.

Anirbanbhk88 commented 2 years ago

Hi @pangsu0613, thanks for the quick reply. Regarding question 1: which parts of the CLOCs code do I need to change to support another class (like Van)? Also, as I understand it, the point cloud files in the velodyne_reduced folder of the default CLOCs implementation are the point clouds of the KITTI object detection dataset. Now that I am training with the KITTI tracking dataset, I have to replace them with the point clouds of the tracking dataset. Am I right?

Anirbanbhk88 commented 2 years ago

Hi @pangsu0613, I have two questions:

1) I generated the KITTI info files following https://github.com/traveller59/second.pytorch/tree/v1.5.1#prepare-dataset. Now I am trying to fine-tune your SECOND models on the KITTI tracking dataset. However, I am getting very low AP results, and I don't know whether I am doing something wrong. How many epochs did you use?
2) I generated the 2D detections by fine-tuning the Cascade RCNN model on the tracking dataset. The 2D detection files contain detections for car, cyclist and pedestrian together, but I see that in CLOCs the Car, Cyclist and Pedestrian classes are handled separately. So do I need to generate the 2D detections again separately for each class, or will CLOCs read only the detections of the class I am setting it up for?

pangsu0613 commented 2 years ago

Hello @Anirbanbhk88 ,

1) I followed the original SECOND setup and trained it for 50 epochs.
2) For 2D detections, you don't need to generate them separately for each class. Have a look at https://github.com/pangsu0613/CLOCs#pedestrian-and-cyclist-detection for a detailed explanation.

Anirbanbhk88 commented 2 years ago

Thanks @pangsu0613 for the info

Anirbanbhk88 commented 2 years ago

Hi @pangsu0613, which config files did you use when training SECOND for cyclist and pedestrian? I see from the SECOND code (https://github.com/traveller59/second.pytorch/tree/v1.5.1) that there is a config for car and all.fhd.config (the config file for multiclass classification). I tried to build a config file for cyclist based on them, and my evaluation results came out quite low.

pangsu0613 commented 2 years ago

Hello @Anirbanbhk88, we provide config files for pedestrian and cyclist (pedestrian.fhd.config and cyclist.fhd.config) under CLOCs/second/configs. I would recommend referring to them for training SECOND.

Anirbanbhk88 commented 2 years ago

Hi @pangsu0613, I am trying CLOCs on the KITTI tracking dataset to check how it performs with data from previous time steps. I initially trained the 2D Cascade RCNN detector (using the MMDetection code), but for each image I got the 2D detections for Car, Pedestrian and Cyclist in a single detection file. I see that the 2D detections you provided (cascade_rcnn_sigmoid_data.zip and mscnn_ped_cyc_trainval_sigmoid_data_scale_1000.zip) have separate detection files for the Car class and for the Pedestrian and Cyclist classes. I need some help with these doubts:

1) The CLOCs code is designed to detect a single class at a time (car/no-car, pedestrian/no-pedestrian, cyclist/no-cyclist). If I understood correctly, that is why you use sigmoid results for each class in the 2D detections: each class is treated as a binary classification by the 2D detector. So why are the Pedestrian and Cyclist detections in the same files in mscnn_ped_cyc_trainval_sigmoid_data_scale_1000.zip?
2) Since I put the 2D detections for all the classes in the same detection file, my CLOCs is not training with this data, maybe because softmax scores are generated for the 3 classes (by the MMDetection/Cascade RCNN code) instead of sigmoid scores. Should I generate the detections for the 3 classes separately and also make sure sigmoid scores are produced?
3) If I try to build the Cascade RCNN 2D detections from the repo you provided, https://github.com/zhaoweicai/mscnn, what changes do I need to make to get sigmoid scores?
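For readers unfamiliar with the distinction raised in these questions, the sketch below contrasts the two score types on the same logits. The logit values are made up for illustration, and nothing here claims which score type MMDetection or the mscnn repo actually emits.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Scores compete across classes and always sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Independent per-class score, as in one binary classifier per class."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical logits for Car, Pedestrian, Cyclist on one box.
logits = [2.0, 1.5, -1.0]
softmax_scores = softmax(logits)
sigmoid_scores = [sigmoid(x) for x in logits]
```

With softmax, raising the Car logit lowers the Pedestrian score for the same box; with sigmoid, each class score depends only on its own logit.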

pangsu0613 commented 2 years ago

Hello @Anirbanbhk88

1) In https://github.com/pangsu0613/CLOCs/blob/cad14fdc12392b9734d496e5d7782ae3ba200af5/second/pytorch/models/voxelnet.py#L393, the class is hardcoded; my apologies for the inconvenience and confusion. This is the main reason that one can have multiple class labels in the same file: only the hardcoded class will be read and processed.
2) As explained above, you don't have to generate the 3 classes separately; you only need to change the class name in voxelnet.py line 393.
3) Correct me if I am wrong, but I remember that the repo provides sigmoid scores by default, just with a scale of 0 to 1000. If so, you only need to divide the output scores by 1000 or modify their MATLAB script; I remember they have the scale factor in the MATLAB script.
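The two points above (reading only one class from a mixed file, and rescaling 0 to 1000 scores) can be sketched as follows. This is not the code at voxelnet.py line 393; the function name `parse_detections` and the `score_scale` parameter are hypothetical, and the parser assumes the standard 16-field KITTI result line (class name first, 2D box in fields 5 to 8, score last).

```python
from typing import Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # left, top, right, bottom in pixels


def parse_detections(
    lines: Iterable[str], class_name: str, score_scale: float = 1.0
) -> List[Tuple[Box, float]]:
    """Keep only detections of class_name and divide scores by score_scale."""
    kept: List[Tuple[Box, float]] = []
    for line in lines:
        fields = line.split()
        if len(fields) < 16 or fields[0] != class_name:
            continue  # skip malformed lines and other classes
        left, top, right, bottom = (float(v) for v in fields[4:8])
        score = float(fields[15]) / score_scale
        kept.append(((left, top, right, bottom), score))
    return kept
```

Calling it with `score_scale=1000.0` would map scores in the 0 to 1000 range to 0 to 1, while the default leaves them unchanged.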

Anirbanbhk88 commented 2 years ago

@pangsu0613 Thanks for the clarifications. However, some 2D detection files contain only Cars. When I try to train CLOCs for the Pedestrian class (by modifying line 393 in voxelnet.py), no labels are read from such detection files. So there is no IoU match, and the fusion network cannot give any valid output (it gives a tensor somewhat like [[-9999999, -999999] ... ]). As a result, I get no average precision values at the end. I noticed this while debugging the code. What should I do in this situation?
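The situation described here can be reproduced in isolation with a simplified pairing step. The sketch below is an illustration only, not the CLOCs implementation: it pairs image-plane boxes of projected 3D detections with 2D detections by IoU, and the helper names `iou_2d` and `overlapping_pairs` are hypothetical. It can help identify affected frames while debugging; how such frames should be handled is not settled in this thread.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # left, top, right, bottom


def iou_2d(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def overlapping_pairs(
    projected_3d: Sequence[Box], detections_2d: Sequence[Box]
) -> List[Tuple[int, int, float]]:
    """Return (3D index, 2D index, IoU) for every pair with non-zero IoU."""
    pairs: List[Tuple[int, int, float]] = []
    for i, p in enumerate(projected_3d):
        for j, d in enumerate(detections_2d):
            iou = iou_2d(p, d)
            if iou > 0.0:
                pairs.append((i, j, iou))
    return pairs
```

When the 2D list for the target class is empty, the pair list is empty as well, so there is nothing for a fusion step to score in that frame.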

Anirbanbhk88 commented 2 years ago

Hi @pangsu0613, one small question: is the CLOCs code set up to train with just batch_size=1? Even when I increase it to batch_size=8, voxelnet.py returns just 1 IoU sparse tensor, as per the code.

ihaohe commented 1 year ago

It seems the code base only supports batch_size == 1, because in the voxelnet forward function only one image's detection results are considered.

pangsu0613 / CLOCs

Adding temporal data to clocs_SecCas #66