shangjie-li / pointpillars

Implementation of PointPillars in PyTorch for KITTI 3D Object Detection

How to apply PointPillar on my own dataset for 3d object detection? #2

Open aatefi2 opened 8 months ago

aatefi2 commented 8 months ago

Hi Shangjie,

I have point cloud data from an Ouster lidar, and I want to use PointPillars on my dataset for 3D object detection. I was able to successfully apply your code to the KITTI dataset (point cloud data, labels, 2D images, and calibration files). I only have point cloud data (.bin, KITTI format) with labels (.txt, KITTI format). How can I use your code for just one class on my own dataset?

Thank you for your help in advance.

Best, Abbas

shangjie-li commented 8 months ago

First of all, I'm glad you're using my repository. I haven't tried running PointPillars with a custom dataset, but I'm sure it can be done by changing a few implementation details in the code.

The code of the entire repository consists of four parts: data processing, network inference, post-processing (training/evaluation), and visualization. I think it is enough to mainly modify the data processing part, which is implemented in data/kitti_dataset.py.

But before you do that, there are some preconditions to check. First, your Ouster lidar is a 360-degree rotating lidar, right? Second, you said you only have point cloud files and label files, so I guess your label format is not the standard KITTI format, because the standard KITTI format includes: cls_type, truncation, occlusion, alpha, box2d_x1, box2d_y1, box2d_x2, box2d_y2, h, w, l, loc_x, loc_y, loc_z, rotation_y. Many of these fields can only be labeled from the camera's perspective.

If you want to use the code from this repository, the content you label must include at least: x, y, z, l, w, h, rotation (in lidar coordinates). Note: the 3D information in the standard KITTI format is labeled in camera coordinates, but my code converts it uniformly to lidar coordinates for processing.
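For illustration, here is a minimal sketch of what that conversion usually looks like (the matrix name and helper are my own, not this repository's exact code); if you label directly in lidar coordinates, you can skip this step entirely:

```python
import numpy as np

def camera_box_to_lidar(loc_cam, h, w, l, ry, tr_cam_to_lidar):
    """Sketch: convert one KITTI camera-frame label to a lidar-frame box.

    tr_cam_to_lidar is a hypothetical 4x4 rectified-camera-to-lidar matrix
    that would normally be built from the calibration file.
    """
    loc_lidar = (tr_cam_to_lidar @ np.append(loc_cam, 1.0))[:3]
    loc_lidar[2] += h / 2.0                # bottom center -> gravity center
    heading = -(np.pi / 2.0 + ry)          # camera yaw -> lidar yaw
    return np.array([*loc_lidar, l, w, h, heading], dtype=np.float32)
```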

aatefi2 commented 8 months ago

shangjie-li,

Thank you so much for your prompt reply. Yes, the Ouster lidar is 360-degree; does that cause any issue? I used labelCloud to label my point clouds in the untransformed KITTI format. But you are right; I think I only have x, y, z, l, w, h, and rotation (in the lidar coordinate system). Should I comment out all lines related to the image, image shape, calibration, and road plane in kitti_dataset.py?

This is a label example (fields: class 0 0 0 0 0 0 0 h w l x y z rotation):

Tractor 0 0 0 0 0 0 0 1.61719908 0.45174701 1.38766037 27.29647474 -2.78948783 3.62838917 1.40818386

Should I remove the zeros from the labels to use your code?

Should I edit the config.yaml file as well? I think I should change the items below:

1) Class names => I just have one class.
2) Point cloud range [xmin, ymin, zmin, xmax, ymax, zmax] => I think I should get these values from the min and max values in each direction across my point cloud files.
3) Anchor sizes [l, w, h] => the average l, w, h of the 3D bounding boxes of all objects.
4) Anchor rotations => the average of the last number (rotation) in my labels.
5) Anchor bottom heights => the average of the z values of all objects.

Any suggestions?

Sorry for the many questions; I just started with 3D object detection, and I appreciate your help in applying your code to my dataset.

shangjie-li commented 8 months ago

You're welcome. I'm happy to discuss it with you.

The 360-degree lidar is a good fit, because the Velodyne used in the KITTI dataset is also a 360-degree lidar.

Both label formats are OK (Format A: class, x, y, z, l, w, h, rotation; Format B: class, 0, 0, 0, 0, 0, 0, 0, h, w, l, x, y, z, rotation). I suggest you use Format A because it's simpler. To do this, you need to modify utils/object3d_kitti.py.
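For example, the parsing could be reduced to something like this (a minimal sketch assuming whitespace-separated Format A lines; the class and function names are illustrative, not the repository's exact code):

```python
import numpy as np

class Object3dCustom:
    """Parses one Format A label line: class x y z l w h rotation (lidar frame)."""

    def __init__(self, line):
        fields = line.strip().split()
        self.cls_type = fields[0]
        self.loc = np.array(fields[1:4], dtype=np.float32)        # x, y, z
        self.l, self.w, self.h = (float(v) for v in fields[4:7])  # dimensions
        self.rotation = float(fields[7])                          # heading

def get_objects_from_label(label_file):
    with open(label_file) as f:
        return [Object3dCustom(line) for line in f if line.strip()]
```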

The config.yaml needs to be modified, and the changes you mentioned (class names, point cloud range, anchor settings) are all correct. In addition, the data augmentation configuration (DATA_AUGMENTOR) is also dataset-dependent. If you don't care about the detection performance of the network for now, you can delete the code for data augmentation.
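For instance, the dataset-dependent numbers could be estimated from the label files with something like this (a rough sketch; the directory layout and the Format A parsing are assumptions):

```python
import glob
import numpy as np

boxes = []
for path in glob.glob('data/custom/labels/*.txt'):  # hypothetical label directory
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) == 8:                    # Format A: class x y z l w h rotation
                boxes.append([float(v) for v in fields[1:]])
boxes = np.array(boxes)                             # columns: x y z l w h rotation

print('anchor size (mean l, w, h):', boxes[:, 3:6].mean(axis=0))
print('anchor bottom height (mean z):', boxes[:, 2].mean())
# If your labeled z is the box center rather than the bottom face,
# use (boxes[:, 2] - boxes[:, 5] / 2).mean() instead.
```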

I recommend you read the whole of data/kitti_dataset.py, so that you understand the entire data processing flow and know what needs to be modified. In the function KittiDataset.__getitem__, the return value data_dict must include the following fields: frame_id, gt_boxes, points, voxels, voxel_coords and voxel_num_points. The remaining fields can be deleted. Of course, you can delete all lines about the image, image shape, calibration and road plane in kitti_dataset.py.

Note: voxels, voxel_coords and voxel_num_points are used for network inference; gt_boxes are used to calculate losses; frame_id and points are used for visualization. Specifically, each gt_box is defined as x, y, z, l, w, h, heading (in lidar coordinates) plus a class_id, which is exactly what you labeled.
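Concretely, a trimmed data_dict only needs to carry something like this (shapes and dtypes are a sketch following common OpenPCDet-style conventions and may differ in detail):

```python
import numpy as np

# Illustrative placeholder sizes, not real data.
N, M, V, T = 20000, 3, 5000, 32   # points, boxes, pillars, max points per pillar

data_dict = {
    'frame_id': '000000',                         # file-name stem; for visualization
    'points': np.zeros((N, 4), np.float32),       # x, y, z, intensity
    'gt_boxes': np.zeros((M, 8), np.float32),     # x, y, z, l, w, h, heading, class_id
    'voxels': np.zeros((V, T, 4), np.float32),    # raw points gathered per pillar
    'voxel_coords': np.zeros((V, 3), np.int32),   # pillar indices in the BEV grid
    'voxel_num_points': np.zeros((V,), np.int32), # valid point count per pillar
}
```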

Finally, I have to tell you that you will almost have to rewrite data/kitti_dataset.py. I think it will take you 3 to 5 days to finish, so you should reconsider whether this repository is the right one to build on.

aatefi2 commented 8 months ago

shangjie-li,

Thank you again for the useful information. I spent one month looking for a codebase, and I was able to successfully run your code on the KITTI dataset example, so I think it is worth applying your code to my own dataset. I will edit kitti_dataset.py and run your code again. After modifying the code, I would really appreciate it if you could look at the edited kitti_dataset.py and give me your suggestions/feedback before I run train.py.

In kitti_dataset.py, what is the frame_id? Is it related to the point cloud or the 2D image?

aatefi2 commented 8 months ago

shangjie-li,

I used Format A for the labels and edited kitti_dataset.py (attached zip file). I also kept the data augmentor in the yaml file. After running train.py, the training process seems to start, but then it shows an error (attached png file).

Any suggestion?

Thank you, Abbas

Attachments: kitti_dataset.zip, Screenshot from 2023-10-27 15-25-18

shangjie-li commented 8 months ago

I'm happy to look at your edited code, but it's not convenient during my office hours, so I will do it after work.

The frame_id refers to a file name and has nothing to do with the images or point clouds themselves. For example, if your data files (point cloud/label) are 000000.bin/txt, 000001.bin/txt, ..., then the frame_ids are 000000, 000001, ...

From the error report, the problem is at line 147 of database_sampler.py: when executing "sampled_boxes = np.stack([x['box3d_lidar'] for x in sampled_dict], axis=0).astype(np.float32)", the sampled_dict is empty, but np.stack needs at least one array to stack. You can print sampled_dict and its box3d_lidar field for further debugging.
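As a debugging aid, a guarded version of that line might look like this (a sketch of the idea, not the repository's exact fix):

```python
import numpy as np

def stack_sampled_boxes(sampled_dict):
    """Guarded version of the failing line in database_sampler.py (sketch)."""
    if len(sampled_dict) == 0:
        # gt_sampling found nothing to paste in; likely the gt database is empty,
        # or the class names in the sampler config don't match your labels.
        print('sampled_dict is empty -- check the gt database and DATA_AUGMENTOR in config.yaml')
        return None
    return np.stack([x['box3d_lidar'] for x in sampled_dict], axis=0).astype(np.float32)
```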

shangjie-li commented 8 months ago

I checked your kitti_dataset.py. I think there are a few things you need to know:

  1. Before training or evaluating the network, you need to use the function KittiDataset.create_kitti_infos to generate kitti_infos_train.pkl and kitti_infos_val.pkl. These two pkl files save some necessary information such as ground truths. Since your label format is different from the kitti format, you need to regenerate these two pkl files.

  2. To regenerate these two pkl files, you need to carefully modify the function KittiDataset.get_infos, changing the fields in "annotations". Some fields cannot be read from your label files, such as "truncated", "occluded", "alpha", "bbox", etc. (see the sketch after this list).

  3. As I mentioned earlier, you need to modify object3d_kitti.py. Your "Object3d" should include only the fields you use. The function KittiDataset.get_infos is closely related to the class "Object3d".

  4. You can print the values of some variables when running train.py to verify the correctness of your changes. For example, you can print the fields of "info" in the function KittiDataset.__getitem__ (after executing "info = copy.deepcopy(self.kitti_infos[index])") to see if they match your expectations.
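Regarding point 2, the trimmed "annotations" could be built roughly like this (a sketch; it assumes the Format A "Object3d" fields from earlier, and zero-fills the camera-only fields purely for format compatibility):

```python
import numpy as np

def build_annotations(obj_list):
    """Sketch of a trimmed 'annotations' dict for a custom Format A dataset."""
    num = len(obj_list)
    return {
        'name': np.array([obj.cls_type for obj in obj_list]),
        'location': np.stack([obj.loc for obj in obj_list]),           # x, y, z (lidar)
        'dimensions': np.array([[obj.l, obj.w, obj.h] for obj in obj_list]),
        'rotation_y': np.array([obj.rotation for obj in obj_list]),
        'truncated': np.zeros(num),     # camera-only fields, zero-filled
        'occluded': np.zeros(num),
        'alpha': np.zeros(num),
        'bbox': np.zeros((num, 4)),
    }
```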

aatefi2 commented 8 months ago

Dear shangjie-li,

Thank you for your prompt response. I completely understand that office hours can be busy. Your feedback is valuable to me whenever it's convenient for you. I have the questions below:

1) The code successfully created the *.pkl and ground truth files without any error. Do you think the information in the created files is incomplete, and might this cause issues for the training process?

2 & 3) I kept the values of these fields ("truncated", "occluded", "alpha", "bbox", etc.) at zero; for this reason, I did not comment out these items in the code. Do you think I have to delete them from the labels and then edit the code (comment out these fields)?

I know all these questions will be answered by trial and error and debugging, and I will continue doing that with your valuable feedback. I'll be away from my office for the next two days, but I aim to make the code edits once I'm back. I will let you know about the results.

Best, Abbas

shangjie-li commented 8 months ago
  1. It is OK not to delete these fields ("truncated", "occluded", "alpha", "bbox", etc.) and to fill them with zeros.

  2. Now that you have successfully generated *.pkl and ground truth files, your function KittiDataset.get_infos is probably OK.

  3. The error in the picture you sent earlier is related to data augmentation. There are two ways to deal with it:

a) Ignore the data augmentation code for now. Specifically, delete "data_dict = self.data_augmentor.forward(data_dict=data_dict) if self.data_augmentor is not None else data_dict" in the function KittiDataset.prepare_data.

b) Modify the data augmentation code. There are four data augmentation methods in the code: gt_sampling, random_world_flip, random_world_rotation, random_world_scaling. The entry point is in data/augmentor/data_augment.py, and the configuration is in DATA_AUGMENTOR in data/config.yaml. The error you encountered occurs in gt_sampling. However, since I can't see your data/config.yaml, I can't tell why it went wrong.

  4. If you think your data processing is OK, you can run "python dataset_player.py --show_boxes" (with or without --data_augmentation) to view the point clouds and 3D bounding boxes in the Open3D visualization window.
aatefi2 commented 8 months ago

Dear shangjie-li,

Thank you for your suggestion. I commented out the data augmentation line in the code, and I could run python dataset_player.py --show_boxes for both the training and testing datasets successfully.

After running the training code, I got this error (attached file):

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 42 but got size 44 for tensor number 2 in the list.

Screenshot from 2023-10-30 09-17-10

Any idea how to solve this issue?

shangjie-li commented 8 months ago

According to the error report, when using torch.cat to concatenate multiple tensors, the dimensions are inconsistent.

I'm not quite sure what the cause of the error is, but in your config.yaml, POINT_CLOUD_RANGE is not set properly: in the X, Y, and Z directions, the extent of POINT_CLOUD_RANGE should be an integer multiple of VOXEL_SIZE.

Maybe you can change POINT_CLOUD_RANGE back to [0, -39.68, -3, 69.12, 39.68, 1] and see if it works?
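A quick way to check the divisibility constraint (a small sketch):

```python
import numpy as np

point_cloud_range = np.array([0, -39.68, -3, 69.12, 39.68, 1], dtype=np.float64)
voxel_size = np.array([0.16, 0.16, 4.0])

grid = (point_cloud_range[3:] - point_cloud_range[:3]) / voxel_size
print('grid size:', grid)   # should be whole numbers, here [432. 496. 1.]
assert np.allclose(grid, np.round(grid)), \
    'POINT_CLOUD_RANGE extent must be an integer multiple of VOXEL_SIZE'
```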

aatefi2 commented 8 months ago

Thank you so much! 1) After changing the point cloud range based on your suggestion, I was able to successfully run train.py. The training process completed; however, the evaluation process at the end failed with an error.

Screenshot from 2023-10-30 13-48-14

I commented out the lines below in kitti_dataset.py, and then it completed the evaluation process (the attached Files.zip includes the updated code and config file).

            #pred_dict['alpha'] = -np.arctan2(-pred_boxes[:, 1], pred_boxes[:, 0]) + pred_boxes_camera[:, 6]
            #pred_dict['bbox'] = pred_boxes_img

Attachments: Screenshot from 2023-10-30 14-14-05, Files.zip

Do you think I did this correctly? I think the evaluation metrics are zero because the training dataset is too small and the network could not be trained adequately. I used 50 point clouds (50 objects) with 10 epochs. I need to label more data to increase the training dataset, run the code again, and then evaluate its performance. What do you think?

2) I have the questions below:

2.1) Regarding the point cloud range and voxel size: I got the point cloud range [xmin, ymin, zmin, xmax, ymax, zmax] from my point cloud dataset. If the range of my dataset is [-13.31, -3.28, -0.05, 6.96, 10.19, 2.30] and I keep the voxel size at [0.16, 0.16, 4], what would be the best values for the point cloud range in config.yaml?

2.2) I determined the values below in the config.yaml file based on my dataset:

- Anchor sizes [l, w, h] => the average [l, w, h] of the 3D bounding boxes of all objects (from the label.txt files).
- Anchor rotation => the average of the last number (rotation) in all label.txt files.
- Anchor bottom heights => the average of the z values of all objects (the number before the rotation value in my label files).

Did I determine them correctly?

2.3) In kitti_dataset.py, I changed this line:

gt_boxes_lidar = np.concatenate([loc_lidar, l, w, h, -(np.pi / 2 + rots[..., np.newaxis])], axis=1)

to this:

gt_boxes_lidar = np.concatenate([loc_lidar, l, w, h, rots[..., np.newaxis]], axis=1)

Do you think this change is correct?

Sorry for the long list of questions, and thank you!

shangjie-li commented 8 months ago

You're welcome. I'd love to read your message and make some suggestions. I'm not sure I can answer all your questions, but I'll do my best.

  1. I checked your code and I think there are still some bugs. So, before you label any more data, you need to confirm a few things:

1.1) Now that you are able to train the network, you can check the network predictions in the Open3d visualization window. Specifically, run "python demo.py --ckpt=output/ckpt/checkpoint_epoch_10.pth" (with or without --show_gt_boxes). If there are too many predictions in the window, you can adjust the --score_thresh value to filter out some low confidence results.

1.2) I'm not sure if you have converted the predictions correctly. To confirm this, you can add the argument --save_to_file when training the network, or simply run "python test.py --ckpt=output/ckpt/checkpoint_epoch_10.pth --save_to_file". The predictions for each frame will be saved to output/eval/epoch_10/final_result/data (in KITTI format). You can compare the output txt files with your label files to see if they fit your expectations (both should be: class, 0, 0, 0, 0, 0, 0, 0, h, w, l, x, y, z, rotation). The code that creates these txt files is in the function KittiDataset.generate_prediction_dicts.

  2. To answer some of your questions:

2.1) I have not tried any other POINT_CLOUD_RANGE. Maybe you can try [34.56, 39.68, 3, 34.56, 39.68, 1] or [17.28, 19.84, 3, 17.28, 19.84, 1] to see if the code runs normally.

2.2) The way you calculated anchor sizes and anchor bottom heights is correct. I think it's better to set the anchor rotations to [0, 1.57].

2.3) If you run "python dataset_player.py --show_boxes" and the 3D bounding boxes look as you expect, then your changes are correct.

aatefi2 commented 8 months ago

Dear shangjie-li,

Thank you for your feedback.

Please find my answers below:

1.1) I ran python demo.py --ckpt=output/ckpt/checkpoint_epoch_10.pth --show_gt_boxes and adjusted score_thresh=0.5 in the code. It shows the predictions (definitely far from the objects) for the testing dataset. I should note that the gt_boxes are shown for some objects and not for others. FYI: all bounding boxes are shown for the training dataset when I run python dataset_player.py --training --data_augmentation --show_boxes.

1.2) The number of items in the .txt files is 16, not 15. In all .txt files, all items are zeros except the first, second, third, and last items (attached file: 000001.txt).

2.1) I kept the previous values for the point cloud range and voxel size: [0, -39.68, -3, 69.12, 39.68, 1] and [0.16, 0.16, 4].

2.2) As you suggested, I updated the anchor rotations to [0, 1.57].

2.3) I ran python dataset_player.py --show_boxes; the bounding boxes are shown for some objects and not for others.

A new update about the evaluation process: when I run python train.py --batch_size=2, the training process completes, but the evaluation process shows progress and never ends (attached file). Any idea about this issue?

Screenshot from 2023-10-31 13-24-55

shangjie-li commented 8 months ago
  1. There are many possible reasons why the predictions are far from the objects: maybe there's a bug in the code, or maybe it's a lack of training.

  2. It's definitely not OK that "the gt_boxes are shown for some objects and not for others". Possible causes are: a) there are mistakes in the label files (some objects are not labeled); b) there are bugs in the data processing part of the code.

  3. Yes, the number of items in the *.txt files should be 16, and the last one is the score (confidence) of the prediction. It's not OK that all items are zeros except the first, second, third, and last items: at the very least, the h, w, l, x, y, z, rotation of the 3D bounding box cannot be zero. So the function KittiDataset.generate_prediction_dicts must have a bug (see the sketch at the end of this message).

  4. Sorry for the typo before. What I wanted to say about POINT_CLOUD_RANGE was [-34.56, -39.68, -3, 34.56, 39.68, 1] or [-17.28, -19.84, -3, 17.28, 19.84, 1], I don't know if you have tried them.

  5. About "the evaluation process shows progress but never ends": I don't know why you're in this situation; it has never happened to me. But you should be able to avoid it by reducing the value of --max_waiting_mins when running python train.py --batch_size=2, for example by setting --max_waiting_mins to 0.
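Regarding point 3, a correctly formatted output line would be assembled roughly like this (a sketch of the 16-field KITTI-style layout discussed here, not the repository's exact generate_prediction_dicts):

```python
def kitti_prediction_line(cls, h, w, l, x, y, z, rotation, score):
    """Sketch: one line of output/eval/.../final_result/data/XXXXXX.txt (16 fields).

    The seven zero-filled fields (truncated, occluded, alpha, bbox) are
    placeholders, since the custom labels don't contain them.
    """
    zeros = ' '.join(['0.00'] * 7)
    return (f'{cls} {zeros} {h:.2f} {w:.2f} {l:.2f} '
            f'{x:.2f} {y:.2f} {z:.2f} {rotation:.2f} {score:.2f}')

print(kitti_prediction_line('Tractor', 1.62, 0.45, 1.39, 27.30, -2.79, 3.63, 1.41, 0.87))
```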

aatefi2 commented 8 months ago

Dear shangjie-li,

Thank you for your suggestion. I increased my dataset to 120 labeled point clouds and used 51 point clouds for the validation dataset. I edited config.yaml and tried PointcloudRange1 = [0, -39.68, -3, 69.12, 39.68, 1], PointcloudRange2 = [-34.56, -39.68, -3, 34.56, 39.68, 1], and PointcloudRange3 = [-17.28, -19.84, -3, 17.28, 19.84, 1], with 100 epochs and the threshold at 0.5. I ran python demo.py --ckpt=output/ckpt/checkpoint_epoch_100.pth --show_gt_boxes. I assume the green box is for predictions and the red one for gt_boxes. Please find the results below (the first number is for PointcloudRange1 and the second for PointcloudRange2):

1) No gt_boxes, no predictions: 13, 17
2) gt_boxes, good predictions: 21, 23
3) gt_boxes, off predictions: 0, 0
4) gt_boxes, no predictions: 13, 11
5) No gt_boxes, good predictions: 0, 0
6) No gt_boxes, off predictions: 0, 4

It can be seen that PointcloudRange2 gives better predictions, but PointcloudRange1 performs better at showing the gt_boxes. However, both point cloud ranges still have issues showing the gt_boxes. Do you think the point cloud range affects the showing of the gt_boxes? Any other ideas?

For both point cloud ranges, I still get zero values for the evaluation metrics. I also ran python test.py --ckpt=output/ckpt/checkpoint_epoch_100.pth --save_to_file. I can see that some *.txt files are empty and the others still have the previous format (all items are zeros except the first, second, third, and last items). Any idea how to solve this issue? Screenshot from 2023-11-02 07-50-41

For PointcloudRange3, I got the error below. Any suggestions? Screenshot from 2023-11-02 10-10-08

I have the questions below: 1) Is the validation dataset used in the training process to tune the hyperparameters, or just for the calculation of the evaluation metrics? 2) I used the procedure below to run the trained model on a random point cloud (000124.bin), and it predicted the bounding box successfully:

Is there a way to get the predictions without following this procedure?

Thank you!

shangjie-li commented 8 months ago
  1. It's correct that the green box is for predictions and the red one for gt_boxes.

  2. I don't think the POINT_CLOUD_RANGE will affect the showing of gt_boxes. I think you should debug the process of showing gt_boxes to see what's wrong.

  3. About "all items are zeros except the first, second, third, and last items": as I said before, you should look at the implementation of the function KittiDataset.generate_prediction_dicts. I don't think this function is written correctly.

  4. Maybe the current code only supports a fixed-size feature map input, that is [432, 496, 1]. The calculation is as follows (see the sketch at the end of this message):

X direction: 69.12 / 0.16 = 432; Y direction: 39.68 * 2 / 0.16 = 496; Z direction: (1 - (-3)) / 4 = 1.

As to why other input sizes are not supported, you need to look at the code in the layer/ directory; I forget the details.

  5. The validation dataset is used in the training process just for the calculation of the evaluation metrics, not for tuning the hyperparameters.

  6. In the current code, if you want to run the network on a random point cloud, you do have to follow the process you mentioned. You might want an interface like this: python demo.py --ckpt=output/ckpt/checkpoint_epoch_100.pth --point_cloud=xxx.bin. However, the current code does not support such an interface; you would have to implement it yourself.
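Regarding point 4, the BEV grid implied by the config can be computed like this (a sketch; the divisibility-by-8 remark is my assumption about the backbone's downsample/upsample strides, so check the code in layer/ to confirm):

```python
import numpy as np

def bev_grid_size(pc_range, voxel_size):
    """Sketch: BEV grid implied by POINT_CLOUD_RANGE and VOXEL_SIZE."""
    pc_range = np.asarray(pc_range, dtype=np.float64)
    voxel_size = np.asarray(voxel_size, dtype=np.float64)
    return np.round((pc_range[3:] - pc_range[:3]) / voxel_size).astype(int)

grid = bev_grid_size([0, -39.68, -3, 69.12, 39.68, 1], [0.16, 0.16, 4])
print(grid)          # [432 496 1]
print(grid[:2] % 8)  # the BEV width/height should divide evenly by the total
                     # backbone stride so the upsampled feature maps concatenate
```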