xavidzo opened this issue 3 years ago
Yes, I already changed the post_center_limit_range accordingly, but I still get zero detections. I cannot believe the network is not able to generalize to a simple shift in the z coordinate.
Have you experienced this issue before? Or could you please run a test with some point clouds from Waymo: translate the z coordinate upwards and then do inference on the translated point clouds?
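For concreteness, the kind of config edit being described is something like the following; the numbers are illustrative for a z band moved from [-8, 0] to [0, +8], and the x/y extents are placeholders:

# illustrative sketch only: shift the z limits of the ranges by +8 m to follow the translated points
pc_range = [-70.4, -70.4, 0.0, 70.4, 70.4, 8.0]                  # z was -8.0 ... 0.0
post_center_limit_range = [-80.0, -80.0, -1.0, 80.0, 80.0, 9.0]  # keep ~1 m of slack around the new z band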
Is it still reasonable? We regress to an absolute z value for the box center height estimation, and this absolute value is specific to the lidar coordinate frame. We also don't apply much data augmentation to the z coordinate during training. Still, zero detections seems weird; you can look at the results before score thresholding and NMS. Sorry, I haven't had time to check this recently.
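For reference, one low-effort way to do that inspection with a Det3D-style config is to loosen the post-processing so that (almost) nothing gets filtered; this is a sketch assuming the test_cfg layout quoted later in this thread, with illustrative values:

# debugging sketch: disable the filters, then inspect the raw scores of whatever the head predicts
test_cfg = dict(
    post_center_limit_range=[-200, -200, -20, 200, 200, 20],  # effectively no center gating
    max_per_img=500,
    nms=dict(
        nms_pre_max_size=1000,
        nms_post_max_size=500,    # keep everything up to max_per_img
        nms_iou_threshold=0.99,   # near no-op NMS
    ),
    score_threshold=0.0,          # was 0.1: look at raw scores instead
    pc_range=[0, -39.68],
    out_size_factor=get_downsample_factor(model),
    voxel_size=[0.16, 0.16],
)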
@xavidzo Hello, I want to train my own dataset too. May I ask how big your training dataset is? Thanks!
Hello @shallowdream-x, for now I trained on only 150 frames, i.e. pcd files... this is too little for the network to generalize well to unseen data. In my team we are going to label more frames to end up with at least 1500.
Hello @tianweiy, I trained CenterPoint on KITTI using some code from Det3D. After 100 epochs, this is the result I got on the validation set:
CenterPoint performance on KITTI val split
2021-05-16 00:14:22,183 - INFO - Evaluation official:
car AP(Average Precision)@0.70, 0.70, 0.70:
bbox AP:89.03, 81.67, 81.39
bev AP:87.42, 80.24, 77.66
3d AP:70.29, 62.64, 61.73
aos AP:88.99, 81.28, 80.77
car AP(Average Precision)@0.70, 0.50, 0.50:
bbox AP:89.03, 81.67, 81.39
bev AP:89.87, 89.02, 88.11
3d AP:89.70, 88.47, 87.26
aos AP:88.99, 81.28, 80.77
I used the following config file:
import itertools
import logging

import numpy as np

from det3d.utils.config_tool import get_downsample_factor

tasks = [
    dict(num_class=1, class_names=["Car"]),
]

class_names = list(itertools.chain(*[t["class_names"] for t in tasks]))

# training and testing settings
target_assigner = dict(
    tasks=tasks,
)

pc_range = [0, -39.68, -3, 69.12, 39.68, 1]
voxel_size = [0.16, 0.16, 4.0]
grid_size = (np.asarray(pc_range)[3:] - np.asarray(pc_range)[:3]) / np.asarray(voxel_size)

# model settings
model = dict(
    type="PointPillars",
    pretrained=None,
    reader=dict(
        type="PillarFeatureNet",
        num_filters=[64, 64],
        num_input_features=4,
        with_distance=False,
        voxel_size=voxel_size,
        pc_range=pc_range,
    ),
    backbone=dict(type="PointPillarsScatter", ds_factor=1),
    neck=dict(
        type="RPN",
        layer_nums=[3, 5, 5],
        ds_layer_strides=[2, 2, 2],
        ds_num_filters=[64, 128, 256],
        us_layer_strides=[0.5, 1, 2],
        us_num_filters=[128, 128, 128],
        num_input_features=64,
        logger=logging.getLogger("RPN"),
    ),
    bbox_head=dict(
        # type='RPNHead',
        type="CenterHead",
        in_channels=sum([128, 128, 128]),
        tasks=tasks,
        dataset='kitti',
        weight=1,
        code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
        common_heads={'reg': (2, 2), 'height': (1, 2), 'dim': (3, 2), 'rot': (2, 2)},  # (output_channel, num_conv)
    ),
)

assigner = dict(
    target_assigner=target_assigner,
    out_size_factor=get_downsample_factor(model),
    gaussian_overlap=0.1,
    max_objs=500,
    min_radius=2,
)

train_cfg = dict(assigner=assigner)

test_cfg = dict(
    post_center_limit_range=[0, -40, -4.0, 70, 40, 2],
    max_per_img=500,
    nms=dict(
        nms_pre_max_size=1000,
        nms_post_max_size=83,
        nms_iou_threshold=0.2,
    ),
    score_threshold=0.1,
    pc_range=[0, -39.68],
    out_size_factor=get_downsample_factor(model),
    voxel_size=[0.16, 0.16],
)

# dataset settings
dataset_type = "KittiDataset"
data_root = "data/kitti"

db_sampler = dict(
    type="GT-AUG",
    enable=True,
    db_info_path="data/kitti/centerpoint_pkl/dbinfos_train.pkl",
    sample_groups=[
        dict(Car=15),
    ],
    db_prep_steps=[
        dict(
            filter_by_min_num_points=dict(
                Car=5,
            )
        ),
        dict(filter_by_difficulty=[-1],),
    ],
    global_random_rotation_range_per_object=[0, 0],
    rate=1.0,
)

train_preprocessor = dict(
    mode="train",
    shuffle_points=True,
    gt_loc_noise=[0.25, 0.25, 0.25],
    gt_rot_noise=[-0.15707963267, 0.15707963267],
    global_rot_noise=[-0.78539816, 0.78539816],
    global_scale_noise=[0.95, 1.05],
    global_rot_per_obj_range=[0, 0],
    global_trans_noise=[0.0, 0.0, 0.0],
    remove_points_after_sample=True,
    gt_drop_percentage=0.0,
    gt_drop_max_keep_points=15,
    remove_unknown_examples=False,
    remove_environment=False,
    db_sampler=db_sampler,
    class_names=class_names,
)

val_preprocessor = dict(
    mode="val",
    shuffle_points=False,
    remove_environment=False,
    remove_unknown_examples=False,
)

voxel_generator = dict(
    range=[0, -39.68, -3, 69.12, 39.68, 1],
    voxel_size=[0.16, 0.16, 4.0],
    max_points_in_voxel=100,
    max_voxel_num=[30000, 60000],
)

train_pipeline = [
    dict(type="LoadPointCloudFromFile", dataset=dataset_type),
    dict(type="LoadPointCloudAnnotations", with_bbox=True),
    dict(type="Preprocess", cfg=train_preprocessor),
    dict(type="Voxelization", cfg=voxel_generator),
    dict(type="AssignLabel", cfg=train_cfg["assigner"]),
    dict(type="Reformat"),
]

test_pipeline = [
    dict(type="LoadPointCloudFromFile", dataset=dataset_type),
    dict(type="LoadPointCloudAnnotations", with_bbox=True),
    dict(type="Preprocess", cfg=val_preprocessor),
    dict(type="Voxelization", cfg=voxel_generator),
    dict(type="AssignLabel", cfg=train_cfg["assigner"]),
    dict(type="Reformat"),
]

train_anno = "/data/kitti/centerpoint_pkl/kitti_infos_train.pkl"
val_anno = "/data/kitti/centerpoint_pkl/kitti_infos_val.pkl"
test_anno = None

data = dict(
    samples_per_gpu=3,
    workers_per_gpu=6,
    train=dict(
        type=dataset_type,
        root_path=data_root,
        info_path=data_root + "/centerpoint_pkl/kitti_infos_train.pkl",
        ann_file=train_anno,
        class_names=class_names,
        pipeline=train_pipeline,
    ),
    val=dict(
        type=dataset_type,
        root_path=data_root,
        info_path=data_root + "/centerpoint_pkl/kitti_infos_val.pkl",
        ann_file=val_anno,
        class_names=class_names,
        pipeline=test_pipeline,
    ),
    test=dict(
        type=dataset_type,
        root_path=data_root,
        info_path=test_anno,
        ann_file=test_anno,
        class_names=class_names,
        pipeline=test_pipeline,
    ),
)

optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# optimizer
optimizer = dict(
    type="adam", amsgrad=0.0, wd=0.01, fixed_wd=True, moving_average=False,
)
lr_config = dict(
    type="one_cycle", lr_max=0.001, moms=[0.95, 0.85], div_factor=10.0, pct_start=0.4,
)

checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=5,
    hooks=[
        dict(type="TextLoggerHook"),
        # dict(type='TensorboardLoggerHook')
    ],
)
# yapf:enable

# runtime settings
total_epochs = 100
device_ids = range(8)
dist_params = dict(backend="nccl", init_method="env://")
log_level = "INFO"
work_dir = './work_dirs/{}/'.format(__file__[__file__.rfind('/') + 1:-3])
load_from = None
resume_from = None
workflow = [("train", 50), ("val", 1)]
Maybe you can tell me what changes I can make to the config file to increase the performance? Does my result match your previous experiments with CenterPoint on KITTI?
I saw in another issue you mentioned CenterPoint should reach 78-79 (recall 11). Do you mean on the 3D task of the car class, moderate difficulty level? On the validation or test set of KITTI? https://github.com/tianweiy/CenterPoint/issues/78#issuecomment-826996605
Maybe you can tell me what changes I can do to the config file to increase the performance?
You also need to change code in addition to the config. It is a bit complex, so I suggest you take a look at this repo: https://github.com/tianweiy/CenterPoint-KITTI
Do you mean on the 3d task of the car class, moderate difficulty level?? on the validation or test set of kitti?
val
I am trying to train a two-stage model, but somehow I get an error about mismatched dimensions:

Traceback (most recent call last):
  File "./tools/train.py", line 137, in <module>
    main()
  File "./tools/train.py", line 132, in main
    logger=logger,
  File "/home/CenterPoint/det3d/torchie/apis/train.py", line 327, in train_detector
    trainer.run(data_loaders, cfg.workflow, cfg.total_epochs, local_rank=cfg.local_rank)
  File "/home/CenterPoint/det3d/torchie/trainer/trainer.py", line 543, in run
    epoch_runner(data_loaders[i], self.epoch, **kwargs)
  File "/home/CenterPoint/det3d/torchie/trainer/trainer.py", line 410, in train
    self.model, data_batch, train_mode=True, **kwargs
  File "/home/CenterPoint/det3d/torchie/trainer/trainer.py", line 368, in batch_processor_inline
    losses = model(example, return_loss=True)
  File "/home/miniconda/envs/centerpoint/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/CenterPoint/det3d/models/detectors/two_stage.py", line 186, in forward
    batch_dict = self.roi_head(example, training=return_loss)
  File "/home/miniconda/envs/centerpoint/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/CenterPoint/det3d/models/roi_heads/roi_head.py", line 89, in forward
    shared_features = self.shared_fc_layer(pooled_features.view(batch_size_rcnn, -1, 1))
  File "/home/miniconda/envs/centerpoint/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/miniconda/envs/centerpoint/lib/python3.6/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/home/miniconda/envs/centerpoint/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/miniconda/envs/centerpoint/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 263, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/miniconda/envs/centerpoint/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 260, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [256, 1920, 1], expected input[1024, 320, 1] to have 1920 channels, but got 320 channels instead
Killing subprocess 316172
This is my model:
# model settings
model = dict(
    type='TwoStageDetector',
    first_stage_cfg=dict(
        type="PointPillars",
        pretrained='work_dirs/nusc_centerpoint_pp_02voxel_two_pfn_10sweep_one_head_no_vel_040vox/epoch_600.pth',
        reader=dict(
            type="PillarFeatureNet",
            num_filters=[64, 64],
            num_input_features=4,
            with_distance=False,
            voxel_size=(0.40, 0.40, 4),
            pc_range=(-70.4, -70.4, -8.0, 70.4, 70.4, -4.0),
        ),
        backbone=dict(type="PointPillarsScatter", ds_factor=1),
        neck=dict(
            type="RPN",
            layer_nums=[3, 5, 5],
            ds_layer_strides=[2, 2, 2],
            ds_num_filters=[64, 128, 256],
            us_layer_strides=[0.5, 1, 2],
            us_num_filters=[128, 128, 128],
            num_input_features=64,
            logger=logging.getLogger("RPN"),
        ),
        bbox_head=dict(
            type="CenterHead",
            in_channels=128*3,
            tasks=tasks,
            dataset='providentia',
            weight=0.25,
            code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
            common_heads={'reg': (2, 2), 'height': (1, 2), 'dim': (3, 2), 'rot': (2, 2)},  # (output_channel, num_conv)
        ),
    ),
    second_stage_modules=[
        dict(
            type="BEVFeatureExtractor",
            pc_start=[-70.4, -70.4],
            voxel_size=[0.40, 0.40],
            out_stride=1
        )
    ],
    roi_head=dict(
        type="RoIHead",
        input_channels=128*3*5,
        model_cfg=dict(
            CLASS_AGNOSTIC=True,
            SHARED_FC=[256, 256],
            CLS_FC=[256, 256],
            REG_FC=[256, 256],
            DP_RATIO=0.3,
            TARGET_CONFIG=dict(
                ROI_PER_IMAGE=128,
                FG_RATIO=0.5,
                SAMPLE_ROI_BY_EACH_CLASS=True,
                CLS_SCORE_TYPE='roi_iou',
                CLS_FG_THRESH=0.75,
                CLS_BG_THRESH=0.25,
                CLS_BG_THRESH_LO=0.1,
                HARD_BG_RATIO=0.8,
                REG_FG_THRESH=0.55
            ),
            LOSS_CONFIG=dict(
                CLS_LOSS='BinaryCrossEntropy',
                REG_LOSS='L1',
                LOSS_WEIGHTS={
                    'rcnn_cls_weight': 1.0,
                    'rcnn_reg_weight': 1.0,
                    'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
                }
            )
        ),
        code_size=7
    ),
    NMS_POST_MAXSIZE=500,
    num_point=5,
    freeze=True
)
Could you please spot what might be wrong?
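One observation, based purely on the numbers in the traceback: the Conv1d weight of size [256, 1920, 1] matches input_channels=128*3*5=1920 in roi_head, while the actual input has 320 = 64*5 channels, which suggests the BEVFeatureExtractor is pooling from a 64-channel feature map rather than the 384-channel concatenated neck output. A quick way to confirm the channel math (pooled_features and batch_size_rcnn are the names from roi_head.py in the traceback):

# debugging sketch: print the channel math just before the failing shared_fc_layer call
print("pooled_features:", pooled_features.shape)                     # [1024, 320, 1] in the error above
print("channels per sampled point:", pooled_features.shape[1] // 5)  # 64, not 128*3 = 384
print("roi_head expects:", 128 * 3 * 5)                              # 1920 = input_channels in the config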
Hi xavidzo! I saw your great work. I am also trying to train the network with my custom labeled data, but I've run into a lot of trouble. Could you share the code for training on a custom dataset?
Best regards
Hello @tndus5497, I am sorry, I can't share the exact code right now because of rights issues with my university. As tianweiy said, the easiest way to start is to copy the API for loading the Waymo dataset, then adapt it to your needs and to the format of your labeled data. For example, I used the pypcd library to read the point clouds directly in pcd ASCII format (this is slow, though). I know it can be overwhelming to change many things in the code; you should use many print statements to follow what's going on behind the scenes. You also have to add some code in the pipeline scripts loading.py, formating.py, and preprocess.py in "CenterPoint/det3d/datasets/pipelines/". If you have more specific questions, I could of course help you.
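For example, a minimal sketch of such a pcd reader using pypcd (the "intensity" field name depends on your pcd header and is an assumption here):

import numpy as np
from pypcd import pypcd

def read_pcd(path):
    # load a .pcd file and stack x, y, z, intensity into an (N, 4) float32 array
    pc = pypcd.PointCloud.from_path(path)
    points = np.stack(
        [pc.pc_data["x"], pc.pc_data["y"], pc.pc_data["z"], pc.pc_data["intensity"]],
        axis=-1,
    )
    return points.astype(np.float32)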
Hi @xavidzo
Now I have another problem, maybe you know what the reason could be: when I do inference on the original pcd files of my dataset, the network predicts very good bounding box detections. However, when I transform my pcd files (I only translate the z coordinate of each point in the point cloud to change the range from [-8, 0] to [0, +8]) and then adjust the pc_range and voxel_range accordingly in the config file of the network to match the change in the z coordinate, I get zero predictions, no results at all on the transformed pcd files. Do you have an idea of what I am missing?
Were you ever able to solve the issue with transformed pcds? I've had the same issue as yours: data in the world frame of reference gives poor detection results, but it works fine in the sensor/ego frame, even after adjusting for the point cloud range. I was wondering if you had any findings that could be helpful to me.
Hello @naman1-gupta, I came to the conclusion that if the network was trained with a certain point cloud range, then the test pcds you use for inference should have roughly the same point cloud range, because the network learned to predict bounding boxes from training data in that particular range. The generalization of neural networks for 3D object detection is not as straightforward as in the 2D case. So what I ended up doing was to manually change the input data range when loading my test pcd files: say, I shift the z coordinate from the original (0, +8) to (-8, 0) to match the range in the config file of the training point clouds, which were also in z (-8, 0). When I get the inference results predicted by CenterPoint, I manually rescale the output range of the bboxes back to (0, +8). This worked very well for us, and the visualization in Rviz looked consistent. I hope my suggestion is useful for you.
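A minimal sketch of that shift-and-unshift wrapper (the Z_OFFSET value and the [x, y, z, dx, dy, dz, yaw] box layout are assumptions matching the description above):

import numpy as np

Z_OFFSET = 8.0  # test pcds have z in (0, +8); training data had z in (-8, 0)

def points_to_training_range(points):
    shifted = points.copy()
    shifted[:, 2] -= Z_OFFSET  # move z into the range the network was trained on
    return shifted

def boxes_to_original_range(boxes):
    restored = boxes.copy()
    restored[:, 2] += Z_OFFSET  # move predicted box centers back to the original frame
    return restored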
Hello @xavidzo. Can you be more specific about training on custom datasets? I have the pcd files and the corresponding label files (txt format, including x, y, z, dx, dy, dz, yaw). Thanks a lot.
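For reference, a minimal sketch of parsing a label file in that format (one box per line, whitespace-separated x y z dx dy dz yaw; any extra columns, such as a class name, are an assumption):

import numpy as np

def read_labels(path):
    boxes = []
    with open(path) as f:
        for line in f:
            vals = line.split()
            if len(vals) < 7:
                continue  # skip blank or malformed lines
            boxes.append([float(v) for v in vals[:7]])  # x, y, z, dx, dy, dz, yaw
    return np.asarray(boxes, dtype=np.float32).reshape(-1, 7)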
Did you solve the custom dataset problem? I have the same issue!
Hi, I think the work you've done is really amazing, congrats!! For my thesis project, I would like to use your network to analyze point cloud data. Right now what I have is only a series of .pcd files that are recordings of vehicles on a highway. The lidar sensor was mounted on a bridge, so the recordings are from a static point of view. Could you please give me some hints on how to prepare these data as input to your network for 3D object detection? Do I have to prepare database tables in .json format according to the nuScenes specifications? Is that a must? What other steps are required?
Thank you in advance