kx-Z closed this issue 2 years ago
First, GroupFree3D is designed for indoor scenes, so you are unlikely to get reasonable results on the outdoor KITTI dataset.
Second, please provide the full config and the traceback of the error from your experiment. If the error comes from stacking scenes with different numbers of points, you can try adding `dict(type='IndoorPointSample', num_points=20000)` to your config.
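To see why a fixed-size sampling step removes a stacking problem, here is a minimal sketch in plain Python. It is not the mmdet3d implementation of `IndoorPointSample`; the helper `sample_fixed_size` and the random scenes are hypothetical, and only the point counts are taken from the error reported in this thread.

```python
import random


def sample_fixed_size(points, num_points, rng=None):
    """Return exactly `num_points` rows drawn from `points`.

    Samples without replacement when the cloud is large enough and
    with replacement (duplicating points) when it is too small.
    """
    rng = rng or random.Random()
    if len(points) >= num_points:
        return rng.sample(points, num_points)
    return rng.choices(points, k=num_points)


rng = random.Random(0)

# Two hypothetical scenes; each row is (x, y, z, intensity).
scene_a = [tuple(rng.random() for _ in range(4)) for _ in range(19280)]
scene_b = [tuple(rng.random() for _ in range(4)) for _ in range(21555)]

batch = [sample_fixed_size(s, 20000, rng) for s in (scene_a, scene_b)]
shapes = [(len(s), len(s[0])) for s in batch]
print(shapes)  # [(20000, 4), (20000, 4)]
```

Once every scene has the same shape, the scenes can be stacked into a single batch tensor. The cost is that small scenes contain duplicated points and large scenes lose some.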
Thank you for your reply. Maybe I didn't read carefully enough, but I couldn't find anything in the relevant papers, such as VoteNet, saying that the model is specifically designed for indoor scenes. Which work mentions this problem?
The configuration is as follows:
```python
_base_ = [
    '../_base_/datasets/kitti-3d-3class.py',
    '../_base_/models/groupfree3d.py',
    '../_base_/schedules/schedule_3x.py',
    '../_base_/default_runtime.py'
]

model = dict(
    backbone=dict(
        type='PointNet2SASSG',
        in_channels=3,
        num_points=(2048, 1024, 512, 256),
        radius=(0.2, 0.4, 0.8, 1.2),
        num_samples=(64, 32, 16, 16),
        sa_channels=((128, 128, 256), (256, 256, 512), (256, 256, 512),
                     (256, 256, 512)),
        fp_channels=((512, 512), (512, 288)),
        norm_cfg=dict(type='BN2d'),
        sa_cfg=dict(
            type='PointSAModule',
            pool_mod='max',
            use_xyz=True,
            normalize_xyz=True)),
    bbox_head=dict(
        num_classes=3,
        num_decoder_layers=6,
        size_cls_agnostic=False,
        bbox_coder=dict(
            type='GroupFree3DBBoxCoder',
            num_sizes=3,
            num_dir_bins=1,
            with_rot=False,
            size_cls_agnostic=False,
            mean_sizes=[[3.9, 1.6, 1.56], [0.8, 0.6, 1.73],
                        [1.76, 0.6, 1.73]]),
        sampling_objectness_loss=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=8.0),
        objectness_loss=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        center_loss=dict(
            type='SmoothL1Loss',
            beta=0.04,
            reduction='sum',
            loss_weight=10.0),
        dir_class_loss=dict(
            type='CrossEntropyLoss', reduction='sum', loss_weight=1.0),
        dir_res_loss=dict(
            type='SmoothL1Loss', reduction='sum', loss_weight=10.0),
        size_class_loss=dict(
            type='CrossEntropyLoss', reduction='sum', loss_weight=1.0),
        size_res_loss=dict(
            type='SmoothL1Loss',
            beta=1.0 / 9.0,
            reduction='sum',
            loss_weight=10.0 / 9.0),
        semantic_loss=dict(
            type='CrossEntropyLoss', reduction='sum', loss_weight=1.0)),
    test_cfg=dict(
        sample_mod='kps',
        nms_thr=0.25,
        score_thr=0.0,
        per_class_proposal=True,
        prediction_stages='last_three'))

dataset_type = 'KittiDataset'
data_root = 'data/kitti/'
class_names = ['Pedestrian', 'Cyclist', 'Car']
point_cloud_range = [0, -40, -3, 70.4, 40, 1]
input_modality = dict(use_lidar=True, use_camera=False)
db_sampler = dict(
    data_root=data_root,
    info_path=data_root + 'kitti_dbinfos_train.pkl',
    rate=1.0,
    prepare=dict(
        filter_by_difficulty=[-1],
        filter_by_min_points=dict(Car=5, Pedestrian=10, Cyclist=10)),
    classes=class_names,
    sample_groups=dict(Car=12, Pedestrian=6, Cyclist=6))

file_client_args = dict(backend='disk')

train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=[0, 1, 2],
        file_client_args=file_client_args),
    dict(
        type='LoadAnnotations3D',
        with_bbox_3d=True,
        with_label_3d=True,
        file_client_args=file_client_args),
    dict(type='ObjectSample', db_sampler=db_sampler),
    # dict(type='IndoorPointSample', num_points=20000),
    # dict(type='PointSample', num_points=50000),
    dict(
        type='ObjectNoise',
        num_try=100,
        translation_std=[1.0, 1.0, 0.5],
        global_rot_range=[0.0, 0.0],
        rot_range=[-0.78539816, 0.78539816]),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.78539816, 0.78539816],
        scale_ratio_range=[0.95, 1.05]),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='PointShuffle'),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=[0, 1, 2],
        file_client_args=file_client_args),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1333, 800),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(
                type='GlobalRotScaleTrans',
                rot_range=[0, 0],
                scale_ratio_range=[1., 1.],
                translation_std=[0, 0, 0]),
            dict(type='RandomFlip3D'),
            dict(
                type='PointsRangeFilter', point_cloud_range=point_cloud_range),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['points'])
        ])
]
eval_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=[0, 1, 2],
        file_client_args=file_client_args),
    dict(
        type='DefaultFormatBundle3D',
        class_names=class_names,
        with_label=False),
    dict(type='Collect3D', keys=['points'])
]

data = dict(
    samples_per_gpu=1,
    workers_per_gpu=2,
    train=dict(
        type='RepeatDataset',
        times=2,
        dataset=dict(
            type=dataset_type,
            data_root=data_root,
            ann_file=data_root + 'kitti_infos_train.pkl',
            split='training',
            pts_prefix='velodyne_reduced',
            pipeline=train_pipeline,
            modality=input_modality,
            classes=class_names,
            test_mode=False,
            # and box_type_3d='Depth' in sunrgbd and scannet dataset.
            box_type_3d='LiDAR')),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_val.pkl',
        split='training',
        pts_prefix='velodyne_reduced',
        pipeline=test_pipeline,
        modality=input_modality,
        classes=class_names,
        test_mode=True,
        box_type_3d='LiDAR'),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_val.pkl',
        split='training',
        pts_prefix='velodyne_reduced',
        pipeline=test_pipeline,
        modality=input_modality,
        classes=class_names,
        test_mode=True,
        box_type_3d='LiDAR'))

evaluation = dict(interval=1, pipeline=eval_pipeline)

lr = 0.002
optimizer = dict(
    lr=lr,
    weight_decay=0.0005,
    paramwise_cfg=dict(
        custom_keys={
            'bbox_head.decoder_layers': dict(lr_mult=0.1, decay_mult=1.0),
            'bbox_head.decoder_self_posembeds': dict(
                lr_mult=0.1, decay_mult=1.0),
            'bbox_head.decoder_cross_posembeds': dict(
                lr_mult=0.1, decay_mult=1.0),
            'bbox_head.decoder_query_proj': dict(lr_mult=0.1, decay_mult=1.0),
            'bbox_head.decoder_key_proj': dict(lr_mult=0.1, decay_mult=1.0)
        }))

optimizer_config = dict(grad_clip=dict(max_norm=0.1, norm_type=2))
lr_config = dict(policy='step', warmup=None, step=[56, 68])

runner = dict(type='EpochBasedRunner', max_epochs=80)
checkpoint_config = dict(interval=1, max_keep_ckpts=10)
```
error:
Original Traceback (most recent call last):
File "/home/zhuyi/anaconda3/envs/mmd2/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
data = fetcher.fetch(index)
File "/home/zhuyi/anaconda3/envs/mmd2/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/zhuyi/anaconda3/envs/mmd2/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in
> Thank you for your reply. Maybe I didn't read carefully enough, but I couldn't find anything in the relevant papers, such as VoteNet, saying that the model is specifically designed for indoor scenes. Which work mentions this problem?
As I understand it, this is not about a single work mentioning the problem. The two sets of papers, those introducing indoor detectors (VoteNet, ImVoteNet, GroupFree3D, 3D-MPA, HGNet, BRNet, 3DETR, MLCVNet, VENet, ...) and those introducing outdoor detectors (CenterPoint, PartA2, SECOND, MVXNet, 3DSSD, ...), hardly overlap. There are a couple of reasons for this division. The main one is that outdoor scenes can be approximated well enough by their bird's-eye-view (BEV) projection, so almost all outdoor detectors have a 2D head. For indoor scenes that is not an option.
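One common intuition behind the BEV argument, which is an assumption here and is not spelled out in the reply, is that outdoor objects rarely sit on top of each other, while indoor objects often do. The sketch below uses a hypothetical `bev_occupancy` helper that drops the z coordinate and marks occupied ground-plane cells. It is an illustration only and does not reproduce the voxelization of any specific detector.

```python
def bev_occupancy(points, x_range, y_range, cell_size):
    """Project 3D points onto the ground plane and mark occupied cells."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    n_cols = int((x_max - x_min) / cell_size)
    n_rows = int((y_max - y_min) / cell_size)
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y, _z in points:  # the height is discarded
        if x_min <= x < x_max and y_min <= y < y_max:
            col = min(int((x - x_min) / cell_size), n_cols - 1)
            row = min(int((y - y_min) / cell_size), n_rows - 1)
            grid[row][col] = 1
    return grid


def occupied_cells(grid):
    return sum(sum(row) for row in grid)


# Outdoor-like case: two objects side by side on the ground.
outdoor = [(2.5, 0.5, -1.0), (6.5, -2.5, -1.2)]
# Indoor-like case: one object directly above another.
indoor = [(2.5, 0.5, 0.4), (2.5, 0.5, 1.9)]

outdoor_cells = occupied_cells(bev_occupancy(outdoor, (0, 8), (-4, 4), 1.0))
indoor_cells = occupied_cells(bev_occupancy(indoor, (0, 8), (-4, 4), 1.0))
print(outdoor_cells, indoor_cells)  # 2 1
```

In the first case the two objects stay separable after projection. In the second they collapse into the same cell, so a purely 2D head could not tell them apart.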
Thank you very much for your help.
RuntimeError: stack expects each tensor to be equal size, but got [19280, 4] at entry 0 and [21555, 4] at entry 1.
The cause seems to be that the number of points read differs from sample to sample. How do I modify the code?
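If a sampling step is used to equalize the point counts, it has to run after every transform that adds or drops points. In the config above, the commented-out sampling steps sit before `PointsRangeFilter`, which removes points outside `point_cloud_range`. The sketch below illustrates the ordering effect in plain Python. `range_filter` and `sample_fixed_size` are hypothetical stand-ins, not the mmdet3d classes, and the scenes are random data with the point counts from the error message.

```python
import random


def range_filter(points, x_max):
    """Drop points at or beyond x_max, as a range filter would."""
    return [p for p in points if p[0] < x_max]


def sample_fixed_size(points, num_points, rng):
    """Return exactly num_points rows (with replacement if too few)."""
    if len(points) >= num_points:
        return rng.sample(points, num_points)
    return rng.choices(points, k=num_points)


rng = random.Random(0)
scenes = [
    [(rng.uniform(0, 100), rng.uniform(-40, 40), rng.uniform(-3, 1))
     for _ in range(n)]
    for n in (19280, 21555)
]

# Sampling first, filtering second: the filter shrinks each scene again,
# so the counts are no longer guaranteed to match.
sample_then_filter = [
    len(range_filter(sample_fixed_size(s, 20000, rng), 70.4)) for s in scenes
]

# Filtering first, sampling last: every scene ends with exactly 20000 points.
filter_then_sample = [
    len(sample_fixed_size(range_filter(s, 70.4), 20000, rng)) for s in scenes
]

print(sample_then_filter)
print(filter_then_sample)
```

Under that assumption, the sampling entry in `train_pipeline` would go after `PointsRangeFilter` rather than before `ObjectNoise`.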