Thanks for using SST. No, we have not tried SST on nuScenes. But if you share your config and detailed results, we may be able to help.
Thanks! The modified model is as follows:
voxel_size = (0.25, 0.25, 8)
window_shape = (16, 16, 1)
point_cloud_range = [-50, -50, -5, 50, 50, 3]
model = dict(
    type='DynamicVoxelNet',
    voxel_layer=dict(
        voxel_size=(0.25, 0.25, 8),
        max_num_points=-1,
        point_cloud_range=[-50, -50, -5, 50, 50, 3],
        max_voxels=(-1, -1)),
    voxel_encoder=dict(
        type='DynamicVFE',
        in_channels=4,
        feat_channels=[64, 128],
        with_distance=False,
        voxel_size=(0.25, 0.25, 8),
        with_cluster_center=True,
        with_voxel_center=True,
        point_cloud_range=[-50, -50, -5, 50, 50, 3],
        norm_cfg=dict(type='naiveSyncBN1d', eps=0.001, momentum=0.01)),
    middle_encoder=dict(
        type='SSTInputLayerV2',
        window_shape=(16, 16, 1),
        sparse_shape=(400, 400, 1),
        shuffle_voxels=True,
        debug=True,
        drop_info=({
            0: {
                'max_tokens': 100,
                'drop_range': (0, 100)
            },
            1: {
                'max_tokens': 200,
                'drop_range': (100, 200)
            },
            2: {
                'max_tokens': 250,
                'drop_range': (200, 10000)
            }
        }, {
            0: {
                'max_tokens': 100,
                'drop_range': (0, 100)
            },
            1: {
                'max_tokens': 200,
                'drop_range': (100, 200)
            },
            2: {
                'max_tokens': 256,
                'drop_range': (200, 10000)
            }
        }),
        pos_temperature=10000,
        normalize_pos=False),
    backbone=dict(
        type='SSTv2',
        d_model=[128, 128, 128, 128, 128, 128],
        nhead=[8, 8, 8, 8, 8, 8],
        num_blocks=6,
        dim_feedforward=[256, 256, 256, 256, 256, 256],
        output_shape=[400, 400],
        num_attached_conv=3,
        conv_kwargs=[
            dict(kernel_size=3, dilation=1, padding=1, stride=1),
            dict(kernel_size=3, dilation=1, padding=1, stride=1),
            dict(kernel_size=3, dilation=2, padding=2, stride=1)
        ],
        conv_in_channel=128,
        conv_out_channel=128,
        debug=True),
    neck=dict(
        type='SECONDFPN',
        norm_cfg=dict(type='naiveSyncBN2d', eps=0.001, momentum=0.01),
        in_channels=[128],
        upsample_strides=[1],
        out_channels=[384]),
    bbox_head=dict(
        type='Anchor3DHead',
        num_classes=10,
        in_channels=384,
        feat_channels=384,
        use_direction_classifier=True,
        anchor_generator=dict(
            type='AlignedAnchor3DRangeGenerator',
            ranges=[[-49.6, -49.6, -1.80032795, 49.6, 49.6, -1.80032795],
                    [-49.6, -49.6, -1.74440365, 49.6, 49.6, -1.74440365],
                    [-49.6, -49.6, -1.68526504, 49.6, 49.6, -1.68526504],
                    [-49.6, -49.6, -1.67339111, 49.6, 49.6, -1.67339111],
                    [-49.6, -49.6, -1.61785072, 49.6, 49.6, -1.61785072],
                    [-49.6, -49.6, -1.80984986, 49.6, 49.6, -1.80984986],
                    [-49.6, -49.6, -1.763965, 49.6, 49.6, -1.763965]],
            sizes=[[1.95017717, 4.60718145, 1.72270761],
                   [2.4560939, 6.73778078, 2.73004906],
                   [2.87427237, 12.01320693, 3.81509561],
                   [0.60058911, 1.68452161, 1.27192197],
                   [0.66344886, 0.7256437, 1.75748069],
                   [0.39694519, 0.40359262, 1.06232151],
                   [2.49008838, 0.48578221, 0.98297065]],
            custom_values=[0, 0],
            rotations=[0, 1.57],
            reshape_out=True),
        assigner_per_size=False,
        diff_rad_by_sin=True,
        dir_offset=0.7854,
        dir_limit_offset=0,
        bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder', code_size=9),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0),
        loss_dir=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)),
    train_cfg=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            iou_calculator=dict(type='BboxOverlapsNearest3D'),
            pos_iou_thr=0.6,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            ignore_iof_thr=-1),
        allowed_border=0,
        code_weight=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2],
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        use_rotate_nms=True,
        nms_across_levels=False,
        nms_pre=1000,
        nms_thr=0.2,
        score_thr=0.05,
        min_bbox_size=0,
        max_num=500))
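Since the voxel size was changed, one quick consistency check is whether the grid implied by `point_cloud_range` and `voxel_size` matches `sparse_shape` and `output_shape`. The sketch below is only an illustration, not part of SST: the helper `grid_shape` is a hypothetical name, and it assumes `sparse_shape` is ordered (x, y, z).

```python
import math

# Values copied from the config above.
point_cloud_range = [-50, -50, -5, 50, 50, 3]
voxel_size = (0.25, 0.25, 8)
window_shape = (16, 16, 1)
sparse_shape = (400, 400, 1)
output_shape = [400, 400]


def grid_shape(pc_range, voxel):
    """Number of voxels along x, y, z (hypothetical helper, assumes x, y, z order)."""
    return tuple(
        round((pc_range[i + 3] - pc_range[i]) / voxel[i]) for i in range(3)
    )


grid = grid_shape(point_cloud_range, voxel_size)
# Number of attention windows needed to cover the grid along each axis.
windows = tuple(math.ceil(g / w) for g, w in zip(grid, window_shape))

print("voxel grid:", grid)
print("windows per axis:", windows)
print("matches sparse_shape:", grid == sparse_shape)
print("matches output_shape:", list(grid[:2]) == output_shape)
```

For this config the shapes are consistent, so a mismatch between voxel size and grid shape does not look like the cause of the low scores.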
After training for 24 epochs, I got the following detailed results:
All metric names below carry the prefix pts_bbox_NuScenes/ in the log (for example, pts_bbox_NuScenes/car_AP_dist_0.5).

| class | AP_dist_0.5 | AP_dist_1.0 | AP_dist_2.0 | AP_dist_4.0 | trans_err | scale_err | orient_err | vel_err | attr_err |
|---|---|---|---|---|---|---|---|---|---|
| car | 0.4701 | 0.6067 | 0.6618 | 0.6832 | 0.2372 | 0.1477 | 0.1317 | 0.2814 | 0.2252 |
| truck | 0.0624 | 0.2224 | 0.3657 | 0.3988 | 0.5955 | 0.2285 | 0.2259 | 0.2660 | 0.2360 |
| trailer | 0.0000 | 0.0000 | 0.0073 | 0.0857 | 0.9790 | 0.2405 | 0.9358 | 0.3954 | 0.1308 |
| bus | 0.0105 | 0.1396 | 0.3895 | 0.4736 | 0.7881 | 0.1895 | 0.1455 | 0.6699 | 0.1602 |
| construction_vehicle | 0.0000 | 0.0036 | 0.0457 | 0.0629 | 0.9470 | 0.5084 | 1.3642 | 0.1244 | 0.4645 |
| bicycle | 0.0264 | 0.0287 | 0.0290 | 0.0298 | 0.1875 | 0.2586 | 0.8511 | 0.3377 | 0.0047 |
| motorcycle | 0.1205 | 0.1384 | 0.1415 | 0.1458 | 0.2381 | 0.2787 | 0.7527 | 0.6352 | 0.3060 |
| pedestrian | 0.5656 | 0.5758 | 0.5854 | 0.5960 | 0.1403 | 0.2611 | 0.3074 | 0.2499 | 0.0425 |
| traffic_cone | 0.0727 | 0.0775 | 0.0849 | 0.1073 | 0.1638 | 0.3195 | nan | nan | nan |
| barrier | 0.0680 | 0.2386 | 0.3307 | 0.3615 | 0.5643 | 0.2763 | 0.0374 | nan | nan |

mATE: 0.4841, mASE: 0.2709, mAOE: 0.5280, mAVE: 0.3700, mAAE: 0.1962, NDS: 0.4278, mAP: 0.2253
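For readers checking how the summary numbers relate to the per-class ones: nuScenes defines mAP as the mean of the AP values over classes and distance thresholds, and NDS as (5 * mAP + sum over the five mean true-positive errors of (1 - min(1, error))) / 10. The sketch below recomputes both from the results above; the function names are illustrative and not taken from the nuScenes devkit.

```python
# Per-class AP at the 0.5 / 1.0 / 2.0 / 4.0 m thresholds, copied from the results above.
ap_by_class = {
    "car": [0.4701, 0.6067, 0.6618, 0.6832],
    "truck": [0.0624, 0.2224, 0.3657, 0.3988],
    "trailer": [0.0000, 0.0000, 0.0073, 0.0857],
    "bus": [0.0105, 0.1396, 0.3895, 0.4736],
    "construction_vehicle": [0.0000, 0.0036, 0.0457, 0.0629],
    "bicycle": [0.0264, 0.0287, 0.0290, 0.0298],
    "motorcycle": [0.1205, 0.1384, 0.1415, 0.1458],
    "pedestrian": [0.5656, 0.5758, 0.5854, 0.5960],
    "traffic_cone": [0.0727, 0.0775, 0.0849, 0.1073],
    "barrier": [0.0680, 0.2386, 0.3307, 0.3615],
}

# Mean true-positive errors, copied from the results above.
mean_tp_errors = {
    "mATE": 0.4841,
    "mASE": 0.2709,
    "mAOE": 0.5280,
    "mAVE": 0.3700,
    "mAAE": 0.1962,
}


def mean_ap(ap_table):
    """Average AP over thresholds per class, then over classes."""
    per_class = [sum(values) / len(values) for values in ap_table.values()]
    return sum(per_class) / len(per_class)


def nuscenes_detection_score(map_value, tp_errors):
    """NDS = (5 * mAP + sum(1 - min(1, err))) / 10."""
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    return (5.0 * map_value + sum(tp_scores)) / 10.0


map_value = mean_ap(ap_by_class)
nds_value = nuscenes_detection_score(map_value, mean_tp_errors)
print(f"mAP = {map_value:.4f}, NDS = {nds_value:.4f}")
```

Both values agree with the reported mAP and NDS up to rounding of the logged inputs. Half of the NDS weight comes from mAP, so the very low AP on trailer, construction_vehicle, bicycle, and traffic_cone is what pulls the score down most.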
Your config looks fine to me. I am sorry, but I do not have enough information to explain the poor results. We will try to run SST on nuScenes, but I cannot give a precise schedule for now. My suggestion is to debug each component (backbone, head, etc.) on a small subset of the data. For example, replace the anchor head with a center head to check whether the head module is correct.
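One simple way to get a small subset for this kind of debugging is to keep every k-th frame of the training set. The sketch below is a minimal illustration on made-up records; `subsample_infos` and the record layout are hypothetical. As far as I know, mmdetection3d's `NuScenesDataset` has a `load_interval` argument with a similar effect, but please verify that against the version you use.

```python
def subsample_infos(infos, interval):
    """Sort frame records by timestamp and keep every `interval`-th one."""
    ordered = sorted(infos, key=lambda info: info["timestamp"])
    return ordered[::interval]


# Made-up stand-ins for per-frame annotation records (not real nuScenes data).
frames = [{"token": f"frame_{i}", "timestamp": 1000 - i} for i in range(100)]

small = subsample_infos(frames, interval=10)
print(len(small), "frames kept out of", len(frames))
```

If the model cannot overfit such a subset with one head but can with another, that points at the head rather than the backbone.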
OK. I will debug the components and check the results once you have run SST on nuScenes. Thanks for your work.
Hi, do you have more recent results on nuScenes? @Zoeeeing
@Devoe-97 Sorry, I could not get any better results.
@Abyssaledge did you try to run experiments on the nuScenes dataset? Since nuScenes has about five times fewer samples than Waymo, could training from scratch on that little data explain such poor results on nuScenes? (Transformers are data-hungry!) What do you think?
@gopi231091 I have not run the experiments on nuScenes yet. To my knowledge, SST is not that data-hungry. It performs better than the PointPillars baseline with 20% of the training data on Waymo. However, its performance on nuScenes might be a little worse than the SOTA methods, because pillar-based models show inferior performance on nuScenes, as many researchers have observed.
Hi, have you experimented on other outdoor datasets such as nuScenes? When I trained SST on the nuScenes dataset, the results were not ideal. I only modified the voxel-size hyperparameters and replaced the head. I would like to ask whether there is a problem with this setup. Thanks!