Closed · min-zhang closed this issue 1 year ago
Guys, please talk in English, or talk privately. Thank you.
Hello, my test result is as follows (epoch 80):

```
INFO  Car AP@0.70, 0.70, 0.70:
bbox AP:36.2793, 39.2191, 32.0413
bev  AP:42.8272, 47.2691, 40.7447
3d   AP:17.6040, 20.9249, 19.2227
aos  AP:34.15, 38.25, 31.37
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:33.1363, 37.1293, 30.5799
bev  AP:40.0895, 45.7915, 37.5233
3d   AP:11.6239, 17.0281, 13.8623
aos  AP:30.95, 36.06, 29.66
Car AP@0.70, 0.50, 0.50:
bbox AP:36.2793, 39.2191, 32.0413
bev  AP:63.7014, 60.0651, 51.6864
3d   AP:43.5480, 46.8279, 40.3347
aos  AP:34.15, 38.25, 31.37
Car AP_R40@0.70, 0.50, 0.50:
bbox AP:33.1363, 37.1293, 30.5799
bev  AP:62.9937, 59.0316, 51.2108
3d   AP:41.3002, 44.1800, 37.1804
aos  AP:30.95, 36.06, 29.66
Pedestrian AP@0.50, 0.50, 0.50:
bbox AP:0.0845, 0.1779, 0.1848
bev  AP:0.0567, 0.0942, 0.0942
3d   AP:0.0301, 0.0501, 0.0501
aos  AP:0.06, 0.09, 0.09
Pedestrian AP_R40@0.50, 0.50, 0.50:
bbox AP:0.0449, 0.0896, 0.0930
bev  AP:0.0156, 0.0259, 0.0259
3d   AP:0.0083, 0.0138, 0.0138
aos  AP:0.03, 0.04, 0.04
Pedestrian AP@0.50, 0.25, 0.25:
bbox AP:0.0845, 0.1779, 0.1848
bev  AP:0.1384, 0.3047, 0.2964
3d   AP:0.1028, 0.2191, 0.2133
aos  AP:0.06, 0.09, 0.09
Pedestrian AP_R40@0.50, 0.25, 0.25:
bbox AP:0.0449, 0.0896, 0.0930
bev  AP:0.0680, 0.1462, 0.1439
3d   AP:0.0540, 0.1018, 0.1002
aos  AP:0.03, 0.04, 0.04
Cyclist AP@0.50, 0.50, 0.50:
bbox AP:0.0403, 0.0403, 0.0403
bev  AP:0.0000, 0.0000, 0.0000
3d   AP:0.0000, 0.0000, 0.0000
aos  AP:0.01, 0.01, 0.01
Cyclist AP_R40@0.50, 0.50, 0.50:
bbox AP:0.0333, 0.0333, 0.0333
bev  AP:0.0000, 0.0000, 0.0000
3d   AP:0.0000, 0.0000, 0.0000
aos  AP:0.01, 0.01, 0.01
Cyclist AP@0.50, 0.25, 0.25:
bbox AP:0.0403, 0.0403, 0.0403
bev  AP:0.1310, 0.1310, 0.1310
3d   AP:0.1198, 0.1198, 0.1198
aos  AP:0.01, 0.01, 0.01
Cyclist AP_R40@0.50, 0.25, 0.25:
bbox AP:0.0333, 0.0333, 0.0333
bev  AP:0.0728, 0.0728, 0.0728
3d   AP:0.0662, 0.0662, 0.0662
aos  AP:0.01, 0.01, 0.01
```

The detection accuracy for pedestrians and cyclists is very low. I made my modifications following your guide at https://github.com/tudelft-iv/view-of-delft-dataset/blob/main/PP-Radar.md, and the modified files are as follows:

radar_5frames_as_kitti_dataset.yaml

```yaml
DATASET: 'KittiDataset'
DATA_PATH: '../data/radar_5frames'

POINT_CLOUD_RANGE: [0, -25.6, -3, 51.2, 25.6, 2]

DATA_SPLIT: { 'train': train, 'test': val }

INFO_PATH: { 'train': [kitti_infos_train.pkl], 'test': [kitti_infos_val.pkl], }

FOV_POINTS_ONLY: True

DATA_AUGMENTOR:
    DISABLE_AUG_LIST: ['placeholder']
    AUG_CONFIG_LIST:
        - NAME: random_world_flip
          ALONG_AXIS_LIST: ['x']
        - NAME: random_world_scaling
          WORLD_SCALE_RANGE: [0.95, 1.05]

POINT_FEATURE_ENCODING: {
    encoding_type: absolute_coordinates_encoding,
    used_feature_list: ['x', 'y', 'z', 'rcs', 'v_r', 'v_r_comp', 'time'],
    src_feature_list: ['x', 'y', 'z', 'rcs', 'v_r', 'v_r_comp', 'time'],
}

DATA_PROCESSOR:
    - NAME: mask_points_and_boxes_outside_range
      REMOVE_OUTSIDE_BOXES: True

    - NAME: shuffle_points
      SHUFFLE_ENABLED: { 'train': True, 'test': False }

    - NAME: transform_points_to_voxels
      VOXEL_SIZE: [0.16, 0.16, 5]
      MAX_POINTS_PER_VOXEL: 10
      MAX_NUMBER_OF_VOXELS: {
          'train': 16000,
          'test': 40000
      }
```
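As a quick sanity check that the seven-channel layout in `src_feature_list` actually matches what is on disk, a minimal sketch could look like the following (the frame path is a hypothetical example, not from the original post):

```python
import numpy as np

# Hypothetical example frame; adjust the path to your radar_5frames split.
bin_file = '../data/radar_5frames/training/velodyne/00000.bin'

pts = np.fromfile(bin_file, dtype=np.float32)
assert pts.size % 7 == 0, "point file is not a multiple of 7 floats per point"
pts = pts.reshape(-1, 7)  # columns: x, y, z, rcs, v_r, v_r_comp, time
print(pts.shape)          # (num_points, 7)
print(pts[:2])            # first two points for eyeballing value ranges
```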
pointpillarsradar.yaml

```yaml
CLASS_NAMES: ['Car', 'Pedestrian', 'Cyclist']

DATA_CONFIG:
    _BASE_CONFIG_: cfgs/dataset_configs/radar_5frames_as_kitti_dataset.yaml
    POINT_CLOUD_RANGE: [0, -25.6, -3, 51.2, 25.6, 2]
    DATA_PROCESSOR:
        - NAME: mask_points_and_boxes_outside_range
          REMOVE_OUTSIDE_BOXES: True

        - NAME: shuffle_points
          SHUFFLE_ENABLED: { 'train': True, 'test': False }

        - NAME: transform_points_to_voxels
          VOXEL_SIZE: [0.16, 0.16, 5]
          MAX_POINTS_PER_VOXEL: 10
          MAX_NUMBER_OF_VOXELS: {
              'train': 16000,
              'test': 40000
          }
    DATA_AUGMENTOR:
        DISABLE_AUG_LIST: ['random_world_rotation', 'gt_sampling']
        AUG_CONFIG_LIST:
            - NAME: random_world_flip
              ALONG_AXIS_LIST: ['x']
            - NAME: random_world_scaling
              WORLD_SCALE_RANGE: [0.95, 1.05]

MODEL:
    NAME: PointPillar

    VFE:
        NAME: Radar7PillarVFE
        USE_XYZ: True
        USE_RCS: True
        USE_VR: True
        USE_VR_COMP: True
        USE_TIME: True
        USE_NORM: True
        USE_ELEVATION: True
        USE_DISTANCE: False
        NUM_FILTERS: [64]

    MAP_TO_BEV:
        NAME: PointPillarScatter
        NUM_BEV_FEATURES: 64

    BACKBONE_2D:
        NAME: BaseBEVBackbone
        LAYER_NUMS: [3, 5, 5]
        LAYER_STRIDES: [2, 2, 2]
        NUM_FILTERS: [64, 128, 256]
        UPSAMPLE_STRIDES: [1, 2, 4]
        NUM_UPSAMPLE_FILTERS: [128, 128, 128]

    DENSE_HEAD:
        NAME: AnchorHeadSingle
        CLASS_AGNOSTIC: False

        USE_DIRECTION_CLASSIFIER: True
        DIR_OFFSET: 0.78539
        DIR_LIMIT_OFFSET: 0.0
        NUM_DIR_BINS: 2

        ANCHOR_GENERATOR_CONFIG: [
            {
                'class_name': 'Car',
                'anchor_sizes': [[3.9, 1.6, 1.56]],
                'anchor_rotations': [0, 1.57],
                'anchor_bottom_heights': [-1.78],
                'align_center': False,
                'feature_map_stride': 2,
                'matched_threshold': 0.6,
                'unmatched_threshold': 0.45
            },
            {
                'class_name': 'Pedestrian',
                'anchor_sizes': [[0.8, 0.6, 1.73]],
                'anchor_rotations': [0, 1.57],
                'anchor_bottom_heights': [-0.6],
                'align_center': False,
                'feature_map_stride': 2,
                'matched_threshold': 0.5,
                'unmatched_threshold': 0.35
            },
            {
                'class_name': 'Cyclist',
                'anchor_sizes': [[1.76, 0.6, 1.73]],
                'anchor_rotations': [0, 1.57],
                'anchor_bottom_heights': [-0.6],
                'align_center': False,
                'feature_map_stride': 2,
                'matched_threshold': 0.5,
                'unmatched_threshold': 0.35
            }
        ]

        TARGET_ASSIGNER_CONFIG:
            NAME: AxisAlignedTargetAssigner
            POS_FRACTION: -1.0
            SAMPLE_SIZE: 512
            NORM_BY_NUM_EXAMPLES: False
            MATCH_HEIGHT: False
            BOX_CODER: ResidualCoder

        LOSS_CONFIG:
            LOSS_WEIGHTS: {
                'cls_weight': 1.0,
                'loc_weight': 2.0,
                'dir_weight': 0.2,
                'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
            }

    POST_PROCESSING:
        RECALL_THRESH_LIST: [0.3, 0.5, 0.7]
        SCORE_THRESH: 0.1
        OUTPUT_RAW_SCORE: False
        EVAL_METRIC: kitti
        NMS_CONFIG:
            MULTI_CLASSES_NMS: False
            NMS_TYPE: nms_gpu
            NMS_THRESH: 0.01
            NMS_PRE_MAXSIZE: 4096
            NMS_POST_MAXSIZE: 500

OPTIMIZATION:
    BATCH_SIZE_PER_GPU: 16
    NUM_EPOCHS: 80

    OPTIMIZER: adam_onecycle
    LR: 0.003
    WEIGHT_DECAY: 0.01
    MOMENTUM: 0.9

    MOMS: [0.95, 0.85]
    PCT_START: 0.4
    DIV_FACTOR: 10
    DECAY_STEP_LIST: [35, 45]
    LR_DECAY: 0.1
    LR_CLIP: 0.0000001

    LR_WARMUP: False
    WARMUP_EPOCH: 1

    GRAD_NORM_CLIP: 10
```
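For reference, the anchor grid implied by this config follows from POINT_CLOUD_RANGE, VOXEL_SIZE, and feature_map_stride. A minimal sketch of that arithmetic (my own illustration, not code from the repo):

```python
# BEV grid arithmetic implied by the config above.
point_cloud_range = [0, -25.6, -3, 51.2, 25.6, 2]
voxel_size = [0.16, 0.16, 5]
feature_map_stride = 2

grid_x = round((point_cloud_range[3] - point_cloud_range[0]) / voxel_size[0])  # 320 pillars
grid_y = round((point_cloud_range[4] - point_cloud_range[1]) / voxel_size[1])  # 320 pillars
feat_x, feat_y = grid_x // feature_map_stride, grid_y // feature_map_stride    # 160 x 160

# AnchorHeadSingle places one anchor per class and per rotation at each cell:
num_classes, num_rotations = 3, 2
print(f"feature map: {feat_x} x {feat_y}, "
      f"total anchors: {feat_x * feat_y * num_classes * num_rotations}")  # 153600
```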
pillar_vfe.py

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

from .vfe_template import VFETemplate


class PFNLayer(nn.Module):
    def __init__(self, in_channels, out_channels, use_norm=True, last_layer=False):
        super().__init__()

        self.last_vfe = last_layer
        self.use_norm = use_norm
        if not self.last_vfe:
            out_channels = out_channels // 2

        if self.use_norm:
            self.linear = nn.Linear(in_channels, out_channels, bias=False)
            self.norm = nn.BatchNorm1d(out_channels, eps=1e-3, momentum=0.01)
        else:
            self.linear = nn.Linear(in_channels, out_channels, bias=True)

        self.part = 50000

    def forward(self, inputs):
        if inputs.shape[0] > self.part:
            # nn.Linear performs randomly when batch size is too large
            num_parts = inputs.shape[0] // self.part
            part_linear_out = [self.linear(inputs[num_part*self.part:(num_part+1)*self.part])
                               for num_part in range(num_parts+1)]
            x = torch.cat(part_linear_out, dim=0)
        else:
            x = self.linear(inputs)
        torch.backends.cudnn.enabled = False
        x = self.norm(x.permute(0, 2, 1)).permute(0, 2, 1) if self.use_norm else x
        torch.backends.cudnn.enabled = True
        x = F.relu(x)
        x_max = torch.max(x, dim=1, keepdim=True)[0]

        if self.last_vfe:
            return x_max
        else:
            x_repeat = x_max.repeat(1, inputs.shape[1], 1)
            x_concatenated = torch.cat([x, x_repeat], dim=2)
            return x_concatenated

class PillarVFE(VFETemplate):
    def __init__(self, model_cfg, num_point_features, voxel_size, point_cloud_range, **kwargs):
        super().__init__(model_cfg=model_cfg)

        self.use_norm = self.model_cfg.USE_NORM
        self.with_distance = self.model_cfg.WITH_DISTANCE
        self.use_absolute_xyz = self.model_cfg.USE_ABSLOTE_XYZ
        num_point_features += 6 if self.use_absolute_xyz else 3
        if self.with_distance:
            num_point_features += 1

        self.num_filters = self.model_cfg.NUM_FILTERS
        assert len(self.num_filters) > 0
        num_filters = [num_point_features] + list(self.num_filters)

        pfn_layers = []
        for i in range(len(num_filters) - 1):
            in_filters = num_filters[i]
            out_filters = num_filters[i + 1]
            pfn_layers.append(
                PFNLayer(in_filters, out_filters, self.use_norm, last_layer=(i >= len(num_filters) - 2))
            )
        self.pfn_layers = nn.ModuleList(pfn_layers)

        self.voxel_x = voxel_size[0]
        self.voxel_y = voxel_size[1]
        self.voxel_z = voxel_size[2]
        self.x_offset = self.voxel_x / 2 + point_cloud_range[0]
        self.y_offset = self.voxel_y / 2 + point_cloud_range[1]
        self.z_offset = self.voxel_z / 2 + point_cloud_range[2]

    def get_output_feature_dim(self):
        return self.num_filters[-1]

    def get_paddings_indicator(self, actual_num, max_num, axis=0):
        actual_num = torch.unsqueeze(actual_num, axis + 1)
        max_num_shape = [1] * len(actual_num.shape)
        max_num_shape[axis + 1] = -1
        max_num = torch.arange(max_num, dtype=torch.int, device=actual_num.device).view(max_num_shape)
        paddings_indicator = actual_num.int() > max_num
        return paddings_indicator

    def forward(self, batch_dict, **kwargs):
        voxel_features, voxel_num_points, coords = batch_dict['voxels'], batch_dict['voxel_num_points'], batch_dict['voxel_coords']
        points_mean = voxel_features[:, :, :3].sum(dim=1, keepdim=True) / voxel_num_points.type_as(voxel_features).view(-1, 1, 1)
        f_cluster = voxel_features[:, :, :3] - points_mean

        f_center = torch.zeros_like(voxel_features[:, :, :3])
        f_center[:, :, 0] = voxel_features[:, :, 0] - (coords[:, 3].to(voxel_features.dtype).unsqueeze(1) * self.voxel_x + self.x_offset)
        f_center[:, :, 1] = voxel_features[:, :, 1] - (coords[:, 2].to(voxel_features.dtype).unsqueeze(1) * self.voxel_y + self.y_offset)
        f_center[:, :, 2] = voxel_features[:, :, 2] - (coords[:, 1].to(voxel_features.dtype).unsqueeze(1) * self.voxel_z + self.z_offset)

        if self.use_absolute_xyz:
            features = [voxel_features, f_cluster, f_center]
        else:
            features = [voxel_features[..., 3:], f_cluster, f_center]

        if self.with_distance:
            points_dist = torch.norm(voxel_features[:, :, :3], 2, 2, keepdim=True)
            features.append(points_dist)
        features = torch.cat(features, dim=-1)

        voxel_count = features.shape[1]
        mask = self.get_paddings_indicator(voxel_num_points, voxel_count, axis=0)
        mask = torch.unsqueeze(mask, -1).type_as(voxel_features)
        features *= mask
        for pfn in self.pfn_layers:
            features = pfn(features)
        features = features.squeeze()
        batch_dict['pillar_features'] = features
        return batch_dict

class Radar7PillarVFE(VFETemplate):
    def __init__(self, model_cfg, num_point_features, voxel_size, point_cloud_range):
        super().__init__(model_cfg=model_cfg)

        num_point_features = 0
        self.use_norm = self.model_cfg.USE_NORM  # whether to use batchnorm in the PFNLayer
        self.use_xyz = self.model_cfg.USE_XYZ
        self.with_distance = self.model_cfg.USE_DISTANCE
        self.selected_indexes = []

        ## check if config has the correct params, if not, throw exception
        radar_config_params = ["USE_RCS", "USE_VR", "USE_VR_COMP", "USE_TIME", "USE_ELEVATION"]
        if all(hasattr(self.model_cfg, attr) for attr in radar_config_params):
            self.use_RCS = self.model_cfg.USE_RCS
            self.use_vr = self.model_cfg.USE_VR
            self.use_vr_comp = self.model_cfg.USE_VR_COMP
            self.use_time = self.model_cfg.USE_TIME
            self.use_elevation = self.model_cfg.USE_ELEVATION
        else:
            raise Exception("config does not have the right parameters, please use a radar config")

        self.available_features = ['x', 'y', 'z', 'rcs', 'v_r', 'v_r_comp', 'time']

        num_point_features += 6  # center_x, center_y, center_z, mean_x, mean_y, mean_z: we need 6 new channels

        self.x_ind = self.available_features.index('x')
        self.y_ind = self.available_features.index('y')
        self.z_ind = self.available_features.index('z')
        self.rcs_ind = self.available_features.index('rcs')
        self.vr_ind = self.available_features.index('v_r')
        self.vr_comp_ind = self.available_features.index('v_r_comp')
        self.time_ind = self.available_features.index('time')

        if self.use_xyz:  # if x y z coordinates are used, add 3 channels and save the indexes
            num_point_features += 3  # x, y, z
            self.selected_indexes.extend((self.x_ind, self.y_ind, self.z_ind))
        if self.use_RCS:  # add 1 if RCS is used and save the index
            num_point_features += 1
            self.selected_indexes.append(self.rcs_ind)
        if self.use_vr:  # add 1 if the raw radial velocity v_r is used and save the index
            num_point_features += 1
            self.selected_indexes.append(self.vr_ind)
        if self.use_vr_comp:  # add 1 if the ego-motion-compensated radial velocity is used and save the index
            num_point_features += 1
            self.selected_indexes.append(self.vr_comp_ind)
        if self.use_time:  # add 1 if time is used and save the index
            num_point_features += 1
            self.selected_indexes.append(self.time_ind)

        ### LOGGING USED FEATURES ###
        print("number of point features used: " + str(num_point_features))
        print("6 of these are 2 * (x y z) coordinates relative to mean and center of pillars")
        print(str(len(self.selected_indexes)) + " are selected original features: ")
        for k in self.selected_indexes:
            print(str(k) + ": " + self.available_features[k])

        self.selected_indexes = torch.LongTensor(self.selected_indexes)  # turning used indexes into a Tensor

        self.num_filters = self.model_cfg.NUM_FILTERS
        assert len(self.num_filters) > 0
        num_filters = [num_point_features] + list(self.num_filters)

        pfn_layers = []
        for i in range(len(num_filters) - 1):
            in_filters = num_filters[i]
            out_filters = num_filters[i + 1]
            pfn_layers.append(
                PFNLayer(in_filters, out_filters, self.use_norm, last_layer=(i >= len(num_filters) - 2))
            )
        self.pfn_layers = nn.ModuleList(pfn_layers)

        ## saving size of the voxel
        self.voxel_x = voxel_size[0]
        self.voxel_y = voxel_size[1]
        self.voxel_z = voxel_size[2]

        ## saving offsets: start of point cloud in x, y, z plus half a voxel
        self.x_offset = self.voxel_x / 2 + point_cloud_range[0]
        self.y_offset = self.voxel_y / 2 + point_cloud_range[1]
        self.z_offset = self.voxel_z / 2 + point_cloud_range[2]

    def get_output_feature_dim(self):
        return self.num_filters[-1]  # number of output channels of the last PFN layer

    def get_paddings_indicator(self, actual_num, max_num, axis=0):
        actual_num = torch.unsqueeze(actual_num, axis + 1)
        max_num_shape = [1] * len(actual_num.shape)
        max_num_shape[axis + 1] = -1
        max_num = torch.arange(max_num, dtype=torch.int, device=actual_num.device).view(max_num_shape)
        paddings_indicator = actual_num.int() > max_num
        return paddings_indicator

    def forward(self, batch_dict, **kwargs):
        ## coordinate system notes
        # x is pointing forward, y is left/right, z is up/down
        # spconv returns voxel_coords as [batch_idx, z_idx, y_idx, x_idx], that is why coords is indexed backwards
        voxel_features, voxel_num_points, coords = batch_dict['voxels'], batch_dict['voxel_num_points'], batch_dict['voxel_coords']

        if not self.use_elevation:  # if we ignore elevation (z)
            voxel_features[:, :, self.z_ind] = 0  # set z to zero before doing anything

        orig_xyz = voxel_features[:, :, :self.z_ind + 1]  # selecting x y z

        # calculate mean of points in pillars for x y z and save the offset from the mean
        # Note: they do not take the mean directly, as each pillar is padded with zeros; instead, they sum and divide by the number of real points
        points_mean = orig_xyz.sum(dim=1, keepdim=True) / voxel_num_points.type_as(voxel_features).view(-1, 1, 1)
        f_cluster = orig_xyz - points_mean  # offset from cluster mean

        # calculate the center of each pillar and save the points' offsets from it;
        # voxel_coordinate * voxel_size + offset should be the center of the pillar (coords are indexed backwards)
        f_center = torch.zeros_like(orig_xyz)
        f_center[:, :, 0] = voxel_features[:, :, self.x_ind] - (
                coords[:, 3].to(voxel_features.dtype).unsqueeze(1) * self.voxel_x + self.x_offset)
        f_center[:, :, 1] = voxel_features[:, :, self.y_ind] - (
                coords[:, 2].to(voxel_features.dtype).unsqueeze(1) * self.voxel_y + self.y_offset)
        f_center[:, :, 2] = voxel_features[:, :, self.z_ind] - (
                coords[:, 1].to(voxel_features.dtype).unsqueeze(1) * self.voxel_z + self.z_offset)

        voxel_features = voxel_features[:, :, self.selected_indexes]  # filtering for used features

        features = [voxel_features, f_cluster, f_center]
        if self.with_distance:  # if with_distance is true, include range to the points as well
            points_dist = torch.norm(orig_xyz, 2, 2, keepdim=True)  # L2 norm along dim 2
            features.append(points_dist)

        ## finishing up the feature extraction with correct shape and masking
        features = torch.cat(features, dim=-1)

        voxel_count = features.shape[1]
        mask = self.get_paddings_indicator(voxel_num_points, voxel_count, axis=0)
        mask = torch.unsqueeze(mask, -1).type_as(voxel_features)
        features *= mask

        for pfn in self.pfn_layers:
            features = pfn(features)
        features = features.squeeze()
        batch_dict['pillar_features'] = features
        return batch_dict
```
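To make the channel bookkeeping concrete: with the config in pointpillarsradar.yaml (all seven radar features enabled, USE_DISTANCE: False), Radar7PillarVFE feeds 7 selected channels plus the 6 cluster/center offsets, i.e. 13 inputs, into a single PFN layer with 64 filters. A minimal shape-check sketch, assuming the PFNLayer class from the snippet above is in scope:

```python
import torch

# Channel arithmetic assumed from the configs above.
num_selected = 7          # x, y, z, rcs, v_r, v_r_comp, time
num_offsets = 6           # f_cluster (3) + f_center (3)
in_channels = num_selected + num_offsets   # 13
out_channels = 64                          # NUM_FILTERS: [64]

# Dummy pillar batch: 100 pillars, up to 10 points each (MAX_POINTS_PER_VOXEL).
pfn = PFNLayer(in_channels, out_channels, use_norm=True, last_layer=True)
dummy = torch.randn(100, 10, in_channels)
out = pfn(dummy)
print(out.shape)  # torch.Size([100, 1, 64]); squeezed to (100, 64) downstream
```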
/pcdet/models/backbones_3d/vfe/__init__.py

```python
from .mean_vfe import MeanVFE
from .pillar_vfe import PillarVFE, Radar7PillarVFE
from .dynamic_mean_vfe import DynamicMeanVFE
from .dynamic_pillar_vfe import DynamicPillarVFE, DynamicPillarVFESimple2D
from .image_vfe import ImageVFE
from .vfe_template import VFETemplate

__all__ = {
    'VFETemplate': VFETemplate,
    'MeanVFE': MeanVFE,
    'PillarVFE': PillarVFE,
    'Radar7PillarVFE': Radar7PillarVFE,
    'ImageVFE': ImageVFE,
    'DynMeanVFE': DynamicMeanVFE,
    'DynPillarVFE': DynamicPillarVFE,
    'DynamicPillarVFESimple2D': DynamicPillarVFESimple2D
}
```
pcdet/datasets/kitti/kitti_dataset.py

```python
    def get_lidar(self, idx):
        lidar_file = self.root_split_path / 'velodyne' / ('%s.bin' % idx)
        assert lidar_file.exists()
        number_of_channels = 7  # ['x', 'y', 'z', 'rcs', 'v_r', 'v_r_comp', 'time']
        points = np.fromfile(str(lidar_file), dtype=np.float32).reshape(-1, number_of_channels)
        means = [0, 0, 0, 0, 0, 0, 0]  # 'x', 'y', 'z', 'rcs', 'v_r', 'v_r_comp', 'time'
        stds = [1, 1, 1, 1, 1, 1, 1]   # 'x', 'y', 'z', 'rcs', 'v_r', 'v_r_comp', 'time'
        # we then norm the channels
        points = (points - means) / stds
        return points
        # return np.fromfile(str(lidar_file), dtype=np.float32).reshape(-1, number_of_channels)
        # return np.fromfile(str(lidar_file), dtype=np.float32).reshape(-1, 4)
```
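As posted, means and stds are identity placeholders, so this get_lidar does no actual normalization (kitti_dataset.py imports numpy as np at module level). If real per-channel normalization is wanted, one way is to aggregate statistics over the training split; a minimal sketch, where the directory layout and split file are hypothetical examples:

```python
import numpy as np
from pathlib import Path

# Hypothetical paths; adjust to your radar_5frames layout.
velodyne_dir = Path('../data/radar_5frames/training/velodyne')
split_file = Path('../data/radar_5frames/ImageSets/train.txt')

frames = split_file.read_text().split()
all_points = np.concatenate([
    np.fromfile(str(velodyne_dir / f'{idx}.bin'), dtype=np.float32).reshape(-1, 7)
    for idx in frames
], axis=0)

# Per-channel statistics over 'x', 'y', 'z', 'rcs', 'v_r', 'v_r_comp', 'time'
means = all_points.mean(axis=0)
stds = all_points.std(axis=0)
print('means:', means.round(4))
print('stds: ', stds.round(4))
```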
Can you help me figure out what the reason is?
You can take a look at the issue history: someone ran into the same problem before, and following the solution provided in the closed issue should resolve it. Hope this helps.