tudelft-iv / view-of-delft-dataset

This repository shares the documentation and development kit of the View of Delft automotive dataset.

Car detection accuracy is acceptable, but pedestrian and cyclist detection accuracy is very low #52

Closed. min-zhang closed this issue 1 year ago.

Yinbao-Li commented 1 year ago

Please take a look at the issue history; someone ran into the same problem before. Following the solution provided in the closed issue should resolve it. Hope this helps.

andraspalffy commented 1 year ago

Guys, please talk in English, or talk privately. Thank you.

min-zhang commented 1 year ago

Hello, my test results after 80 epochs are as follows:

```
Car AP@0.70, 0.70, 0.70:
  bbox AP: 36.2793, 39.2191, 32.0413
  bev  AP: 42.8272, 47.2691, 40.7447
  3d   AP: 17.6040, 20.9249, 19.2227
  aos  AP: 34.15, 38.25, 31.37
Car AP_R40@0.70, 0.70, 0.70:
  bbox AP: 33.1363, 37.1293, 30.5799
  bev  AP: 40.0895, 45.7915, 37.5233
  3d   AP: 11.6239, 17.0281, 13.8623
  aos  AP: 30.95, 36.06, 29.66
Car AP@0.70, 0.50, 0.50:
  bbox AP: 36.2793, 39.2191, 32.0413
  bev  AP: 63.7014, 60.0651, 51.6864
  3d   AP: 43.5480, 46.8279, 40.3347
  aos  AP: 34.15, 38.25, 31.37
Car AP_R40@0.70, 0.50, 0.50:
  bbox AP: 33.1363, 37.1293, 30.5799
  bev  AP: 62.9937, 59.0316, 51.2108
  3d   AP: 41.3002, 44.1800, 37.1804
  aos  AP: 30.95, 36.06, 29.66
Pedestrian AP@0.50, 0.50, 0.50:
  bbox AP: 0.0845, 0.1779, 0.1848
  bev  AP: 0.0567, 0.0942, 0.0942
  3d   AP: 0.0301, 0.0501, 0.0501
  aos  AP: 0.06, 0.09, 0.09
Pedestrian AP_R40@0.50, 0.50, 0.50:
  bbox AP: 0.0449, 0.0896, 0.0930
  bev  AP: 0.0156, 0.0259, 0.0259
  3d   AP: 0.0083, 0.0138, 0.0138
  aos  AP: 0.03, 0.04, 0.04
Pedestrian AP@0.50, 0.25, 0.25:
  bbox AP: 0.0845, 0.1779, 0.1848
  bev  AP: 0.1384, 0.3047, 0.2964
  3d   AP: 0.1028, 0.2191, 0.2133
  aos  AP: 0.06, 0.09, 0.09
Pedestrian AP_R40@0.50, 0.25, 0.25:
  bbox AP: 0.0449, 0.0896, 0.0930
  bev  AP: 0.0680, 0.1462, 0.1439
  3d   AP: 0.0540, 0.1018, 0.1002
  aos  AP: 0.03, 0.04, 0.04
Cyclist AP@0.50, 0.50, 0.50:
  bbox AP: 0.0403, 0.0403, 0.0403
  bev  AP: 0.0000, 0.0000, 0.0000
  3d   AP: 0.0000, 0.0000, 0.0000
  aos  AP: 0.01, 0.01, 0.01
Cyclist AP_R40@0.50, 0.50, 0.50:
  bbox AP: 0.0333, 0.0333, 0.0333
  bev  AP: 0.0000, 0.0000, 0.0000
  3d   AP: 0.0000, 0.0000, 0.0000
  aos  AP: 0.01, 0.01, 0.01
Cyclist AP@0.50, 0.25, 0.25:
  bbox AP: 0.0403, 0.0403, 0.0403
  bev  AP: 0.1310, 0.1310, 0.1310
  3d   AP: 0.1198, 0.1198, 0.1198
  aos  AP: 0.01, 0.01, 0.01
Cyclist AP_R40@0.50, 0.25, 0.25:
  bbox AP: 0.0333, 0.0333, 0.0333
  bev  AP: 0.0728, 0.0728, 0.0728
  3d   AP: 0.0662, 0.0662, 0.0662
  aos  AP: 0.01, 0.01, 0.01
```

The detection accuracy for pedestrians and cyclists is very low. I made the modifications following your https://github.com/tudelft-iv/view-of-delft-dataset/blob/main/PP-Radar.md, and the modified files are as follows:

radar_5frames_as_kitti_dataset.yaml

```yaml
DATASET: 'KittiDataset'
DATA_PATH: '../data/radar_5frames'

POINT_CLOUD_RANGE: [0, -25.6, -3, 51.2, 25.6, 2]

DATA_SPLIT: { 'train': train, 'test': val }

INFO_PATH: { 'train': [kitti_infos_train.pkl], 'test': [kitti_infos_val.pkl], }

FOV_POINTS_ONLY: True

DATA_AUGMENTOR:
    DISABLE_AUG_LIST: ['placeholder']
    AUG_CONFIG_LIST:

POINT_FEATURE_ENCODING: {
    encoding_type: absolute_coordinates_encoding,
    used_feature_list: ['x', 'y', 'z', 'rcs', 'v_r', 'v_r_comp', 'time'],
    src_feature_list: ['x', 'y', 'z', 'rcs', 'v_r', 'v_r_comp', 'time'],
}

DATA_PROCESSOR:

```

and the model config that builds on it via `_BASE_CONFIG_` (PointPillars with the radar VFE):

```yaml
DATA_CONFIG:
    _BASE_CONFIG_: cfgs/dataset_configs/radar_5frames_as_kitti_dataset.yaml
    POINT_CLOUD_RANGE: [0, -25.6, -3, 51.2, 25.6, 2]
    DATA_PROCESSOR:

MODEL:
    NAME: PointPillar

VFE:
    NAME: Radar7PillarVFE
    USE_XYZ: True
    USE_RCS: True
    USE_VR: True
    USE_VR_COMP: True
    USE_TIME: True
    USE_NORM: True
    USE_ELEVATION: True
    USE_DISTANCE: False
    NUM_FILTERS: [64]

MAP_TO_BEV:
    NAME: PointPillarScatter
    NUM_BEV_FEATURES: 64

BACKBONE_2D:
    NAME: BaseBEVBackbone
    LAYER_NUMS: [3, 5, 5]
    LAYER_STRIDES: [2, 2, 2]
    NUM_FILTERS: [64, 128, 256]
    UPSAMPLE_STRIDES: [1, 2, 4]
    NUM_UPSAMPLE_FILTERS: [128, 128, 128]

DENSE_HEAD:
    NAME: AnchorHeadSingle
    CLASS_AGNOSTIC: False

    USE_DIRECTION_CLASSIFIER: True
    DIR_OFFSET: 0.78539
    DIR_LIMIT_OFFSET: 0.0
    NUM_DIR_BINS: 2

    ANCHOR_GENERATOR_CONFIG: [
        {
            'class_name': 'Car',
            'anchor_sizes': [[3.9, 1.6, 1.56]],
            'anchor_rotations': [0, 1.57],
            'anchor_bottom_heights': [-1.78],
            'align_center': False,
            'feature_map_stride': 2,
            'matched_threshold': 0.6,
            'unmatched_threshold': 0.45
        },
        {
            'class_name': 'Pedestrian',
            'anchor_sizes': [[0.8, 0.6, 1.73]],
            'anchor_rotations': [0, 1.57],
            'anchor_bottom_heights': [-0.6],
            'align_center': False,
            'feature_map_stride': 2,
            'matched_threshold': 0.5,
            'unmatched_threshold': 0.35
        },
        {
            'class_name': 'Cyclist',
            'anchor_sizes': [[1.76, 0.6, 1.73]],
            'anchor_rotations': [0, 1.57],
            'anchor_bottom_heights': [-0.6],
            'align_center': False,
            'feature_map_stride': 2,
            'matched_threshold': 0.5,
            'unmatched_threshold': 0.35
        }
    ]

    TARGET_ASSIGNER_CONFIG:
        NAME: AxisAlignedTargetAssigner
        POS_FRACTION: -1.0
        SAMPLE_SIZE: 512
        NORM_BY_NUM_EXAMPLES: False
        MATCH_HEIGHT: False
        BOX_CODER: ResidualCoder

    LOSS_CONFIG:
        LOSS_WEIGHTS: {
            'cls_weight': 1.0,
            'loc_weight': 2.0,
            'dir_weight': 0.2,
            'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
        }

POST_PROCESSING:
    RECALL_THRESH_LIST: [0.3, 0.5, 0.7]
    SCORE_THRESH: 0.1
    OUTPUT_RAW_SCORE: False

    EVAL_METRIC: kitti

    NMS_CONFIG:
        MULTI_CLASSES_NMS: False
        NMS_TYPE: nms_gpu
        NMS_THRESH: 0.01
        NMS_PRE_MAXSIZE: 4096
        NMS_POST_MAXSIZE: 500

OPTIMIZATION:
    BATCH_SIZE_PER_GPU: 16
    NUM_EPOCHS: 80

    OPTIMIZER: adam_onecycle
    LR: 0.003
    WEIGHT_DECAY: 0.01
    MOMENTUM: 0.9

    MOMS: [0.95, 0.85]
    PCT_START: 0.4
    DIV_FACTOR: 10
    DECAY_STEP_LIST: [35, 45]
    LR_DECAY: 0.1
    LR_CLIP: 0.0000001

    LR_WARMUP: False
    WARMUP_EPOCH: 1

    GRAD_NORM_CLIP: 10

```
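
For reference, here is a minimal sketch (a hypothetical helper, not part of OpenPCDet) of how the VFE input width follows from the flags above; it mirrors the channel accounting that `Radar7PillarVFE` does in its `__init__`, so you can check that the number of point features feeding the single 64-channel PFN layer is what you expect:

```python
# Hypothetical helper: mirrors Radar7PillarVFE's num_point_features accounting.
def expected_vfe_input_channels(use_xyz=True, use_rcs=True, use_vr=True,
                                use_vr_comp=True, use_time=True, use_distance=False):
    num = 6  # cluster offsets (3) + pillar-center offsets (3), always added
    num += 3 if use_xyz else 0        # x, y, z
    num += 1 if use_rcs else 0        # rcs
    num += 1 if use_vr else 0         # v_r
    num += 1 if use_vr_comp else 0    # v_r_comp
    num += 1 if use_time else 0       # time
    num += 1 if use_distance else 0   # range to the point
    return num

# With the config above (everything True except USE_DISTANCE) this gives 13,
# the input width of the single PFNLayer defined by NUM_FILTERS: [64].
print(expected_vfe_input_channels())  # 13
```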

pillar_vfe.py

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

from .vfe_template import VFETemplate

class PFNLayer(nn.Module):
    def __init__(self, in_channels, out_channels, use_norm=True, last_layer=False):
        super().__init__()

        self.last_vfe = last_layer
        self.use_norm = use_norm
        if not self.last_vfe:
            out_channels = out_channels // 2

        if self.use_norm:
            self.linear = nn.Linear(in_channels, out_channels, bias=False)
            self.norm = nn.BatchNorm1d(out_channels, eps=1e-3, momentum=0.01)
        else:
            self.linear = nn.Linear(in_channels, out_channels, bias=True)

        self.part = 50000

    def forward(self, inputs):
        if inputs.shape[0] > self.part:
            # nn.Linear performs randomly when batch size is too large
            num_parts = inputs.shape[0] // self.part
            part_linear_out = [self.linear(inputs[num_part*self.part:(num_part+1)*self.part])
                               for num_part in range(num_parts+1)]
            x = torch.cat(part_linear_out, dim=0)
        else:
            x = self.linear(inputs)
        torch.backends.cudnn.enabled = False
        x = self.norm(x.permute(0, 2, 1)).permute(0, 2, 1) if self.use_norm else x
        torch.backends.cudnn.enabled = True
        x = F.relu(x)
        x_max = torch.max(x, dim=1, keepdim=True)[0]

        if self.last_vfe:
            return x_max
        else:
            x_repeat = x_max.repeat(1, inputs.shape[1], 1)
            x_concatenated = torch.cat([x, x_repeat], dim=2)
            return x_concatenated

class PillarVFE(VFETemplate):
    def __init__(self, model_cfg, num_point_features, voxel_size, point_cloud_range, **kwargs):
        super().__init__(model_cfg=model_cfg)

        self.use_norm = self.model_cfg.USE_NORM
        self.with_distance = self.model_cfg.WITH_DISTANCE
        self.use_absolute_xyz = self.model_cfg.USE_ABSLOTE_XYZ
        num_point_features += 6 if self.use_absolute_xyz else 3
        if self.with_distance:
            num_point_features += 1

        self.num_filters = self.model_cfg.NUM_FILTERS
        assert len(self.num_filters) > 0
        num_filters = [num_point_features] + list(self.num_filters)

        pfn_layers = []
        for i in range(len(num_filters) - 1):
            in_filters = num_filters[i]
            out_filters = num_filters[i + 1]
            pfn_layers.append(
                PFNLayer(in_filters, out_filters, self.use_norm, last_layer=(i >= len(num_filters) - 2))
            )
        self.pfn_layers = nn.ModuleList(pfn_layers)

        self.voxel_x = voxel_size[0]
        self.voxel_y = voxel_size[1]
        self.voxel_z = voxel_size[2]
        self.x_offset = self.voxel_x / 2 + point_cloud_range[0]
        self.y_offset = self.voxel_y / 2 + point_cloud_range[1]
        self.z_offset = self.voxel_z / 2 + point_cloud_range[2]

    def get_output_feature_dim(self):
        return self.num_filters[-1]

    def get_paddings_indicator(self, actual_num, max_num, axis=0):
        actual_num = torch.unsqueeze(actual_num, axis + 1)
        max_num_shape = [1] * len(actual_num.shape)
        max_num_shape[axis + 1] = -1
        max_num = torch.arange(max_num, dtype=torch.int, device=actual_num.device).view(max_num_shape)
        paddings_indicator = actual_num.int() > max_num
        return paddings_indicator

    def forward(self, batch_dict, **kwargs):
        voxel_features, voxel_num_points, coords = batch_dict['voxels'], batch_dict['voxel_num_points'], batch_dict['voxel_coords']
        points_mean = voxel_features[:, :, :3].sum(dim=1, keepdim=True) / voxel_num_points.type_as(voxel_features).view(-1, 1, 1)
        f_cluster = voxel_features[:, :, :3] - points_mean

        f_center = torch.zeros_like(voxel_features[:, :, :3])
        f_center[:, :, 0] = voxel_features[:, :, 0] - (coords[:, 3].to(voxel_features.dtype).unsqueeze(1) * self.voxel_x + self.x_offset)
        f_center[:, :, 1] = voxel_features[:, :, 1] - (coords[:, 2].to(voxel_features.dtype).unsqueeze(1) * self.voxel_y + self.y_offset)
        f_center[:, :, 2] = voxel_features[:, :, 2] - (coords[:, 1].to(voxel_features.dtype).unsqueeze(1) * self.voxel_z + self.z_offset)

        if self.use_absolute_xyz:
            features = [voxel_features, f_cluster, f_center]
        else:
            features = [voxel_features[..., 3:], f_cluster, f_center]

        if self.with_distance:
            points_dist = torch.norm(voxel_features[:, :, :3], 2, 2, keepdim=True)
            features.append(points_dist)
        features = torch.cat(features, dim=-1)

        voxel_count = features.shape[1]
        mask = self.get_paddings_indicator(voxel_num_points, voxel_count, axis=0)
        mask = torch.unsqueeze(mask, -1).type_as(voxel_features)
        features *= mask
        for pfn in self.pfn_layers:
            features = pfn(features)
        features = features.squeeze()
        batch_dict['pillar_features'] = features
        return batch_dict

class Radar7PillarVFE(VFETemplate):
    def __init__(self, model_cfg, num_point_features, voxel_size, point_cloud_range):
        super().__init__(model_cfg=model_cfg)

        num_point_features = 0
        self.use_norm = self.model_cfg.USE_NORM  # whether to use batchnorm in the PFNLayer
        self.use_xyz = self.model_cfg.USE_XYZ
        self.with_distance = self.model_cfg.USE_DISTANCE
        self.selected_indexes = []

        ## check if config has the correct params, if not, throw exception
        radar_config_params = ["USE_RCS", "USE_VR", "USE_VR_COMP", "USE_TIME", "USE_ELEVATION"]

        if all(hasattr(self.model_cfg, attr) for attr in radar_config_params):
            self.use_RCS = self.model_cfg.USE_RCS
            self.use_vr = self.model_cfg.USE_VR
            self.use_vr_comp = self.model_cfg.USE_VR_COMP
            self.use_time = self.model_cfg.USE_TIME
            self.use_elevation = self.model_cfg.USE_ELEVATION
        else:
            raise Exception("config does not have the right parameters, please use a radar config")

        self.available_features = ['x', 'y', 'z', 'rcs', 'v_r', 'v_r_comp', 'time']

        num_point_features += 6  # center_x, center_y, center_z, mean_x, mean_y, mean_z: 6 new features

        self.x_ind = self.available_features.index('x')
        self.y_ind = self.available_features.index('y')
        self.z_ind = self.available_features.index('z')
        self.rcs_ind = self.available_features.index('rcs')
        self.vr_ind = self.available_features.index('v_r')
        self.vr_comp_ind = self.available_features.index('v_r_comp')
        self.time_ind = self.available_features.index('time')

        if self.use_xyz:  # if x y z coordinates are used, add 3 channels and save the indexes
            num_point_features += 3  # x, y, z
            self.selected_indexes.extend((self.x_ind, self.y_ind, self.z_ind))  # adding x y z channels to the indexes

        if self.use_RCS:  # add 1 if RCS is used and save the index
            num_point_features += 1
            self.selected_indexes.append(self.rcs_ind)  # adding RCS channel to the indexes

        if self.use_vr:  # add 1 if v_r is used and save the index
            num_point_features += 1
            self.selected_indexes.append(self.vr_ind)  # adding v_r channel to the indexes

        if self.use_vr_comp:  # add 1 if compensated v_r is used and save the index
            num_point_features += 1
            self.selected_indexes.append(self.vr_comp_ind)

        if self.use_time:  # add 1 if time is used and save the index
            num_point_features += 1
            self.selected_indexes.append(self.time_ind)  # adding time channel to the indexes

        ### LOGGING USED FEATURES ###
        print("number of point features used: " + str(num_point_features))
        print("6 of these are 2 * (x y z) coordinates relative to mean and center of pillars")
        print(str(len(self.selected_indexes)) + " are selected original features: ")

        for k in self.selected_indexes:
            print(str(k) + ": " + self.available_features[k])

        self.selected_indexes = torch.LongTensor(self.selected_indexes)  # turning used indexes into Tensor

        self.num_filters = self.model_cfg.NUM_FILTERS
        assert len(self.num_filters) > 0
        num_filters = [num_point_features] + list(self.num_filters)

        pfn_layers = []
        for i in range(len(num_filters) - 1):
            in_filters = num_filters[i]
            out_filters = num_filters[i + 1]
            pfn_layers.append(
                PFNLayer(in_filters, out_filters, self.use_norm, last_layer=(i >= len(num_filters) - 2))
            )
        self.pfn_layers = nn.ModuleList(pfn_layers)

        ## saving size of the voxel
        self.voxel_x = voxel_size[0]
        self.voxel_y = voxel_size[1]
        self.voxel_z = voxel_size[2]

        ## saving offsets, start of point cloud in x, y, z + half a voxel, e.g. in y it starts around -39 m
        self.x_offset = self.voxel_x / 2 + point_cloud_range[0]
        self.y_offset = self.voxel_y / 2 + point_cloud_range[1]
        self.z_offset = self.voxel_z / 2 + point_cloud_range[2]

    def get_output_feature_dim(self):
        return self.num_filters[-1]  # number of outputs in last output channel

    def get_paddings_indicator(self, actual_num, max_num, axis=0):
        actual_num = torch.unsqueeze(actual_num, axis + 1)
        max_num_shape = [1] * len(actual_num.shape)
        max_num_shape[axis + 1] = -1
        max_num = torch.arange(max_num, dtype=torch.int, device=actual_num.device).view(max_num_shape)
        paddings_indicator = actual_num.int() > max_num
        return paddings_indicator

    def forward(self, batch_dict, **kwargs):
        ## coordinate system notes
        # x is pointing forward, y is left right, z is up down
        # spconv returns voxel_coords as [batch_idx, z_idx, y_idx, x_idx], that is why coords is indexed backwards

        voxel_features, voxel_num_points, coords = batch_dict['voxels'], batch_dict['voxel_num_points'], batch_dict[
            'voxel_coords']

        if not self.use_elevation:  # if we ignore elevation (z) and v_z
            voxel_features[:, :, self.z_ind] = 0  # set z to zero before doing anything

        orig_xyz = voxel_features[:, :, :self.z_ind + 1]  # selecting x y z

        # calculate mean of points in pillars for x y z and save the offset from the mean
        # Note: they do not take the mean directly, as each pillar is filled up with 0-s. Instead, they sum and divide by num of points
        points_mean = orig_xyz.sum(dim=1, keepdim=True) / voxel_num_points.type_as(voxel_features).view(-1, 1, 1)
        f_cluster = orig_xyz - points_mean  # offset from cluster mean

        # calculate center for each pillar and save points' offset from the center. voxel_coordinate * voxel size + offset should be the center of pillar (coords are indexed backwards)
        f_center = torch.zeros_like(orig_xyz)
        f_center[:, :, 0] = voxel_features[:, :, self.x_ind] - (
                    coords[:, 3].to(voxel_features.dtype).unsqueeze(1) * self.voxel_x + self.x_offset)
        f_center[:, :, 1] = voxel_features[:, :, self.y_ind] - (
                    coords[:, 2].to(voxel_features.dtype).unsqueeze(1) * self.voxel_y + self.y_offset)
        f_center[:, :, 2] = voxel_features[:, :, self.z_ind] - (
                    coords[:, 1].to(voxel_features.dtype).unsqueeze(1) * self.voxel_z + self.z_offset)

        voxel_features = voxel_features[:, :, self.selected_indexes]  # filtering for used features

        features = [voxel_features, f_cluster, f_center]

        if self.with_distance:  # if with_distance is true, include range to the points as well
            points_dist = torch.norm(orig_xyz, 2, 2, keepdim=True)  # first 2: L2 norm, second 2: along dim 2
            features.append(points_dist)

        ## finishing up the feature extraction with correct shape and masking
        features = torch.cat(features, dim=-1)

        voxel_count = features.shape[1]
        mask = self.get_paddings_indicator(voxel_num_points, voxel_count, axis=0)
        mask = torch.unsqueeze(mask, -1).type_as(voxel_features)
        features *= mask

        for pfn in self.pfn_layers:
            features = pfn(features)
        features = features.squeeze()
        batch_dict['pillar_features'] = features
        return batch_dict

```
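
To illustrate what the padding mask used by both VFEs does, here is a small standalone example (plain PyTorch, with the `get_paddings_indicator` logic copied out of the class above): points beyond `voxel_num_points` in each pillar are zero padding and get masked out before the PFN layers.

```python
import torch

# Standalone copy of the get_paddings_indicator logic from the VFEs above.
def get_paddings_indicator(actual_num, max_num, axis=0):
    actual_num = torch.unsqueeze(actual_num, axis + 1)
    max_num_shape = [1] * len(actual_num.shape)
    max_num_shape[axis + 1] = -1
    max_num = torch.arange(max_num, dtype=torch.int, device=actual_num.device).view(max_num_shape)
    return actual_num.int() > max_num

voxel_num_points = torch.tensor([3, 1])  # two pillars with 3 and 1 real points
mask = get_paddings_indicator(voxel_num_points, max_num=4)
print(mask)
# tensor([[ True,  True,  True, False],
#         [ True, False, False, False]])
```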

/pcdet/models/backbones_3d/vfe/__init__.py

```python
from .mean_vfe import MeanVFE
from .pillar_vfe import PillarVFE, Radar7PillarVFE
from .dynamic_mean_vfe import DynamicMeanVFE
from .dynamic_pillar_vfe import DynamicPillarVFE, DynamicPillarVFESimple2D
from .image_vfe import ImageVFE
from .vfe_template import VFETemplate

__all__ = {
    'VFETemplate': VFETemplate,
    'MeanVFE': MeanVFE,
    'PillarVFE': PillarVFE,
    'Radar7PillarVFE': Radar7PillarVFE,
    'ImageVFE': ImageVFE,
    'DynMeanVFE': DynamicMeanVFE,
    'DynPillarVFE': DynamicPillarVFE,
    'DynamicPillarVFESimple2D': DynamicPillarVFESimple2D
}
```

pcdet/datasets/kitti/kitti_dataset.py

```python
def get_lidar(self, idx):
    lidar_file = self.root_split_path / 'velodyne' / ('%s.bin' % idx)
    assert lidar_file.exists()
    number_of_channels = 7  # ['x', 'y', 'z', 'rcs', 'v_r', 'v_r_comp', 'time']
    points = np.fromfile(str(lidar_file), dtype=np.float32).reshape(-1, number_of_channels)

    # replace the list values with statistical values; for x, y, z and time,
    # use 0 and 1 as means and std to avoid normalization
    means = [0, 0, 0, 0, 0, 0, 0]  # 'x', 'y', 'z', 'rcs', 'v_r', 'v_r_comp', 'time'
    stds = [1, 1, 1, 1, 1, 1, 1]   # 'x', 'y', 'z', 'rcs', 'v_r', 'v_r_comp', 'time'

    # we then norm the channels
    points = (points - means) / stds
    return points
    # return np.fromfile(str(lidar_file), dtype=np.float32).reshape(-1, number_of_channels)
    # return np.fromfile(str(lidar_file), dtype=np.float32).reshape(-1, 4)
```
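
To rule out a data-loading mismatch, a quick check along these lines might help (the path and frame id are just examples, not from the repo); it verifies that the radar .bin files really contain 7 float32 channels before the reshape in get_lidar, since a wrong channel count would silently scramble every point.

```python
import numpy as np

# Debugging sketch: example path, adjust to your own data layout.
sample = '../data/radar_5frames/training/velodyne/00000.bin'
raw = np.fromfile(sample, dtype=np.float32)
assert raw.size % 7 == 0, "file size is not a multiple of 7 floats; wrong channel count?"
points = raw.reshape(-1, 7)  # ['x', 'y', 'z', 'rcs', 'v_r', 'v_r_comp', 'time']
print(points.shape)
print(points.min(axis=0), points.max(axis=0))  # per-channel ranges should look plausible
```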

Can you help me figure out what the reason is?
