yifanlu0227 / HEAL

[ICLR2024] HEAL: An Extensible Framework for Open Heterogeneous Collaborative Perception ➡️ All You Need for Multi-Modality Collaborative Perception!

Does this framework allow the agent to provide both point cloud and image data simultaneously for training? If so, how should the YAML file be modified? #9

Closed · nxdxml closed this issue 6 months ago

nxdxml commented 6 months ago

Can the m1~m4 modalities be modified to something like m1&m2? For example:

heter:
  assignment_path: opencood/logs/heter_modality_assign/opv2v_4modality.json
  ego_modality: m1&m2
  mapping_dict:
    m1: m1
    m2: m1
    m3: m2
    m4: m2

yifanlu0227 commented 6 months ago

Hi! The answer is yes.

First, note that opencood/data_utils/datasets/intermediate_heter_fusion_dataset.py is not designed for this case: for each agent it loads either camera data or LiDAR data, never both. All the opencood/models/heter_model_*.py code follows this design rationale as well. This data loader can therefore handle LiDAR-only, camera-only, and heterogeneous settings, but not LiDAR+camera for every agent.
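
For intuition, here is a tiny hypothetical sketch of that either/or behaviour (not the actual dataset code; the modality-to-sensor mapping below is illustrative): each agent is assigned one modality, and only that sensor's data is loaded.

    # Hypothetical illustration of the either/or loading in the heterogeneous
    # setting; this is NOT the real dataset code and the mapping is made up.
    modality_to_sensor = {"m1": "lidar", "m2": "camera", "m3": "lidar", "m4": "camera"}

    def load_one_agent(raw_record, modality):
        """Each agent contributes exactly one sensor stream, never both."""
        if modality_to_sensor[modality] == "lidar":
            return {"processed_lidar": raw_record["lidar"]}
        return {"image_inputs": raw_record["camera"]}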

If you want to load camera data and LiDAR data together for all agents, the straightforward solution is to use opencood/data_utils/datasets/intermediate_fusion_dataset.py. This dataloader was originally designed for LiDAR-only and camera-only settings, but it can also accept both inputs. You can copy and run the yaml files from my CoAlign repo to see how these yamls work.

To accept both LiDAR and camera input, you can set input_source: ['lidar', 'camera'] (or ['lidar', 'camera', 'depth'], as in the example below) in the yaml. The heter key is no longer needed because all agents are the same. Then, at the model entry, you can access both the LiDAR data and the camera data for each agent; see the code snippet below.

Just note that you need to write a new model, something like a combination of this and this.

    # get all LiDAR and camera data for each agent
    def forward(self, data_dict):

        # get camera input
        image_inputs_dict = data_dict['image_inputs']
        record_len = data_dict['record_len']
        x, rots, trans, intrins, post_rots, post_trans = \
            image_inputs_dict['imgs'], image_inputs_dict['rots'], image_inputs_dict['trans'], image_inputs_dict['intrins'], image_inputs_dict['post_rots'], image_inputs_dict['post_trans']
        x, depth_items = self.get_voxels(x, rots, trans, intrins, post_rots, post_trans)  # lift image features to the BEV plane

        # get lidar input 
        voxel_features = data_dict['processed_lidar']['voxel_features']
        voxel_coords = data_dict['processed_lidar']['voxel_coords']
        voxel_num_points = data_dict['processed_lidar']['voxel_num_points']

        # process these data 
        ...
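
As a concrete illustration of combining the two branches into one model, here is a minimal, self-contained PyTorch sketch (not code from this repo; the class name LidarCameraFusion and the channel sizes are assumptions): once the camera branch has produced a BEV feature map and the LiDAR voxels have been encoded and scattered onto a BEV grid of the same resolution, the two maps can be fused per agent, e.g. by channel concatenation followed by a small conv block.

    import torch
    import torch.nn as nn

    class LidarCameraFusion(nn.Module):
        """Hypothetical per-agent LiDAR+camera BEV fusion (illustrative only)."""
        def __init__(self, lidar_ch=64, camera_ch=64, out_ch=128):
            super().__init__()
            # concatenate the two BEV maps along channels, then mix them
            self.mix = nn.Sequential(
                nn.Conv2d(lidar_ch + camera_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )

        def forward(self, lidar_bev, camera_bev):
            # both inputs must share the same BEV resolution, e.g. (N, C, 256, 256)
            return self.mix(torch.cat([lidar_bev, camera_bev], dim=1))

The fused map would then go through the usual backbone, collaborative fusion, and detection heads, just like a single-modality intermediate-fusion model.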

An incomplete yaml example:

name: opv2v_lidar_and_camera
root_dir: "dataset/OPV2V/train"
validate_dir: "dataset/OPV2V/validate"
test_dir: "dataset/OPV2V/test"

yaml_parser: "load_general_params"
train_params:
  batch_size: &batch_size 1
  epoches: 20
  eval_freq: 2
  save_freq: 2
  max_cav: 5

comm_range: 70
input_source: ['lidar', 'camera', 'depth']
label_type: 'lidar'
cav_lidar_range: &cav_lidar [-51.2, -51.2, -3, 51.2, 51.2, 1]

add_data_extension: ['bev_visibility.png']

fusion:
  core_method: 'intermediate'
  dataset: 'opv2v'
  args: 
    proj_first: false
    grid_conf: &grid_conf
      xbound: [-51.2, 51.2, 0.4]   # Limit the range of the x direction and divide the grids
      ybound: [-51.2, 51.2, 0.4]   # Limit the range of the y direction and divide the grids
      zbound: [-10, 10, 20.0]   # Limit the range of the z direction and divide the grids
      ddiscr: [2, 50, 48]
      mode: 'LID'
    data_aug_conf: &data_aug_conf
      resize_lim: [0.65, 0.7]
      final_dim: [384, 512]
      rot_lim: [-3.6, 3.6]
      H: 600
      W: 800
      rand_flip: False
      bot_pct_lim: [0.0, 0.05]
      cams: ['camera0', 'camera1', 'camera2', 'camera3']
      Ncams: 4

data_augment: # not used in intermediate fusion
  - NAME: random_world_flip
    ALONG_AXIS_LIST: [ 'x' ]

  - NAME: random_world_rotation
    WORLD_ROT_ANGLE: [ -0.78539816, 0.78539816 ]

  - NAME: random_world_scaling
    WORLD_SCALE_RANGE: [ 0.95, 1.05 ]

# preprocess-related
preprocess:
  # options: BasePreprocessor, VoxelPreprocessor, BevPreprocessor
  core_method: 'SpVoxelPreprocessor'
  args:
    voxel_size: &voxel_size [0.4, 0.4, 4]
    max_points_per_voxel: 32
    max_voxel_train: 32000
    max_voxel_test: 70000
  # detection range for each individual cav.
  cav_lidar_range: &cav_lidar [-51.2, -51.2, -3, 51.2, 51.2, 1]

# anchor box related
postprocess:
  core_method: 'VoxelPostprocessor' # VoxelPostprocessor, BevPostprocessor supported
  gt_range: *cav_lidar
  anchor_args:
    cav_lidar_range: *cav_lidar
    l: 3.9
    w: 1.6
    h: 1.56
    r: &anchor_yaw [0, 90]
    feature_stride: 2
    num: &anchor_num 2
  target_args:
    pos_threshold: 0.6
    neg_threshold: 0.45
    score_threshold: 0.2
  order: 'hwl' # hwl or lwh
  max_num: 150 # maximum number of objects in a single frame. use this number to make sure different frames have the same dimension in the same batch
  nms_thresh: 0.15
  dir_args: &dir_args
    dir_offset: 0.7853
    num_bins: 2
    anchor_yaw: *anchor_yaw

# model related
model:
  core_method: YOUR_LIDAR_CAMERA_MODEL
  ...

loss:
  core_method: point_pillar_depth_loss
  args:
    pos_cls_weight: 2.0
    cls:
      type: 'SigmoidFocalLoss'
      alpha: 0.25
      gamma: 2.0
      weight: 1.0
    reg:
      type: 'WeightedSmoothL1Loss'
      sigma: 3.0
      codewise: true
      weight: 2.0
    dir:
      type: 'WeightedSoftmaxClassificationLoss'
      weight: 0.2
      args: *dir_args
    depth:
      weight: 1.0

optimizer:
  core_method: Adam
  lr: 0.002
  args:
    eps: 1e-10
    weight_decay: 1e-4

lr_scheduler:
  core_method: multistep # step, multistep and Exponential are supported
  gamma: 0.1
  step_size: [10, 25]
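
A quick sanity check on the numbers above (plain arithmetic, nothing repo-specific): the LiDAR range and voxel size give a 256×256 BEV pillar grid, which matches the 0.4 m camera grid in grid_conf, and feature_stride: 2 halves that for the anchor/feature map.

    # BEV grid implied by the yaml above
    cav_lidar_range = [-51.2, -51.2, -3, 51.2, 51.2, 1]
    voxel_size = [0.4, 0.4, 4]
    feature_stride = 2

    grid_w = round((cav_lidar_range[3] - cav_lidar_range[0]) / voxel_size[0])
    grid_h = round((cav_lidar_range[4] - cav_lidar_range[1]) / voxel_size[1])
    print(grid_w, grid_h)                                       # 256 256
    print(grid_w // feature_stride, grid_h // feature_stride)   # 128 128

Having the camera and LiDAR branches land on the same 256×256 grid is what makes a simple channel-wise fusion like the sketch above possible.
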
nxdxml commented 6 months ago

Thank you for your detailed explanation!