Hi! The answer is yes.

First, note that opencood/data_utils/datasets/intermediate_heter_fusion_dataset.py is not designed for this case. It loads either camera or LiDAR data for each agent, and all of the opencood/models/heter_model_*.py code follows this design rationale as well. This data loader can therefore handle LiDAR-only, camera-only, and heterogeneous settings, but not LiDAR+camera for each agent.
If you want to load both camera data and LiDAR data for all agents, the straightforward solution is to use opencood/data_utils/datasets/intermediate_fusion_dataset.py. This dataloader was originally designed for the LiDAR-only and camera-only settings, but it can also accept both inputs. You can copy and run yaml files from my CoAlign repo to see how these yamls work.

To accept both LiDAR and camera input, set input_source: ['lidar', 'camera'] (or input_source: ['lidar', 'camera', 'depth'] if you also need depth maps, as in the example yaml below) in the yaml. The heter key is no longer needed because all agents are the same. Then, at the model entry, you can access both the LiDAR data and the camera data for each agent; see the code snippet below. Just note that you need to write a new model, something like a combination of this and this.
```python
# get all LiDAR and camera data for each agent
def forward(self, data_dict):
    # get camera input
    image_inputs_dict = data_dict['image_inputs']
    record_len = data_dict['record_len']
    x, rots, trans, intrins, post_rots, post_trans = \
        image_inputs_dict['imgs'], image_inputs_dict['rots'], image_inputs_dict['trans'], \
        image_inputs_dict['intrins'], image_inputs_dict['post_rots'], image_inputs_dict['post_trans']
    # project the image features into the BEV space
    x, depth_items = self.get_voxels(x, rots, trans, intrins, post_rots, post_trans)

    # get lidar input
    voxel_features = data_dict['processed_lidar']['voxel_features']
    voxel_coords = data_dict['processed_lidar']['voxel_coords']
    voxel_num_points = data_dict['processed_lidar']['voxel_num_points']

    # process these data
    ...
```
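To make "a combination of this and this" more concrete, below is a minimal, hypothetical sketch of fusing the two per-agent BEV feature maps. Everything here (LidarCameraFusionSketch, fuse_conv, the dummy tensors) is illustrative and not code from the repo; in a real model, the camera and LiDAR encoders would produce cam_bev and lidar_bev from data_dict['image_inputs'] and data_dict['processed_lidar'] as in the snippet above.

```python
import torch
import torch.nn as nn

class LidarCameraFusionSketch(nn.Module):
    """Hypothetical per-agent LiDAR+camera fusion, for illustration only."""

    def __init__(self, cam_channels=64, lidar_channels=64, out_channels=128):
        super().__init__()
        # fuse the channel-wise concatenation of the two BEV maps
        self.fuse_conv = nn.Sequential(
            nn.Conv2d(cam_channels + lidar_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, cam_bev, lidar_bev, record_len):
        # cam_bev / lidar_bev: (sum(record_len), C, H, W) BEV maps from the
        # camera and LiDAR branches; they must share the same spatial size
        fused = self.fuse_conv(torch.cat([cam_bev, lidar_bev], dim=1))
        # split per scene so a cross-agent fusion module can operate on the
        # agents belonging to the same sample
        return torch.split(fused, record_len.tolist(), dim=0)

# toy usage: two scenes with 2 and 3 agents
cam_bev = torch.randn(5, 64, 128, 128)
lidar_bev = torch.randn(5, 64, 128, 128)
scenes = LidarCameraFusionSketch()(cam_bev, lidar_bev, torch.tensor([2, 3]))
print([s.shape for s in scenes])  # [(2, 128, 128, 128), (3, 128, 128, 128)]
```

The single-agent fusion here is just concatenation plus one conv; any intermediate fusion module (attention, max fusion, etc.) would then consume the per-scene splits for the cross-agent part.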
Here is an example yaml for this setting:

```yaml
name: opv2v_lidar_and_camera
root_dir: "dataset/OPV2V/train"
validate_dir: "dataset/OPV2V/validate"
test_dir: "dataset/OPV2V/test"

yaml_parser: "load_general_params"
train_params:
  batch_size: &batch_size 1
  epoches: 20
  eval_freq: 2
  save_freq: 2
  max_cav: 5

comm_range: 70
input_source: ['lidar', 'camera', 'depth']
label_type: 'lidar'
cav_lidar_range: &cav_lidar [-51.2, -51.2, -3, 51.2, 51.2, 1]

add_data_extension: ['bev_visibility.png']

fusion:
  core_method: 'intermediate'
  dataset: 'opv2v'
  args:
    proj_first: false
    grid_conf: &grid_conf
      xbound: [-51.2, 51.2, 0.4]   # limit the range of the x direction and divide it into grids
      ybound: [-51.2, 51.2, 0.4]   # limit the range of the y direction and divide it into grids
      zbound: [-10, 10, 20.0]      # limit the range of the z direction and divide it into grids
      ddiscr: [2, 50, 48]
      mode: 'LID'
    data_aug_conf: &data_aug_conf
      resize_lim: [0.65, 0.7]
      final_dim: [384, 512]
      rot_lim: [-3.6, 3.6]
      H: 600
      W: 800
      rand_flip: False
      bot_pct_lim: [0.0, 0.05]
      cams: ['camera0', 'camera1', 'camera2', 'camera3']
      Ncams: 4

data_augment: # not used in intermediate fusion
  - NAME: random_world_flip
    ALONG_AXIS_LIST: [ 'x' ]
  - NAME: random_world_rotation
    WORLD_ROT_ANGLE: [ -0.78539816, 0.78539816 ]
  - NAME: random_world_scaling
    WORLD_SCALE_RANGE: [ 0.95, 1.05 ]

# preprocess-related
preprocess:
  # options: BasePreprocessor, VoxelPreprocessor, BevPreprocessor
  core_method: 'SpVoxelPreprocessor'
  args:
    voxel_size: &voxel_size [0.4, 0.4, 4]
    max_points_per_voxel: 32
    max_voxel_train: 32000
    max_voxel_test: 70000
  # detection range for each individual cav
  cav_lidar_range: &cav_lidar [-51.2, -51.2, -3, 51.2, 51.2, 1]

# anchor box related
postprocess:
  core_method: 'VoxelPostprocessor' # VoxelPostprocessor, BevPostprocessor supported
  gt_range: *cav_lidar
  anchor_args:
    cav_lidar_range: *cav_lidar
    l: 3.9
    w: 1.6
    h: 1.56
    r: &anchor_yaw [0, 90]
    feature_stride: 2
    num: &anchor_num 2
  target_args:
    pos_threshold: 0.6
    neg_threshold: 0.45
    score_threshold: 0.2
  order: 'hwl' # hwl or lwh
  max_num: 150 # maximum number of objects in a single frame; use this number to make sure different frames have the same dimension in the same batch
  nms_thresh: 0.15
  dir_args: &dir_args
    dir_offset: 0.7853
    num_bins: 2
    anchor_yaw: *anchor_yaw

# model related
model:
  core_method: YOUR_LIDAR_CAMERA_MODEL
  ...

loss:
  core_method: point_pillar_depth_loss
  args:
    pos_cls_weight: 2.0
    cls:
      type: 'SigmoidFocalLoss'
      alpha: 0.25
      gamma: 2.0
      weight: 1.0
    reg:
      type: 'WeightedSmoothL1Loss'
      sigma: 3.0
      codewise: true
      weight: 2.0
    dir:
      type: 'WeightedSoftmaxClassificationLoss'
      weight: 0.2
      args: *dir_args
    depth:
      weight: 1.0

optimizer:
  core_method: Adam
  lr: 0.002
  args:
    eps: 1e-10
    weight_decay: 1e-4

lr_scheduler:
  core_method: multistep # step, multistep, and Exponential supported
  gamma: 0.1
  step_size: [10, 25]
```
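As an aside on the yaml above, ddiscr: [2, 50, 48] with mode: 'LID' configures the camera branch's depth discretization: 48 depth bins between 2 m and 50 m whose widths grow linearly with distance (the LID scheme from CaDDN). A quick sketch of the resulting bin edges, assuming the standard LID formula (this helper is illustrative, not code from the repo):

```python
def lid_bin_edges(d_min=2.0, d_max=50.0, num_bins=48):
    # LID (linear-increasing discretization): edge spacing grows linearly,
    # so nearby depths get finer bins than far-away depths
    span = d_max - d_min
    return [
        d_min + span * i * (i + 1) / (num_bins * (num_bins + 1))
        for i in range(num_bins + 1)
    ]

edges = lid_bin_edges()
print(edges[0], round(edges[1], 3), edges[-1])  # 2.0 2.041 50.0
```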
Thank you for your detailed explanation!
Can the m1~m4 modalities be modified to m1&m2, like this?

```yaml
heter:
  assignment_path: opencood/logs/heter_modality_assign/opv2v_4modality.json
  ego_modality: m1&m2
  mapping_dict:
    m1: m1
    m2: m1
    m3: m2
    m4: m2
```