open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0

FCOS3D train on kitti dataset #865

Closed xiaofengWang-CCNU closed 3 years ago

xiaofengWang-CCNU commented 3 years ago

Sorry to bother you. To train FCOS3D on the KITTI dataset, I did the following steps:

  1. Wrote 'fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_kitti-mono3d.py' based on 'fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d.py'.

  2. Wrote a 'kitti-mono3d.py' in 'configs/_base_/datasets' based on 'nus-mono3d.py'.

  3. Ran python tools/train.py configs/fcos3d/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_kitti-mono3d.py --work-dir ./ckpt --gpu-ids 6

  4. Prepared the data with create_data.py.

But I get an error:

Traceback (most recent call last):
  File "tools/train.py", line 223, in <module>
    main()
  File "tools/train.py", line 219, in main
    meta=meta)
  File "/mmdetection3d/mmdet3d/apis/train.py", line 34, in train_model
    meta=meta)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 291, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 764, in __init__
    self._try_put_index()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 994, in _try_put_index
    index = self._next_index()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 357, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 208, in __iter__
    for idx in self.sampler:
  File "/opt/conda/lib/python3.7/site-packages/mmdet/datasets/samplers/group_sampler.py", line 36, in __iter__
    indices = np.concatenate(indices)
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: need at least one array to concatenate

I cannot find what caused this error. Is anyone else working on this? Please help me, thank you.

Tai-Wang commented 3 years ago

Please show your config. Besides, if you are not in a big hurry, please stay tuned for our released KITTI model. It is expected to be done by the end of September.

xiaofengWang-CCNU commented 3 years ago

The configs:

1. fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_kitti-mono3d.py:

_base_ = [
    '../_base_/datasets/kitti-mono3d.py', '../_base_/models/fcos3d.py',
    '../_base_/schedules/mmdet_schedule_1x.py', '../_base_/default_runtime.py'
]
# model settings
model = dict(
    backbone=dict(
        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
        stage_with_dcn=(False, False, True, True)))

class_names = ['Pedestrian', 'Cyclist', 'Car']

img_norm_cfg = dict(
    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
train_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='LoadAnnotations3D',
        with_bbox=True,
        with_label=True,
        with_attr_label=True,
        with_bbox_3d=True,
        with_label_3d=True,
        with_bbox_depth=True),
    dict(type='Resize', img_scale=(1600, 900), keep_ratio=True),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(
        type='Collect3D',
        keys=[
            'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d',
            'gt_labels_3d', 'centers2d', 'depths'
        ]),
]
test_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='MultiScaleFlipAug',
        scale_factor=1.0,
        flip=False,
        transforms=[
            dict(type='RandomFlip3D'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))
# optimizer
optimizer = dict(
    lr=0.002, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.))
optimizer_config = dict(
    _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
total_epochs = 12
evaluation = dict(interval=2)

2. kitti-mono3d.py:

dataset_type = 'NuScenesMonoDataset'
# dataset_type = 'KittiMonoDataset'
data_root = 'data/kitti/'

class_names = ['Pedestrian', 'Cyclist', 'Car']

# Input modality for kitti dataset, this is consistent with the submission
# format which requires the information in input_modality.
input_modality = dict(
    use_lidar=False,
    use_camera=True,
    use_radar=False,
    use_map=False,
    use_external=False)
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='LoadAnnotations3D',
        with_bbox=True,
        with_label=True,
        with_attr_label=True,
        with_bbox_3d=True,
        with_label_3d=True,
        with_bbox_depth=True),
    dict(type='Resize', img_scale=(1600, 900), keep_ratio=True),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(
        type='Collect3D',
        keys=[
            'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d',
            'gt_labels_3d', 'centers2d', 'depths'
        ]),
]
test_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='MultiScaleFlipAug',
        scale_factor=1.0,
        flip=False,
        transforms=[
            dict(type='RandomFlip3D'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['img']),
        ])
]
# construct a pipeline for data and gt loading in show function
# please keep its loading function consistent with test_pipeline (e.g. client)
eval_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='DefaultFormatBundle3D',
        class_names=class_names,
        with_label=False),
    dict(type='Collect3D', keys=['img'])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_train_mono3d.coco.json',
        img_prefix=data_root,
        classes=class_names,
        pipeline=train_pipeline,
        modality=input_modality,
        test_mode=False,
        box_type_3d='Camera'),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_val_mono3d.coco.json',
        img_prefix=data_root,
        classes=class_names,
        pipeline=test_pipeline,
        modality=input_modality,
        test_mode=True,
        box_type_3d='Camera'),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_val_mono3d.coco.json',
        img_prefix=data_root,
        classes=class_names,
        pipeline=test_pipeline,
        modality=input_modality,
        test_mode=True,
        box_type_3d='Camera'))
evaluation = dict(interval=2)

That is the whole config file.

And if I set dataset_type = 'KittiMonoDataset', there is another error: KittiMonoDataset: __init__() missing 1 required positional argument: 'info_file'. But I cannot find which info_file to use.

Tai-Wang commented 3 years ago

Please use KittiMonoDataset and set info_file the same as LiDAR-based methods (use the .pkl files). You also need to adjust those dataset-specific parameters such as with_attr_label and img_scale, etc.
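For reference, a minimal sketch of the train entry (the file names follow the standard create_data.py outputs, the same .pkl as the LiDAR-based KITTI configs):

data_root = 'data/kitti/'
data = dict(
    train=dict(
        type='KittiMonoDataset',
        ann_file=data_root + 'kitti_infos_train_mono3d.coco.json',
        # reuse the same info .pkl as the LiDAR-based KITTI configs
        info_file=data_root + 'kitti_infos_train.pkl'))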

xiaofengWang-CCNU commented 3 years ago

Thank you very much for your answer. I modified it as you suggested, and the following problem happened:

Traceback (most recent call last):
  File "tools/train.py", line 223, in <module>
    main()
  File "tools/train.py", line 219, in main
    meta=meta)
  File "/mmdetection3d/mmdet3d/apis/train.py", line 34, in train_model
    meta=meta)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.7/site-packages/mmdet/datasets/custom.py", line 194, in __getitem__
    data = self.prepare_train_img(idx)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/datasets/custom.py", line 217, in prepare_train_img
    return self.pipeline(results)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/datasets/pipelines/compose.py", line 40, in __call__
    data = t(data)
  File "/mmdetection3d/mmdet3d/datasets/pipelines/formating.py", line 164, in __call__
    data[key] = results[key]
KeyError: 'attr_labels'

It seems the keys=['img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d', 'gt_labels_3d', 'centers2d', 'depths'] in train_pipeline should be modified, but where do the keys come from?

Tai-Wang commented 3 years ago

The keys are recorded during the data preprocessing steps of the overall training pipeline. Similar to removing with_attr_label, you need to remove attr_labels from keys.
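For KITTI, the Collect3D entry would look something like this (a sketch; the full working config appears later in this thread):

dict(
    type='Collect3D',
    keys=[
        'img', 'gt_bboxes', 'gt_labels', 'gt_bboxes_3d',
        'gt_labels_3d', 'centers2d', 'depths'  # 'attr_labels' removed
    ])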

xiaofengWang-CCNU commented 3 years ago

Thank you for your answer. I have removed attr_labels. It seems that I have set a wrong data size; I have tried every possible size, but it still raises the following error:

Traceback (most recent call last):
  File "tools/train.py", line 223, in <module>
    main()
  File "tools/train.py", line 219, in main
    meta=meta)
  File "/mmdetection3d/mmdet3d/apis/train.py", line 34, in train_model
    meta=meta)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 237, in train_step
    losses = self(**data)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 97, in new_func
    return old_func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 171, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/mmdetection3d/mmdet3d/models/detectors/single_stage_mono3d.py", line 67, in forward_train
    attr_labels, gt_bboxes_ignore)
  File "/mmdetection3d/mmdet3d/models/dense_heads/base_mono3d_dense_head.py", line 71, in forward_train
    losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 184, in new_func
    return old_func(*args, **kwargs)
  File "/mmdetection3d/mmdet3d/models/dense_heads/fcos_mono3d_head.py", line 309, in loss
    gt_labels_3d, centers2d, depths, attr_labels)
  File "/mmdetection3d/mmdet3d/models/dense_heads/fcos_mono3d_head.py", line 801, in get_targets
    num_points_per_lvl=num_points)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/core/utils/misc.py", line 29, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "/mmdetection3d/mmdet3d/models/dense_heads/fcos_mono3d_head.py", line 876, in _get_target_single
    self.bbox_code_size)
RuntimeError: The expanded size of the tensor (9) must match the existing size (7) at non-singleton dimension 2.  Target sizes: [9978, 4, 9].  Tensor sizes: [1, 4, 7]

The expand operation gets a wrong parameter. I am very confused about it, please help me, thank you.

xiaofengWang-CCNU commented 3 years ago

I have set self.bbox_code_size = 7, but what should img_scale be set to?

Tai-Wang commented 3 years ago

Should be (1242, 375) for KITTI images.
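That is, in the Resize step of both pipelines:

dict(type='Resize', img_scale=(1242, 375), keep_ratio=True)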

xiaofengWang-CCNU commented 3 years ago

Thank you very much for your help. I have set img_scale=(1242, 375), and an unexpected error happened:

Traceback (most recent call last):
  File "tools/train.py", line 223, in <module>
    main()
  File "tools/train.py", line 219, in main
    meta=meta)
  File "/mmdetection3d/mmdet3d/apis/train.py", line 34, in train_model
    meta=meta)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 237, in train_step
    losses = self(**data)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 97, in new_func
    return old_func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 171, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/mmdetection3d/mmdet3d/models/detectors/single_stage_mono3d.py", line 67, in forward_train
    attr_labels, gt_bboxes_ignore)
  File "/mmdetection3d/mmdet3d/models/dense_heads/base_mono3d_dense_head.py", line 71, in forward_train
    losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 184, in new_func
    return old_func(*args, **kwargs)
  File "/mmdetection3d/mmdet3d/models/dense_heads/fcos_mono3d_head.py", line 411, in loss
    avg_factor=equal_weights.sum())
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/models/losses/smooth_l1_loss.py", line 97, in forward
    **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/parrots_jit.py", line 21, in wrapper_inner
    return func(*args, **kargs)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/models/losses/utils.py", line 96, in wrapper
    loss = loss_func(pred, target, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/models/losses/smooth_l1_loss.py", line 25, in smooth_l1_loss
    assert pred.size() == target.size() and target.numel() > 0
AssertionError

The pred.size() and target.size() values are:

torch.Size([63, 2]) torch.Size([63, 2])
torch.Size([63]) torch.Size([63])
torch.Size([63, 3]) torch.Size([63, 3])
torch.Size([63]) torch.Size([63])
torch.Size([63, 2]) torch.Size([63, 0])

I do not know what caused this error. Are there any other KITTI-specific parameters that should be adjusted?

To solve this error, I just set pred_velo=False and pred_attrs=False; I am not sure if that is right.
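If I read the shapes right, the mismatched pair torch.Size([63, 2]) vs torch.Size([63, 0]) is the 2-dim velocity branch, and KITTI has no velocity annotations, so its targets come back empty. The head options I changed (they also appear in the full fcos3d.py below):

model = dict(
    bbox_head=dict(
        pred_velo=False,    # KITTI boxes carry no velocity -> empty (N, 0) targets
        pred_attrs=False))  # KITTI has no attribute labels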

The class_names for KITTI are as follows; is this right?

class_names = [
     'Car', 'Van', 'Truck', 'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram', 'Misc'
 ]

With the setting above, there is a key error at eval time, so I modified class_to_name and class_to_range as follows:

    class_to_name = {
        0: 'Car',
        1: 'Pedestrian',
        2: 'Cyclist',
        3: 'Van',
        4: 'Person_sitting',
        5: 'Truck',
        6: 'Misc',
        7: 'Tram',
    }
    class_to_range = {
        0: [0.5, 0.95, 10],
        1: [0.25, 0.7, 10],
        2: [0.25, 0.7, 10],
        3: [0.5, 0.95, 10],
        4: [0.25, 0.7, 10],
        5: [0.25, 0.7, 10],
        6: [0.5, 0.95, 10],
        7: [0.25, 0.7, 10],
    }
I wonder if this is right.

Tai-Wang commented 3 years ago

The class_names should be ['Car', 'Pedestrian', 'Cyclist'] because the mainstream 3D detection setting only supports the evaluation of these classes (with enough samples).
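That is, restrict the config to:

class_names = ['Car', 'Pedestrian', 'Cyclist']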

xiaofengWang-CCNU commented 3 years ago

Thank you very much for your help. I have trained FCOS3D on the KITTI dataset; the configs are as follows:

fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_kitti-mono3d.py

_base_ = [
    '../_base_/datasets/kitti-mono3d.py', '../_base_/models/fcos3d.py',
    '../_base_/schedules/mmdet_schedule_1x.py', '../_base_/default_runtime.py'
]
# model settings
model = dict(
    backbone=dict(
        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
        stage_with_dcn=(False, False, True, True)))

class_names = [
    'Pedestrian', 'Cyclist', 'Car'
]

img_norm_cfg = dict(
    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
train_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='LoadAnnotations3D',
        with_bbox=True,
        with_label=True,
        #with_attr_label=False,
        with_bbox_3d=True,
        with_label_3d=True,
        with_bbox_depth=True),
    dict(type='Resize', img_scale=(1242,375), keep_ratio=True),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(
        type='Collect3D',
        keys=[
            'img', 'gt_bboxes', 'gt_labels', 'gt_bboxes_3d',
            'gt_labels_3d', 'centers2d', 'depths'
        ]),
]
test_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='MultiScaleFlipAug',
        scale_factor=1.0,
        flip=False,
        transforms=[
            dict(type='RandomFlip3D'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))
# optimizer
optimizer = dict(
    lr=0.002, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.))
optimizer_config = dict(
    _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
total_epochs = 24
evaluation = dict(interval=2)

kitti-mono3d.py

dataset_type = 'KittiMonoDataset'
data_root = 'data/kitti/'

class_names = [
    'Pedestrian', 'Cyclist', 'Car'
]

# Input modality for kitti dataset, this is consistent with the submission
# format which requires the information in input_modality.
input_modality = dict(
    use_lidar=False,
    use_camera=True,
    use_radar=False,
    use_map=False,
    use_external=False)
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='LoadAnnotations3D',
        with_bbox=True,
        with_label=True,
        #with_attr_label=False,
        with_bbox_3d=True,
        with_label_3d=True,
        with_bbox_depth=True),
    dict(type='Resize', img_scale=(1242,375), keep_ratio=True),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(
        type='Collect3D',
        keys=[
            'img', 'gt_bboxes', 'gt_labels', 'gt_bboxes_3d',
            'gt_labels_3d', 'centers2d', 'depths'
        ]),
]
test_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='MultiScaleFlipAug',
        scale_factor=1.0,
        flip=False,
        transforms=[
            dict(type='RandomFlip3D'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['img']),
        ])
]
# construct a pipeline for data and gt loading in show function
# please keep its loading function consistent with test_pipeline (e.g. client)
eval_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='DefaultFormatBundle3D',
        class_names=class_names,
        with_label=False),
    dict(type='Collect3D', keys=['img'])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_train_mono3d.coco.json',
        info_file=data_root + 'kitti_infos_train.pkl',
        img_prefix=data_root,
        classes=class_names,
        pipeline=train_pipeline,
        modality=input_modality,
        test_mode=False,
        box_type_3d='Camera'),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_val_mono3d.coco.json',
        info_file=data_root + 'kitti_infos_val.pkl',
        img_prefix=data_root,
        classes=class_names,
        pipeline=test_pipeline,
        modality=input_modality,
        test_mode=True,
        box_type_3d='Camera'),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'kitti_infos_val_mono3d.coco.json',
        info_file=data_root + 'kitti_infos_val.pkl',
        img_prefix=data_root,
        classes=class_names,
        pipeline=test_pipeline,
        modality=input_modality,
        test_mode=True,
        box_type_3d='Camera'))
evaluation = dict(interval=2)

fcos3d.py

model = dict(
    type='FCOSMono3D',
    pretrained='open-mmlab://detectron2/resnet101_caffe',
    backbone=dict(
        type='ResNet',
        depth=101,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='caffe'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_output',
        num_outs=5,
        relu_before_extra_convs=True),
    bbox_head=dict(
        type='FCOSMono3DHead',
        num_classes=3,
        in_channels=256,
        stacked_convs=2,
        feat_channels=256,
        use_direction_classifier=True,
        diff_rad_by_sin=True,
        pred_attrs=False,
        pred_velo=False,
        dir_offset=0.7854,  # pi/4
        strides=[8, 16, 32, 64, 128],
        group_reg_dims=(2, 1, 3, 1, 2),  # offset, depth, size, rot, velo
        cls_branch=(256, ),
        reg_branch=(
            (256, ),  # offset
            (256, ),  # depth
            (256, ),  # size
            (256, ),  # rot
            ()  # velo
        ),
        dir_branch=(256, ),
        attr_branch=(256, ),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0),
        loss_dir=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
        loss_attr=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
        loss_centerness=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        norm_on_bbox=True,
        centerness_on_reg=True,
        center_sampling=True,
        conv_bias=True,
        dcn_on_last_conv=True),
    train_cfg=dict(
        allowed_border=0,
        code_weight=[1.0, 1.0, 0.2, 1.0, 1.0, 1.0, 1.0, 0.05, 0.05],
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        use_rotate_nms=True,
        nms_across_levels=False,
        nms_pre=1000,
        nms_thr=0.8,
        score_thr=0.05,
        min_bbox_size=0,
        max_per_img=200))

Also, we need to set bbox_code_size=7 in anchor_free_mono3d_head.py.
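(It may also be possible to set this from the config instead of patching the source, since bbox_code_size looks like a head option; I have not verified this:)

# untested sketch: 7 box dims for KITTI camera boxes (no velocity)
model = dict(
    bbox_head=dict(bbox_code_size=7))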

The results are as follows (24 epochs): [screenshot: mono3d_result]

I have run mono_det_demo.py on the nuScenes dataset; the result is as follows: [image: n015-2018-07-24-11-22-45+0800__CAM_BACK__1532402927637525_pred1]

If you are working on this too, please let me know; let's make this work perfectly.

likegogogo commented 3 years ago

@xiaofengWang-CCNU Have you trained FCOS3D on the Waymo dataset? The Waymo dataset can be converted to KITTI format.

Tai-Wang commented 3 years ago

Hi all, thanks for your interest!

We now have an updated version of FCOS3D (FCOS3D++, i.e., PGD) supported on KITTI via #964 and #1014. You can refer to that config and implementation for more insights. Some hyperparameters of the baseline (FCOS3D) are basically fine-tuned, but I believe there is still room for better performance. Hope you can make further progress!

Tai-Wang commented 3 years ago

We are working on a more extensive study based on FCOS3D and PGD on different datasets, so I will close this issue temporarily. We will update related information on the homepage if there is any progress. Please stay tuned.

BJLZ123 commented 2 years ago

@xiaofengWang-CCNU Could you leave your email to me? I am also using FCOS3D on KITTI and hope to learn from you. My e-mail is 1778586311@qq.com. Thank you~

YinengXiong commented 2 years ago

@xiaofengWang-CCNU Could you leave your email to me? I am also using FCOS3D on KITTI, but I can't get a result similar to yours with your config file. My email is 853560060@qq.com. Hope to learn from you, thanks a lot!

abhi1kumar commented 2 years ago

If you are working on this too, please let me know; let's make this work perfectly.

Your config does not reproduce an AP2D close to 70. We had to train it with batch size = 12 on a single GPU to get the Car AP2D (moderate, IoU 0.7) close to 70%:

data = dict(
    samples_per_gpu=12,
    workers_per_gpu=12
)

abhi1kumar commented 2 years ago

We are working on a more extensive study based on FCOS3D and PGD on different datasets, so I will close this issue temporarily. We will update related information on the homepage if there is any progress. Please stay tuned.

Hi @Tai-Wang, thank you for releasing your nuScenes configs of FCOS3D. Table 1 of your PGD paper also reports FCOS3D results on the KITTI dataset with the AP11 metric. Would it be possible for you to add the FCOS3D KITTI config to the mmdetection3d library?

PS - I tried the kitti_run_13.py.txt config for FCOS3D on KITTI. The KITTI results are as follows (I could not reproduce the exact FCOS3D results reported in Table 1 of PGD):

----------- AP11 Results ------------

Pedestrian AP11@0.50, 0.50, 0.50:
bbox AP11:48.7265, 44.4238, 40.3403
bev  AP11:3.7565, 3.1921, 2.6185
3d   AP11:3.0281, 2.1568, 2.0752
aos  AP11:35.20, 31.88, 28.86
Pedestrian AP11@0.50, 0.25, 0.25:
bbox AP11:48.7265, 44.4238, 40.3403
bev  AP11:15.2305, 13.2454, 11.8222
3d   AP11:14.6855, 12.6808, 11.2241
aos  AP11:35.20, 31.88, 28.86
Cyclist AP11@0.50, 0.50, 0.50:
bbox AP11:40.4218, 29.6994, 28.6308
bev  AP11:2.6796, 1.5958, 1.5836
3d   AP11:1.8958, 1.2950, 1.2330
aos  AP11:26.26, 19.90, 19.12
Cyclist AP11@0.50, 0.25, 0.25:
bbox AP11:40.4218, 29.6994, 28.6308
bev  AP11:13.3322, 8.0502, 7.3994
3d   AP11:12.7632, 7.0859, 7.0180
aos  AP11:26.26, 19.90, 19.12
Car AP11@0.70, 0.70, 0.70:
bbox AP11:71.5747, 65.0664, 58.6049
bev  AP11:13.6629, 9.4923, 8.6624
3d   AP11:9.6028, 6.3318, 5.8389
aos  AP11:69.96, 63.08, 56.13
Car AP11@0.70, 0.50, 0.50:
bbox AP11:71.5747, 65.0664, 58.6049
bev  AP11:32.6482, 23.5753, 22.5470
3d   AP11:28.7454, 20.1327, 19.1243
aos  AP11:69.96, 63.08, 56.13

Overall AP11@easy, moderate, hard:
bbox AP11:53.5743, 46.3966, 42.5253
bev  AP11:6.6997, 4.7601, 4.2882
3d   AP11:4.8422, 3.2612, 3.0490
aos  AP11:43.81, 38.29, 34.71

----------- AP40 Results ------------

Pedestrian AP40@0.50, 0.50, 0.50:
bbox AP40:47.3424, 42.3251, 38.3909
bev  AP40:3.0132, 2.5833, 2.1692
3d   AP40:2.2745, 1.8029, 1.5599
aos  AP40:32.24, 27.95, 25.11
Pedestrian AP40@0.50, 0.25, 0.25:
bbox AP40:47.3424, 42.3251, 38.3909
bev  AP40:13.6192, 11.8712, 10.1345
3d   AP40:13.0446, 11.2606, 9.6154
aos  AP40:32.24, 27.95, 25.11
Cyclist AP40@0.50, 0.50, 0.50:
bbox AP40:39.7180, 26.7853, 25.8877
bev  AP40:2.2422, 1.2011, 1.1086
3d   AP40:1.4964, 0.8123, 0.7267
aos  AP40:26.13, 18.61, 17.95
Cyclist AP40@0.50, 0.25, 0.25:
bbox AP40:39.7180, 26.7853, 25.8877
bev  AP40:11.7421, 6.6859, 6.1926
3d   AP40:11.2264, 6.1054, 5.7610
aos  AP40:26.13, 18.61, 17.95
Car AP40@0.70, 0.70, 0.70:
bbox AP40:72.8897, 65.7473, 58.7460
bev  AP40:11.0352, 7.9578, 7.2419
3d   AP40:6.3220, 4.2078, 3.7063
aos  AP40:71.19, 63.68, 56.11
Car AP40@0.70, 0.50, 0.50:
bbox AP40:72.8897, 65.7473, 58.7460
bev  AP40:32.1019, 23.0358, 21.6403
3d   AP40:27.9831, 19.6851, 18.4440
aos  AP40:71.19, 63.68, 56.11

Overall AP40@easy, moderate, hard:
bbox AP40:53.3167, 44.9526, 41.0082
bev  AP40:5.4302, 3.9141, 3.5066
3d   AP40:3.3643, 2.2743, 1.9977
aos  AP40:43.19, 36.75, 33.06

DongkyuYu commented 1 year ago

Hi @Tai-Wang! Thank you for your efforts in sharing the PGD implementation! I am confused by your config file configs/pgd/pgd_r101_caffe_fpn_gn-head_3x4_4x_kitti-mono3d.py. Why is the pred_keypoints option set to true when the nuScenes experiments and the original paper did not predict keypoints? Is it just to get more performance? And it seems that at test time the keypoint predictions do not affect the bbox predictions, do they?