haoy945 / DeMF

Boosting 3D Object Detection via Object-Focused Image Fusion
MIT License
56 stars 7 forks source link

Cannot reproduce the results with pretrained demf model that is provided #8

Open guthasaibharathchandra opened 1 year ago

guthasaibharathchandra commented 1 year ago

Hi, I have used the pre-trained model you have provided and the following script to evaluate the model on SUNRGBD and I get the following result which is different from the expected outcome of 65.5mAP@0.25 and 46.1mAP@0.5 as reported in the paper. I'm using the demf with votenet backbone, and the script/command I have used is as follows: (single gpu test)

python eval.py configs/demf/demf_votenet.py pretrained_models/demf-epoch_36.pth --eval mAP

I have downloaded the provided demf-epoch_36.pth , could you please tell me if i'm missing something ? the state_dict of demf-epoch_36.pth contains the weights for entire model right ? i.e for img_backbone and pts_backbone both. +-------------+---------+---------+---------+---------+ | classes | AP_0.25 | AR_0.25 | AP_0.50 | AR_0.50 | +-------------+---------+---------+---------+---------+ | bed | 0.8614 | 0.9437 | 0.3699 | 0.5786 | | table | 0.4739 | 0.8267 | 0.1377 | 0.3454 | | sofa | 0.6361 | 0.8756 | 0.0765 | 0.3190 | | chair | 0.8093 | 0.8994 | 0.6202 | 0.7177 | | toilet | 0.9132 | 0.9862 | 0.3488 | 0.5517 | | desk | 0.2490 | 0.7630 | 0.0271 | 0.2301 | | dresser | 0.4240 | 0.8119 | 0.0467 | 0.2339 | | night_stand | 0.6688 | 0.8902 | 0.4241 | 0.6392 | | bookshelf | 0.1172 | 0.4539 | 0.0065 | 0.0603 | | bathtub | 0.7807 | 0.8980 | 0.1729 | 0.4286 | +-------------+---------+---------+---------+---------+ | Overall | 0.5934 | 0.8349 | 0.2230 | 0.4104 | +-------------+---------+---------+---------+---------+

chenshi3 commented 1 year ago

I train the fcaf-based model, which can achieve the reported results. Maybe you can try fcaf-based model.

guthasaibharathchandra commented 1 year ago

Hi, mmdet3d later versions (i,e > 1.0.0), keeps all points in point clouds of sunrgbd dataset when processing. While the older mmdet3d versions sample only 50000 points. (you can see the NOTE in readme at https://github.com/open-mmlab/mmdetection3d/tree/master/data/sunrgbd) The above results I got were from the new version which keeps all the points. I evaluated on sunrgbd generated from old mmdet3d version and was able to reproduce the results you mentioned. I noticed however that in the train_pipeline you are sampling only 20000 points so ideally there shouldn't be much difference but seems like its not the case. May be worth investigating so i'm just sharing this here!

chenshi3 commented 1 year ago

The aixs of mmdet3d later versions (i,e > 1.0.0) is different from ours. You should be careful, and this may cause the problem.

guthasaibharathchandra commented 1 year ago

I'm only using the sunrgbd data generated using mmdet3d > 1.0.0 and using your specified versions of mmdet3d to run the pretrained model on it. I think sunrgbd point clouds are in depth co-ordinate system by default irrespective of mmdet3d versions. Am i missing something?

chenshi3 commented 1 year ago

In experiments, we generate the SUNRGB dataset with 100000 points. I don't think the number of points is main reason. As to the coordinate system, I check the code and have not found clues. I recommend you to use mmdet3d below version 1.0.

LIZECHUAN commented 10 months ago

I train the fcaf-based model, which can achieve the reported results. Maybe you can try fcaf-based model.

Hi , I train the fcaf-based model and I get the following result which is different from the reported mAp in the paper.could you please tell me if i'm missing something ? +-------------+---------+---------+---------+---------+ | classes | AP_0.25 | AR_0.25 | AP_0.50 | AR_0.50 | +-------------+---------+---------+---------+---------+ | bed | 0.8811 | 0.9767 | 0.6398 | 0.7359 | | table | 0.4980 | 0.9059 | 0.2817 | 0.5988 | | sofa | 0.7217 | 0.9490 | 0.5002 | 0.7161 | | chair | 0.8150 | 0.9016 | 0.6704 | 0.7695 | | toilet | 0.9287 | 0.9862 | 0.7106 | 0.8000 | | desk | 0.3208 | 0.8379 | 0.0992 | 0.4299 | | dresser | 0.4735 | 0.8991 | 0.2514 | 0.5963 | | night_stand | 0.7013 | 0.9490 | 0.5532 | 0.7569 | | bookshelf | 0.2982 | 0.7340 | 0.0587 | 0.2305 | | bathtub | 0.8098 | 0.9592 | 0.4944 | 0.6939 | +-------------+---------+---------+---------+---------+ | Overall | 0.6448 | 0.9099 | 0.4259 | 0.6328 | +-------------+---------+---------+---------+---------+

n_points = 100000 dataset_type = 'SUNRGBDDataset' data_root = '/home/hy/ssd1/lzc/DeMF/sunrgbd/' class_names = ('bed', 'table', 'sofa', 'chair', 'toilet', 'desk', 'dresser', 'night_stand', 'bookshelf', 'bathtub') train_pipeline = [ dict( type='LoadPointsFromFile', coord_type='DEPTH', shift_height=False, load_dim=6, use_dim=[0, 1, 2, 3, 4, 5]), dict(type='LoadImageFromFile'), dict(type='LoadAnnotations3D'), dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), dict(type='RandomFlip', flip_ratio=0.0), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='LoadAnnotations', with_bbox=True), dict(type='IndoorPointSample', num_points=100000), dict(type='RandomFlip3D', sync_2d=False, flip_ratio_bev_horizontal=0.5), dict( type='GlobalRotScaleTrans', rot_range=[-0.523599, 0.523599], scale_ratio_range=[0.85, 1.15], translation_std=[0.1, 0.1, 0.1], shift_height=False), dict( type='DefaultFormatBundle3D', class_names=('bed', 'table', 'sofa', 'chair', 'toilet', 'desk', 'dresser', 'night_stand', 'bookshelf', 'bathtub')), dict( type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d', 'img']) ] test_pipeline = [ dict( type='LoadPointsFromFile', coord_type='DEPTH', shift_height=False, load_dim=6, use_dim=[0, 1, 2, 3, 4, 5]), dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug3D', img_scale=(1333, 800), pts_scale_ratio=1, flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip', flip_ratio=0.0), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict( type='GlobalRotScaleTrans', rot_range=[0, 0], scale_ratio_range=[1.0, 1.0], translation_std=[0, 0, 0]), dict( type='RandomFlip3D', sync_2d=False, flip_ratio_bev_horizontal=0.5, flip_ratio_bev_vertical=0.5), dict(type='IndoorPointSample', num_points=100000), dict( type='DefaultFormatBundle3D', class_names=('bed', 'table', 'sofa', 'chair', 'toilet', 'desk', 'dresser', 'night_stand', 'bookshelf', 'bathtub'), with_label=False), dict(type='Collect3D', keys=['points', 'img']) ]) ] data = dict( samples_per_gpu=8, workers_per_gpu=4, train=dict( type='RepeatDataset', times=3, dataset=dict( type='SUNRGBDDataset', modality=dict(use_camera=True, use_lidar=True), data_root='/home/hy/ssd1/lzc/DeMF/sunrgbd/', ann_file='/home/hy/ssd1/lzc/DeMF/sunrgbd/sunrgbd_infos_train.pkl', pipeline=[ dict( type='LoadPointsFromFile', coord_type='DEPTH', shift_height=False, load_dim=6, use_dim=[0, 1, 2, 3, 4, 5]), dict(type='LoadImageFromFile'), dict(type='LoadAnnotations3D'), dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), dict(type='RandomFlip', flip_ratio=0.0), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='LoadAnnotations', with_bbox=True), dict(type='IndoorPointSample', num_points=100000), dict( type='RandomFlip3D', sync_2d=False, flip_ratio_bev_horizontal=0.5), dict( type='GlobalRotScaleTrans', rot_range=[-0.523599, 0.523599], scale_ratio_range=[0.85, 1.15], translation_std=[0.1, 0.1, 0.1], shift_height=False), dict( type='DefaultFormatBundle3D', class_names=('bed', 'table', 'sofa', 'chair', 'toilet', 'desk', 'dresser', 'night_stand', 'bookshelf', 'bathtub')), dict( type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d', 'img']) ], filter_empty_gt=True, classes=('bed', 'table', 'sofa', 'chair', 'toilet', 'desk', 'dresser', 'night_stand', 'bookshelf', 'bathtub'), box_type_3d='Depth')), val=dict( type='SUNRGBDDataset', modality=dict(use_camera=True, use_lidar=True), data_root='/home/hy/ssd1/lzc/DeMF/sunrgbd/', ann_file='/home/hy/ssd1/lzc/DeMF/sunrgbd/sunrgbd_infos_val.pkl', pipeline=[ dict( type='LoadPointsFromFile', coord_type='DEPTH', shift_height=False, load_dim=6, use_dim=[0, 1, 2, 3, 4, 5]), dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug3D', img_scale=(1333, 800), pts_scale_ratio=1, flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip', flip_ratio=0.0), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict( type='GlobalRotScaleTrans', rot_range=[0, 0], scale_ratio_range=[1.0, 1.0], translation_std=[0, 0, 0]), dict( type='RandomFlip3D', sync_2d=False, flip_ratio_bev_horizontal=0.5, flip_ratio_bev_vertical=0.5), dict(type='IndoorPointSample', num_points=100000), dict( type='DefaultFormatBundle3D', class_names=('bed', 'table', 'sofa', 'chair', 'toilet', 'desk', 'dresser', 'night_stand', 'bookshelf', 'bathtub'), with_label=False), dict(type='Collect3D', keys=['points', 'img']) ]) ], classes=('bed', 'table', 'sofa', 'chair', 'toilet', 'desk', 'dresser', 'night_stand', 'bookshelf', 'bathtub'), test_mode=True, box_type_3d='Depth'), test=dict( type='SUNRGBDDataset', modality=dict(use_camera=True, use_lidar=True), data_root='/home/hy/ssd1/lzc/DeMF/sunrgbd/', ann_file='/home/hy/ssd1/lzc/DeMF/sunrgbd/sunrgbd_infos_val.pkl', pipeline=[ dict( type='LoadPointsFromFile', coord_type='DEPTH', shift_height=False, load_dim=6, use_dim=[0, 1, 2, 3, 4, 5]), dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug3D', img_scale=(1333, 800), pts_scale_ratio=1, flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip', flip_ratio=0.0), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict( type='GlobalRotScaleTrans', rot_range=[0, 0], scale_ratio_range=[1.0, 1.0], translation_std=[0, 0, 0]), dict( type='RandomFlip3D', sync_2d=False, flip_ratio_bev_horizontal=0.5, flip_ratio_bev_vertical=0.5), dict(type='IndoorPointSample', num_points=100000), dict( type='DefaultFormatBundle3D', class_names=('bed', 'table', 'sofa', 'chair', 'toilet', 'desk', 'dresser', 'night_stand', 'bookshelf', 'bathtub'), with_label=False), dict(type='Collect3D', keys=['points', 'img']) ]) ], classes=('bed', 'table', 'sofa', 'chair', 'toilet', 'desk', 'dresser', 'night_stand', 'bookshelf', 'bathtub'), test_mode=True, box_type_3d='Depth')) voxel_size = 0.01 model = dict( type='TwoStageSparse3DDetector', voxel_size=0.01, backbone=dict(type='MEResNet3D', in_channels=3, depth=34), neck_with_head=dict( type='Fcaf3DNeckWithHead_my', in_channels=(64, 128, 256, 512), out_channels=128, pts_threshold=100000, n_classes=10, n_reg_outs=8, voxel_size=0.01, assigner=dict(type='Fcaf3DAssigner', limit=27, topk=18, n_scales=4), loss_bbox=dict(type='IoU3DLoss', loss_weight=1.0)), train_cfg=dict(), test_cfg=dict( nms_pre=1000, iou_thr=0.5, score_thr=0.01, ensemble_stages=[2]), img_encoder=dict( type='DeformableDetrEncoder', encoder=dict( type='DetrTransformerEncoder', num_layers=6, transformerlayers=dict( type='BaseTransformerLayer', attn_cfgs=dict( type='MultiScaleDeformableAttention', embed_dims=256), feedforward_channels=1024, ffn_dropout=0.1, operation_order=('self_attn', 'norm', 'ffn', 'norm'))), positional_encoding=dict( type='SinePositionalEncoding', num_feats=128, normalize=True, offset=-0.5), num_feature_levels=4, embed_dims=256), img_backbone=dict( type='ResNet', depth=50, num_stages=4, out_indices=(1, 2, 3), frozen_stages=1, norm_cfg=dict(type='BN', requires_grad=False), norm_eval=True, style='pytorch'), img_neck=dict( type='ChannelMapper', in_channels=[512, 1024, 2048], kernel_size=1, out_channels=256, act_cfg=None, norm_cfg=dict(type='GN', num_groups=32), num_outs=4), stage2_head=dict( type='CAHeadIter', decoder=dict( type='TransformerDecoderLayerWithPos', num_layers=1, transformerlayers=dict( type='DetrTransformerDecoderLayer', attn_cfgs=[ dict( type='MultiheadAttention', embed_dims=256, num_heads=8, dropout=0.1), dict(type='MultiScaleDeformableAttention', embed_dims=256) ], feedforward_channels=1024, ffn_dropout=0.1, operation_order=('self_attn', 'norm', 'cross_attn', 'norm', 'ffn', 'norm')), posembed=dict(input_channel=9, num_pos_feats=256))), freeze_img_branch=True) find_unused_parameters = True optimizer = dict( type='AdamW', lr=0.001, weight_decay=0.0001, paramwise_cfg=dict( custom_keys=dict(decoder=dict(lr_mult=0.05, decay_mult=1.0)))) optimizer_config = dict(grad_clip=dict(max_norm=10, norm_type=2)) lr_config = dict(policy='step', warmup=None, step=[8, 11]) runner = dict(type='EpochBasedRunner', max_epochs=12) custom_hooks = [dict(type='EmptyCacheHook', after_iter=True)] checkpoint_config = dict(interval=1, max_keep_ckpts=1) log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')]) dist_params = dict(backend='nccl') log_level = 'INFO' work_dir = '1105/raw/base' load_from = '/home/hy/ssd1/lzc/DeMF/deform_detr-epoch_10.pth' resume_from = None workflow = [('train', 1)] lr = 0.001 img_norm_cfg = dict( mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) evaluation = dict(interval=1) gpu_ids = range(0, 4)