Closed by GDMG99 4 months ago
Also, I have been trying to initialize the `BEVFusion(Base3DFusionModel)` class. The dictionaries I am using to initialize it come from the config file (.yaml) of the pretrained weights of the fusion object detection model:
```yaml
decoder:
  backbone:
    conv_cfg:
      bias: false
      type: Conv2d
    in_channels: 256
    layer_nums: [5, 5]
    layer_strides: [1, 2]
    norm_cfg:
      eps: 0.001
      momentum: 0.01
      type: BN
    out_channels: [128, 256]
    type: SECOND
  neck:
    in_channels: [128, 256]
    norm_cfg:
      eps: 0.001
      momentum: 0.01
      type: BN
    out_channels: [256, 256]
    type: SECONDFPN
    upsample_cfg:
      bias: false
      type: deconv
    upsample_strides: [1, 2]
    use_conv_for_no_stride: true
encoders:
  camera:
    backbone:
      attn_drop_rate: 0.0
      convert_weights: true
      depths: [2, 2, 6, 2]
      drop_path_rate: 0.2
      drop_rate: 0.0
      embed_dims: 96
      init_cfg:
        checkpoint: pretrained/swint-nuimages-pretrained.pth
        type: Pretrained
      mlp_ratio: 4
      num_heads: [3, 6, 12, 24]
      out_indices: [1, 2, 3]
      patch_norm: true
      qk_scale: null
      qkv_bias: true
      type: SwinTransformer
      window_size: 7
      with_cp: false
    neck:
      act_cfg:
        inplace: true
        type: ReLU
      in_channels: [192, 384, 768]
      norm_cfg:
        requires_grad: true
        type: BN2d
      num_outs: 3
      out_channels: 256
      start_level: 0
      type: GeneralizedLSSFPN
      upsample_cfg:
        align_corners: false
        mode: bilinear
    vtransform:
      dbound: [1.0, 60.0, 0.5]
      downsample: 2
      feature_size: [32, 88]
      image_size: [256, 704]
      in_channels: 256
      out_channels: 80
      type: DepthLSSTransform
      xbound: [-54.0, 54.0, 0.3]
      ybound: [-54.0, 54.0, 0.3]
      zbound: [-10.0, 10.0, 20.0]
  lidar:
    backbone:
      block_type: basicblock
      encoder_channels:
        - [16, 16, 32]
        - [32, 32, 64]
        - [64, 64, 128]
        - [128, 128]
      encoder_paddings:
        - [0, 0, 1]
        - [0, 0, 1]
        - [0, 0, [1, 1, 0]]
        - [0, 0]
      in_channels: 5
      order: [conv, norm, act]
      output_channels: 128
      sparse_shape: [1440, 1440, 41]
      type: SparseEncoder
    voxelize:
      max_num_points: 10
      max_voxels: [120000, 160000]
      point_cloud_range: [-54.0, -54.0, -5.0, 54.0, 54.0, 3.0]
      voxel_size: [0.075, 0.075, 0.2]
fuser:
  in_channels: [80, 256]
  out_channels: 256
  type: ConvFuser
heads:
  map: null
  object:
    activation: relu
    auxiliary: true
    bbox_coder:
      code_size: 10
      out_size_factor: 8
      pc_range: [-54.0, -54.0]
      post_center_range: [-61.2, -61.2, -10.0, 61.2, 61.2, 10.0]
      score_threshold: 0.0
      type: TransFusionBBoxCoder
      voxel_size: [0.075, 0.075]
    bn_momentum: 0.1
    common_heads:
      center: [2, 2]
      dim: [3, 2]
      height: [1, 2]
      rot: [2, 2]
      vel: [2, 2]
    dropout: 0.1
    ffn_channel: 256
    hidden_channel: 128
    in_channels: 512
    loss_bbox:
      loss_weight: 0.25
      reduction: mean
      type: L1Loss
    loss_cls:
      alpha: 0.25
      gamma: 2.0
      loss_weight: 1.0
      reduction: mean
      type: FocalLoss
      use_sigmoid: true
    loss_heatmap:
      loss_weight: 1.0
      reduction: mean
      type: GaussianFocalLoss
    nms_kernel_size: 3
    num_classes: 10
    num_decoder_layers: 1
    num_heads: 8
    num_proposals: 200
    test_cfg:
      dataset: nuScenes
      grid_size: [1440, 1440, 41]
      nms_type: null
      out_size_factor: 8
      pc_range: [-54.0, -54.0]
      voxel_size: [0.075, 0.075]
    train_cfg:
      assigner:
        cls_cost:
          alpha: 0.25
          gamma: 2.0
          type: FocalLossCost
          weight: 0.15
        iou_calculator:
          coordinate: lidar
          type: BboxOverlaps3D
        iou_cost:
          type: IoU3DCost
          weight: 0.25
        reg_cost:
          type: BBoxBEVL1Cost
          weight: 0.25
        type: HungarianAssigner3D
      code_weights: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2]
      dataset: nuScenes
      gaussian_overlap: 0.1
      grid_size: [1440, 1440, 41]
      min_radius: 2
      out_size_factor: 8
      point_cloud_range: [-54.0, -54.0, -5.0, 54.0, 54.0, 3.0]
      pos_weight: -1
      voxel_size: [0.075, 0.075, 0.2]
    type: TransFusionHead
```
In principle, the class loads the encoders, decoder, and fuser properly. However, with the heads I get the following error:

```
AttributeError: TransFusionHead: 'dict' object has no attribute 'assigner'
```

yet `assigner` is present in `heads['object']['train_cfg']`.

Does anyone know how to properly initialize the class? Also, I still do not see how to pass the desired weights during initialization. Thank you in advance!
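For what it's worth, this error usually points at *how* the config is passed rather than at a missing key: mmdet3d-style heads read their training config with attribute access (roughly `self.train_cfg.assigner`), which works on mmcv's `Config`/`ConfigDict` objects but not on the plain dicts that `yaml.safe_load` produces. Wrapping the dictionaries (e.g. with `mmcv.utils.ConfigDict`) before passing them to the class should resolve it. A minimal, self-contained stand-in to illustrate the difference (`AttrDict` here is a toy class for illustration only, not part of the codebase):

```python
class AttrDict(dict):
    """Dict whose keys are also readable as attributes (recursively)."""

    def __getattr__(self, name):
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name)
        # Re-wrap nested dicts so chained access (cfg.assigner.type) works.
        return AttrDict(value) if isinstance(value, dict) else value


train_cfg = {"assigner": {"type": "HungarianAssigner3D"}}

# A plain dict reproduces the reported error:
try:
    train_cfg.assigner
except AttributeError as err:
    print("plain dict:", err)  # 'dict' object has no attribute 'assigner'

# Wrapped, attribute access works the way the head expects:
cfg = AttrDict(train_cfg)
print("wrapped:", cfg.assigner.type)  # wrapped: HungarianAssigner3D
```

In practice you would wrap the whole config with mmcv's `ConfigDict` (which is what `Config.fromfile` gives you) rather than with this toy class.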
Adding support for a custom dataset is beyond the scope of this codebase, and unfortunately, we don't have the capacity to accommodate such customized requests.
Hi! Thank you, authors, for your amazing work! I have been working with BEVFusion for some time and I would like to implement it in a detection pipeline for a custom application. I have already trained and evaluated the model on different nuScenes-like datasets. This time I have a LiDAR and two cameras, and I am already working on changing the data format to match the nuScenes one. I have been looking at `class BEVFusion(Base3DFusionModel):` (here), but it needs `encoders`, `fuser`, `decoder`, and `heads` dictionary inputs. Are these dictionaries the same as in the config yaml files? From the `BEVFusion` class, I believe I should use the forward function. Nevertheless, I cannot see where to input the weights of the model or the config file. Could someone help me out here? Thanks a lot in advance!
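Regarding the weights and the config file: in mmcv/mmdet3d-style codebases the model is normally not instantiated by hand at all. The config file is parsed into nested `ConfigDict`s, the model is built from the `model` section, and the checkpoint is loaded afterwards rather than through the constructor. A rough sketch, assuming this repo follows the standard mmcv/mmdet3d entry points (`Config.fromfile`, `build_model`, `load_checkpoint`; the file paths are placeholders):

```python
# Sketch only: assumes the usual mmcv/mmdet3d APIs; adjust paths to your setup.
from mmcv import Config
from mmcv.runner import load_checkpoint
from mmdet3d.models import build_model

cfg = Config.fromfile("path/to/your/config.yaml")  # placeholder path
# Builds BEVFusion (encoders/fuser/decoder/heads) from the nested config dicts,
# assuming the model definition sits under a top-level `model` key as usual.
model = build_model(cfg.model)
load_checkpoint(model, "path/to/checkpoint.pth", map_location="cpu")  # placeholder path
model.eval()
```

After this, the yaml dictionaries you pasted are exactly what ends up being passed to `BEVFusion.__init__`, and inference goes through the model's forward function.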