Open CDchenlin opened 10 months ago
I have made the following modifications, but the errors remain.
model = dict(
    num_feature_levels=5,  # added
    language_model=dict(name=lang_model_name),
    backbone=dict(
        out_indices=(0, 1, 2, 3),  # modified from out_indices=(1, 2, 3)
        with_cp=False),
    neck=dict(
        in_channels=[96, 192, 384, 768],  # modified from in_channels=[192, 384, 768]
        num_outs=5),  # modified from num_outs=4
    bbox_head=dict(num_classes=13),
    encoder=dict(
        num_layers=6,
        num_cp=6,
        # visual layer config
        layer_cfg=dict(
            self_attn_cfg=dict(embed_dims=256, num_levels=5, dropout=0.0),  # modified from num_levels=4
            ffn_cfg=dict(
                embed_dims=256, feedforward_channels=2048, ffn_drop=0.0)),
        # text layer config
        text_layer_cfg=dict(
            self_attn_cfg=dict(num_heads=4, embed_dims=256, dropout=0.0),
            ffn_cfg=dict(
                embed_dims=256, feedforward_channels=1024, ffn_drop=0.0)),
        # fusion layer config
        fusion_layer_cfg=dict(
            v_dim=256,
            l_dim=256,
            embed_dim=1024,
            num_heads=4,
            init_values=1e-4),
    ),
    positional_encoding=dict(
        num_feats=128, normalize=True, offset=0.0, temperature=20),
)
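To rule out a simple mismatch between the level counts changed above, a quick sanity check along these lines may help. This is only a sketch: `check_level_counts` is a hypothetical helper, not part of any library, and the relationships it checks (one neck input per backbone output, and `num_outs`, `num_feature_levels` and the encoder's `num_levels` all equal) are assumptions inferred from the values modified in this config.

```python
def check_level_counts(model_cfg):
    """Return a list of mismatch messages; empty if all counts agree.

    Hypothetical helper: the expected relationships are assumptions
    drawn from the modified values, not a documented library contract.
    """
    problems = []

    # Assumption: the neck takes one input channel entry per backbone output.
    n_backbone = len(model_cfg["backbone"]["out_indices"])
    n_neck_in = len(model_cfg["neck"]["in_channels"])
    if n_backbone != n_neck_in:
        problems.append(
            f"backbone outputs ({n_backbone}) != neck in_channels ({n_neck_in})")

    # Assumption: every "number of levels" setting must match num_feature_levels.
    expected = model_cfg["num_feature_levels"]
    level_settings = {
        "neck.num_outs": model_cfg["neck"]["num_outs"],
        "encoder.layer_cfg.self_attn_cfg.num_levels":
            model_cfg["encoder"]["layer_cfg"]["self_attn_cfg"]["num_levels"],
    }
    for name, value in level_settings.items():
        if value != expected:
            problems.append(
                f"{name} ({value}) != num_feature_levels ({expected})")
    return problems


# Only the keys relevant to the check, copied from the config above.
cfg = dict(
    num_feature_levels=5,
    backbone=dict(out_indices=(0, 1, 2, 3)),
    neck=dict(in_channels=[96, 192, 384, 768], num_outs=5),
    encoder=dict(layer_cfg=dict(self_attn_cfg=dict(num_levels=5))),
)

for message in check_level_counts(cfg):
    print(message)
```

With the values posted here the check prints nothing, so these particular settings agree with each other. Any other `num_levels` setting that is not shown in this snippet would need to be added to `level_settings` to be covered.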