open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

How to increase the number of FPN layers in Grounding-DINO #11137

Open CDchenlin opened 10 months ago

CDchenlin commented 10 months ago

I made the following modifications, but errors remain.

model = dict(
    num_feature_levels=5,  # added
    language_model=dict(name=lang_model_name),
    backbone=dict(
        out_indices=(0, 1, 2, 3),  # modified from out_indices=(1, 2, 3)
        with_cp=False),
    neck=dict(
        in_channels=[96, 192, 384, 768],  # modified from in_channels=[192, 384, 768]
        num_outs=5),  # modified from num_outs=4
    bbox_head=dict(num_classes=13),
    encoder=dict(
        num_layers=6,
        num_cp=6,
        # visual layer config
        layer_cfg=dict(
            self_attn_cfg=dict(embed_dims=256, num_levels=5, dropout=0.0),  # modified from num_levels=4
            ffn_cfg=dict(
                embed_dims=256, feedforward_channels=2048, ffn_drop=0.0)),
        # text layer config
        text_layer_cfg=dict(
            self_attn_cfg=dict(num_heads=4, embed_dims=256, dropout=0.0),
            ffn_cfg=dict(
                embed_dims=256, feedforward_channels=1024, ffn_drop=0.0)),
        # fusion layer config
        fusion_layer_cfg=dict(
            v_dim=256,
            l_dim=256,
            embed_dim=1024,
            num_heads=4,
            init_values=1e-4),
    ),
    positional_encoding=dict(
        num_feats=128, normalize=True, offset=0.0, temperature=20),
)
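
A possible sanity check for an override like the one above, offered as a sketch rather than a confirmed diagnosis: every `num_levels` entry in the model config should agree with `num_feature_levels`. The override sets `num_levels=5` only in the encoder, so any other attention layer that takes `num_levels` (for example a decoder `cross_attn_cfg`) may still be using the inherited value. The helper `find_level_mismatches` is hypothetical and not part of MMDetection, and the `decoder` entry in the sample dict is an assumed stand-in for an inherited setting, not something taken from the post.

```python
# Hypothetical helper, not part of MMDetection: walk a nested config dict and
# report every `num_levels` entry that differs from the expected level count.
def find_level_mismatches(cfg, expected, path="model"):
    mismatches = []
    for key, value in cfg.items():
        here = f"{path}.{key}"
        if key == "num_levels" and value != expected:
            mismatches.append((here, value))
        elif isinstance(value, dict):
            mismatches.extend(find_level_mismatches(value, expected, here))
    return mismatches


# Trimmed copy of the override above. The decoder entry is an ASSUMPTION:
# it stands in for a value that was inherited and left at 4.
model = dict(
    num_feature_levels=5,
    neck=dict(in_channels=[96, 192, 384, 768], num_outs=5),
    encoder=dict(layer_cfg=dict(self_attn_cfg=dict(num_levels=5))),
    decoder=dict(layer_cfg=dict(cross_attn_cfg=dict(num_levels=4))),
)

expected = model["num_feature_levels"]
mismatches = find_level_mismatches(model, expected)

# The neck must also emit one feature map per level.
if model["neck"]["num_outs"] != expected:
    mismatches.append(("model.neck.num_outs", model["neck"]["num_outs"]))

print(mismatches)
```

Running the same helper on the fully merged config, rather than on the override alone, would show whether any inherited layer still assumes four levels.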