Open whuxfx opened 1 week ago
OK, thanks for your interest in this repo. I'll open-source the code for these models in the next couple of days, and if I can still find the weights I'll upload them as well.
Hi @whuxfx, the code for these models has been updated. The Deformable weights are too old to track down, so I've only uploaded the DAB and DN weights.
❯ CUDA_VISIBLE_DEVICES=2 python tools/benchmark_model.py --model-config configs/deformable_detr_mp/def_detr_pp_resnet_800_1333.py
Using /home//.cache/torch_extensions/py38_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home//.cache/torch_extensions/py38_cu121/MultiScaleDeformableAttention/build.ninja...
Building extension module MultiScaleDeformableAttention...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module MultiScaleDeformableAttention...
| module | #parameters or shape | #flops |
|:-------------------------------------------------------------|:-----------------------|:-----------|
| model | 47.485M | 0.281T |
| backbone | 23.455M | 87.581G |
| backbone.conv1 | 9.408K | 2.529G |
| backbone.conv1.weight | (64, 3, 7, 7) | |
| backbone.layer1 | 0.213M | 14.313G |
| backbone.layer1.0 | 73.728K | 4.955G |
| backbone.layer1.1 | 69.632K | 4.679G |
| backbone.layer1.2 | 69.632K | 4.679G |
| backbone.layer2 | 1.212M | 22.02G |
| backbone.layer2.0 | 0.377M | 7.982G |
| backbone.layer2.1 | 0.279M | 4.679G |
| backbone.layer2.2 | 0.279M | 4.679G |
| backbone.layer2.3 | 0.279M | 4.679G |
| backbone.layer3 | 7.078M | 31.379G |
| backbone.layer3.0 | 1.507M | 7.982G |
| backbone.layer3.1 | 1.114M | 4.679G |
| backbone.layer3.2 | 1.114M | 4.679G |
| backbone.layer3.3 | 1.114M | 4.679G |
| backbone.layer3.4 | 1.114M | 4.679G |
| backbone.layer3.5 | 1.114M | 4.679G |
| backbone.layer4 | 14.942M | 17.341G |
| backbone.layer4.0 | 6.029M | 7.982G |
| backbone.layer4.1 | 4.456M | 4.679G |
| backbone.layer4.2 | 4.456M | 4.679G |
| neck.convs | 5.638M | 5.17G |
| neck.convs.0 | 0.132M | 2.224G |
| neck.convs.0.0 | 0.131M | 2.202G |
| neck.convs.0.1 | 0.512K | 21.504M |
| neck.convs.1 | 0.263M | 1.106G |
| neck.convs.1.0 | 0.262M | 1.101G |
| neck.convs.1.1 | 0.512K | 5.376M |
| neck.convs.2 | 0.525M | 0.552G |
| neck.convs.2.0 | 0.524M | 0.551G |
| neck.convs.2.1 | 0.512K | 1.344M |
| neck.convs.3 | 4.719M | 1.289G |
| neck.convs.3.0 | 4.719M | 1.288G |
| neck.convs.3.1 | 0.512K | 0.349M |
| transformer | 18.392M | 0.188T |
| transformer.level_embeds | (4, 256) | |
| transformer.enc_output | 65.792K | 1.463G |
| transformer.enc_output.weight | (256, 256) | |
| transformer.enc_output.bias | (256,) | |
| transformer.enc_output_norm | 0.512K | 28.573M |
| transformer.enc_output_norm.weight | (256,) | |
| transformer.enc_output_norm.bias | (256,) | |
| transformer.encoder.layers | 7.693M | 0.172T |
| transformer.encoder.layers.0 | 1.282M | 28.585G |
| transformer.encoder.layers.1 | 1.282M | 28.585G |
| transformer.encoder.layers.2 | 1.282M | 28.585G |
| transformer.encoder.layers.3 | 1.282M | 28.585G |
| transformer.encoder.layers.4 | 1.282M | 28.585G |
| transformer.encoder.layers.5 | 1.282M | 28.585G |
| transformer.decoder | 10.343M | 11.758G |
| transformer.decoder.layers | 9.275M | 11.439G |
| transformer.decoder.ref_point_head | 0.132M | 39.706M |
| transformer.decoder.class_head | 0.14M | 41.933M |
| transformer.decoder.bbox_head | 0.796M | 0.238G |
| transformer.decoder.position_relation_embedding.pos_proj.0 | 0.52K | |
| transformer.encoder_class_head | 23.387K | 0.52G |
| transformer.encoder_class_head.weight | (91, 256) | |
| transformer.encoder_class_head.bias | (91,) | |
| transformer.encoder_bbox_head.layers | 0.133M | 2.949G |
| transformer.encoder_bbox_head.layers.0 | 65.792K | 1.463G |
| transformer.encoder_bbox_head.layers.1 | 65.792K | 1.463G |
| transformer.encoder_bbox_head.layers.2 | 1.028K | 22.859M |
| transformer.pos_trans | 0.131M | 39.322M |
| transformer.pos_trans.weight | (256, 512) | |
| transformer.pos_trans.bias | (256,) | |
| transformer.pos_trans_norm | 0.512K | 0.384M |
| transformer.pos_trans_norm.weight | (256,) | |
| transformer.pos_trans_norm.bias | (256,) | |
Memory allocation 0.20057344436645508 GB
Max memory allocation 3.286261558532715 GB
Model parameters 0.04422363732010126 GB
warm up...
testing inference time...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:02<00:00, 24.93it/s]
avg inference time per image = 0.04077534570499342
Hi, I ran the benchmarking tool bundled with the code. Why do def-detr's parameter count and FLOPs differ so much from the original implementation? With and without the position encoding:

| module | #parameters or shape | #flops |
|:-------|:---------------------|:-------|
| model | 47.484M | 0.281T |

Between the two variants the parameter count and FLOPs barely differ, but both are noticeably different from the original def-detr code's numbers.
This is likely because some parameters in the def_detr_pp_resnet50_800_1333 config don't match the original implementation. For example, it uses dim_feedforward=2048 here, while the official code uses 1024. I'll align the config file with the official one when I get a chance.
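For illustration, the kind of one-line change being described (a hypothetical sketch only; the exact key name and where it lives depend on how this repo's config files are laid out):

```python
# Hypothetical config fragment -- key names depend on this repo's config
# layout. The official Deformable-DETR uses a 1024-d FFN:
dim_feedforward = 1024  # was 2048 in def_detr_pp_resnet50_800_1333
```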
There is another factor: FLOPs depend on the input image size, so the numbers are only comparable when measured at the same input size. The default input size used for FLOPs counting here is 800*1333; you may want to check what input size the official code uses.
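A minimal sketch of that point, assuming fvcore for the counting (which matches the table format above) and using a torchvision ResNet-50 as a stand-in for the full detector:

```python
import torch
import torchvision
from fvcore.nn import FlopCountAnalysis

# Stand-in model: substitute the detector built from the actual config.
# FLOPs grow with input resolution, so counts are only comparable when
# measured at the same input size.
model = torchvision.models.resnet50().eval()

with torch.no_grad():
    for h, w in [(800, 1333), (640, 640)]:
        flops = FlopCountAnalysis(model, (torch.randn(1, 3, h, w),))
        print(f"{h}x{w}: {flops.total() / 1e9:.1f} GFlops")
```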
I just tested this: with dim_feedforward set to 1024 and the input size fixed to 800*1333 on both sides, the results from this repo match detrex's deformable-detr. (Below are the results from this repo followed by detrex's; detrex fixes the input image size to 800*1333.)
(cp311pt211) houxiuquan@amax:/data2/houxiuquan/detection$ python tools/benchmark_model.py --model-config configs/models/deformable_detr/def_detr_resnet50_1024.py
/data2/houxiuquan/envs/cp311pt211/lib/python3.11/site-packages/torch/overrides.py:110: UserWarning: 'has_cuda' is deprecated, please use 'torch.backends.cuda.is_built()'
torch.has_cuda,
/data2/houxiuquan/envs/cp311pt211/lib/python3.11/site-packages/torch/overrides.py:111: UserWarning: 'has_cudnn' is deprecated, please use 'torch.backends.cudnn.is_available()'
torch.has_cudnn,
/data2/houxiuquan/envs/cp311pt211/lib/python3.11/site-packages/torch/overrides.py:117: UserWarning: 'has_mps' is deprecated, please use 'torch.backends.mps.is_built()'
torch.has_mps,
/data2/houxiuquan/envs/cp311pt211/lib/python3.11/site-packages/torch/overrides.py:118: UserWarning: 'has_mkldnn' is deprecated, please use 'torch.backends.mkldnn.is_available()'
torch.has_mkldnn,
| module | #parameters or shape | #flops |
|:------------------------------------------|:-----------------------|:-----------|
| model | 41.181M | 0.21T |
| backbone | 23.455M | 87.581G |
| backbone.conv1 | 9.408K | 2.529G |
| backbone.conv1.weight | (64, 3, 7, 7) | |
| backbone.layer1 | 0.213M | 14.313G |
| backbone.layer1.0 | 73.728K | 4.955G |
| backbone.layer1.1 | 69.632K | 4.679G |
| backbone.layer1.2 | 69.632K | 4.679G |
| backbone.layer2 | 1.212M | 22.02G |
| backbone.layer2.0 | 0.377M | 7.982G |
| backbone.layer2.1 | 0.279M | 4.679G |
| backbone.layer2.2 | 0.279M | 4.679G |
| backbone.layer2.3 | 0.279M | 4.679G |
| backbone.layer3 | 7.078M | 31.379G |
| backbone.layer3.0 | 1.507M | 7.982G |
| backbone.layer3.1 | 1.114M | 4.679G |
| backbone.layer3.2 | 1.114M | 4.679G |
| backbone.layer3.3 | 1.114M | 4.679G |
| backbone.layer3.4 | 1.114M | 4.679G |
| backbone.layer3.5 | 1.114M | 4.679G |
| backbone.layer4 | 14.942M | 17.341G |
| backbone.layer4.0 | 6.029M | 7.982G |
| backbone.layer4.1 | 4.456M | 4.679G |
| backbone.layer4.2 | 4.456M | 4.679G |
| neck.convs | 5.638M | 5.17G |
| neck.convs.0 | 0.132M | 2.224G |
| neck.convs.0.0 | 0.131M | 2.202G |
| neck.convs.0.1 | 0.512K | 21.504M |
| neck.convs.1 | 0.263M | 1.106G |
| neck.convs.1.0 | 0.262M | 1.101G |
| neck.convs.1.1 | 0.512K | 5.376M |
| neck.convs.2 | 0.525M | 0.552G |
| neck.convs.2.0 | 0.524M | 0.551G |
| neck.convs.2.1 | 0.512K | 1.344M |
| neck.convs.3 | 4.719M | 1.289G |
| neck.convs.3.0 | 4.719M | 1.288G |
| neck.convs.3.1 | 0.512K | 0.349M |
| transformer | 12.087M | 0.117T |
| transformer.level_embeds | (4, 256) | |
| transformer.enc_output | 65.792K | 1.463G |
| transformer.enc_output.weight | (256, 256) | |
| transformer.enc_output.bias | (256,) | |
| transformer.enc_output_norm | 0.512K | 28.573M |
| transformer.enc_output_norm.weight | (256,) | |
| transformer.enc_output_norm.bias | (256,) | |
| transformer.encoder.layers | 4.541M | 0.101T |
| transformer.encoder.layers.0 | 0.757M | 16.881G |
| transformer.encoder.layers.1 | 0.757M | 16.881G |
| transformer.encoder.layers.2 | 0.757M | 16.881G |
| transformer.encoder.layers.3 | 0.757M | 16.881G |
| transformer.encoder.layers.4 | 0.757M | 16.881G |
| transformer.encoder.layers.5 | 0.757M | 16.881G |
| transformer.decoder | 7.191M | 10.815G |
| transformer.decoder.layers | 6.123M | 10.495G |
| transformer.decoder.ref_point_head | 0.132M | 39.706M |
| transformer.decoder.class_head | 0.14M | 41.933M |
| transformer.decoder.bbox_head | 0.796M | 0.238G |
| transformer.encoder_class_head | 23.387K | 0.52G |
| transformer.encoder_class_head.weight | (91, 256) | |
| transformer.encoder_class_head.bias | (91,) | |
| transformer.encoder_bbox_head.layers | 0.133M | 2.949G |
| transformer.encoder_bbox_head.layers.0 | 65.792K | 1.463G |
| transformer.encoder_bbox_head.layers.1 | 65.792K | 1.463G |
| transformer.encoder_bbox_head.layers.2 | 1.028K | 22.859M |
| transformer.pos_trans | 0.131M | 39.322M |
| transformer.pos_trans.weight | (256, 512) | |
| transformer.pos_trans.bias | (256,) | |
| transformer.pos_trans_norm | 0.512K | 0.384M |
| transformer.pos_trans_norm.weight | (256,) | |
| transformer.pos_trans_norm.bias | (256,) | |
Memory allocation 0.1750655174255371 GB
Max memory allocation 2.697573661804199 GB
Model parameters 0.038352333940565586 GB
warm up...
testing inference time...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:04<00:00, 10.71it/s]
avg inference time per image = 0.09424854909157267
(cp311pt211) houxiuquan@amax:/data2/houxiuquan/detection$
WARNING [11/21 18:03:56 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the model:
stem.fc.{bias, weight}
0%| | 0/2 [00:00<?, ?it/s]/data2/houxiuquan/envs/detrex/lib/python3.8/site-packages/torch/nn/functional.py:2498: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
_verify_batch_size([input.size(0) * input.size(1) // num_groups, num_groups] + list(input.size()[2:]))
/data2/houxiuquan/envs/detrex/lib/python3.8/site-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/TensorShape.cpp:2228.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
WARNING [11/21 18:04:00 fvcore.nn.jit_analysis]: Unsupported operator aten::cumsum encountered 9 time(s)
WARNING [11/21 18:04:00 fvcore.nn.jit_analysis]: Unsupported operator aten::pow encountered 5 time(s)
WARNING [11/21 18:04:00 fvcore.nn.jit_analysis]: Unsupported operator aten::sin encountered 9 time(s)
WARNING [11/21 18:04:00 fvcore.nn.jit_analysis]: Unsupported operator aten::cos encountered 9 time(s)
WARNING [11/21 18:04:00 fvcore.nn.jit_analysis]: Unsupported operator aten::prod encountered 1 time(s)
WARNING [11/21 18:04:00 fvcore.nn.jit_analysis]: Unsupported operator aten::sum encountered 16 time(s)
WARNING [11/21 18:04:00 fvcore.nn.jit_analysis]: Unsupported operator aten::linspace encountered 16 time(s)
WARNING [11/21 18:04:00 fvcore.nn.jit_analysis]: Unsupported operator prim::PythonOp.MultiScaleDeformableAttnFunction encountered 12 time(s)
WARNING [11/21 18:04:00 fvcore.nn.jit_analysis]: Unsupported operator aten::ones_like encountered 4 time(s)
WARNING [11/21 18:04:00 fvcore.nn.jit_analysis]: Unsupported operator aten::lt encountered 1 time(s)
WARNING [11/21 18:04:00 fvcore.nn.jit_analysis]: Unsupported operator aten::all encountered 1 time(s)
WARNING [11/21 18:04:00 fvcore.nn.jit_analysis]: Unsupported operator aten::log encountered 13 time(s)
WARNING [11/21 18:04:00 fvcore.nn.jit_analysis]: Unsupported operator aten::topk encountered 2 time(s)
WARNING [11/21 18:04:00 fvcore.nn.jit_analysis]: Unsupported operator aten::repeat encountered 2 time(s)
WARNING [11/21 18:04:00 fvcore.nn.jit_analysis]: The following submodules of the model were never called during the trace of the graph. They may be unused, or they were accessed by direct calls to .forward() or via other python methods. In the latter case they will have zeros for statistics, though their statistics will still contribute to their parent calling module.
model.criterion, model.criterion.matcher, model.transformer.decoder.layers.0.attentions.0.attn.out_proj, model.transformer.decoder.layers.1.attentions.0.attn.out_proj, model.transformer.decoder.layers.2.attentions.0.attn.out_proj, model.transformer.decoder.layers.3.attentions.0.attn.out_proj, model.transformer.decoder.layers.4.attentions.0.attn.out_proj, model.transformer.decoder.layers.5.attentions.0.attn.out_proj
50%|████████████████████████████████████████████████ | 1/2 [00:03<00:03, 3.99s/it]/data2/houxiuquan/envs/detrex/lib/python3.8/site-packages/torch/nn/functional.py:2498: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
_verify_batch_size([input.size(0) * input.size(1) // num_groups, num_groups] + list(input.size()[2:]))
100%|████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00, 2.68s/it]
[11/21 18:04:02 detectron2]: Flops table computed from only one input sample:
| module | #parameters or shape | #flops |
|:--------------------------------------|:-----------------------|:-----------|
| model | 41.162M | 0.21T |
| backbone | 23.455M | 87.807G |
| backbone.stem.conv1 | 9.408K | 2.544G |
| backbone.stem.conv1.weight | (64, 3, 7, 7) | |
| backbone.stem.conv1.norm | | 34.15M |
| backbone.res2 | 0.213M | 14.416G |
| backbone.res2.0 | 73.728K | 5.011G |
| backbone.res2.1 | 69.632K | 4.703G |
| backbone.res2.2 | 69.632K | 4.703G |
| backbone.res3 | 1.212M | 22.022G |
| backbone.res3.0 | 0.377M | 7.99G |
| backbone.res3.1 | 0.279M | 4.677G |
| backbone.res3.2 | 0.279M | 4.677G |
| backbone.res3.3 | 0.279M | 4.677G |
| backbone.res4 | 7.078M | 31.458G |
| backbone.res4.0 | 1.507M | 7.997G |
| backbone.res4.1 | 1.114M | 4.692G |
| backbone.res4.2 | 1.114M | 4.692G |
| backbone.res4.3 | 1.114M | 4.692G |
| backbone.res4.4 | 1.114M | 4.692G |
| backbone.res4.5 | 1.114M | 4.692G |
| backbone.res5 | 14.942M | 17.368G |
| backbone.res5.0 | 6.029M | 7.996G |
| backbone.res5.1 | 4.456M | 4.686G |
| backbone.res5.2 | 4.456M | 4.686G |
| neck | 5.639M | 5.157G |
| neck.convs | 0.92M | 3.869G |
| neck.convs.0 | 0.132M | 2.21G |
| neck.convs.1 | 0.263M | 1.106G |
| neck.convs.2 | 0.525M | 0.552G |
| neck.extra_convs.0 | 4.719M | 1.289G |
| neck.extra_convs.0.conv | 4.719M | 1.288G |
| neck.extra_convs.0.norm | 0.512K | 0.349M |
| transformer | 12.068M | 0.117T |
| transformer.level_embeds | (4, 256) | |
| transformer.encoder.layers | 4.541M | 0.101T |
| transformer.encoder.layers.0 | 0.757M | 16.806G |
| transformer.encoder.layers.1 | 0.757M | 16.806G |
| transformer.encoder.layers.2 | 0.757M | 16.806G |
| transformer.encoder.layers.3 | 0.757M | 16.806G |
| transformer.encoder.layers.4 | 0.757M | 16.806G |
| transformer.encoder.layers.5 | 0.757M | 16.806G |
| transformer.decoder | 7.195M | 14.635G |
| transformer.decoder.layers | 6.123M | 10.732G |
| transformer.decoder.bbox_embed | 0.928M | 3.411G |
| transformer.decoder.class_embed | 0.144M | 0.492G |
| transformer.enc_output | 65.792K | 1.456G |
| transformer.enc_output.weight | (256, 256) | |
| transformer.enc_output.bias | (256,) | |
| transformer.enc_output_norm | 0.512K | 28.445M |
| transformer.enc_output_norm.weight | (256,) | |
| transformer.enc_output_norm.bias | (256,) | |
| transformer.pos_trans | 0.263M | 78.643M |
| transformer.pos_trans.weight | (512, 512) | |
| transformer.pos_trans.bias | (512,) | |
| transformer.pos_trans_norm | 1.024K | 0.768M |
| transformer.pos_trans_norm.weight | (512,) | |
| transformer.pos_trans_norm.bias | (512,) | |
[11/21 18:04:02 detectron2]: Average GFlops for each type of operators:
[('conv', 92.461884416), ('batch_norm', 0.4740864), ('group_norm', 0.02844544), ('upsample_nearest2d', 2.2223e-05), ('linear', 116.379134976), ('layer_norm', 0.37747072), ('bmm', 0.27648)]
[11/21 18:04:02 detectron2]: Total GFlops: 210.0±0.0
(detrex) houxiuquan@amax:/data1/houxiuquan/detrex$
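For reference, the per-operator GFlops breakdown above is what fvcore's flop analysis reports. A minimal sketch of how such a table can be produced, with a toy module standing in for the detector (the analysis calls are the same either way):

```python
import torch
from fvcore.nn import FlopCountAnalysis, flop_count_table

# Toy stand-in for the detector; replace with the real model and input.
model = torch.nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3).eval()
inputs = (torch.randn(1, 3, 800, 1333),)

flops = FlopCountAnalysis(model, inputs)
print(flop_count_table(flops))   # the "module | #parameters or shape | #flops" table
print(flops.by_operator())       # per-operator totals, e.g. Counter({'conv': ...})
print(f"Total GFlops: {flops.total() / 1e9:.1f}")
```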
OK, after reducing dim_feedforward I get results similar to yours.
Question
Hi, regarding Section 5.4 of the paper, "Transferability of position relation": the relation encoding improves the performance of three DETR variants. Could you open-source the implementations of these three DETR variants? Thank you!
Additional information
No response