yan-hao-tian / VW

ICLR 2024 poster: Varying Window Attention
MIT License

Discrepancy in GFLOPs calculation compared to paper results #23

Open yunxiangfu2001 opened 3 hours ago

yunxiangfu2001 commented 3 hours ago

Hi, thank you for your work. I've been using the VWFormer segmentation head and noticed that the GFLOPs I measure do not match the numbers reported in the paper. Specifically, with the MiT-b2 backbone and the VW head, the paper reports 46.6 GFLOPs, but I measure 482 GFLOPs.

My model config (the mit_b2 backbone is from https://github.com/NVlabs/SegFormer/blob/master/mmseg/models/backbones/mix_transformer.py):

norm_cfg = dict(type='SyncBN', requires_grad=True)  # assumed; the norm_cfg definition is not shown in the issue
model = dict(
    type='EncoderDecoder',
    backbone=dict(
        type='mit_b2',
        style='pytorch'),
    decode_head=dict(
        type='VWHead',
        in_channels=[64, 128, 320, 512],
        in_index=[0, 1, 2, 3],
        channels=64,
        nheads=1,
        dropout_ratio=0.1,
        num_classes=150,
        short_cut=True,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))
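
For completeness, a minimal sketch of how a config like the one above can be turned into a model instance for FLOP counting (my assumption, based on the mmsegmentation 0.x / mmcv 1.x API that this repo's mmsegmentation-custom builds on; the forward_dummy swap is similar to what mmseg's tools/get_flops.py does):

from mmseg.models import build_segmentor

model_cfg = model                    # the `model = dict(...)` config above
model = build_segmentor(model_cfg)   # instantiate the EncoderDecoder as an nn.Module
model.init_weights()

# EncoderDecoder.forward expects img_metas, so FLOP counters are usually run
# against forward_dummy, which accepts a bare image tensor.
if hasattr(model, 'forward_dummy'):
    model.forward = model.forward_dummy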

FLOP computation with fvcore:

import torch
from fvcore.nn import FlopCountAnalysis, flop_count_str, flop_count, parameter_count

# `model` is the segmentor built from the config above
if torch.cuda.is_available():
    model.cuda()
model.eval()

shape = (3, 512, 512)
input = torch.randn((1, *shape), device=next(model.parameters()).device)
params = parameter_count(model)[""]
Gflops, unsupported = flop_count(model=model, inputs=(input,))
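
A quick sanity check on the same setup (a sketch reusing the `model` and `input` objects from the snippet above) is fvcore's per-operator breakdown, which shows whether matmul/einsum/bmm operations are actually being counted:

flops = FlopCountAnalysis(model, input)
# Per-operator totals: attention matrix multiplications show up as matmul /
# einsum / bmm entries alongside convolution and addmm.
for op, n in sorted(flops.by_operator().items(), key=lambda kv: -kv[1]):
    print(f"{op}: {n / 1e9:.2f} GFLOPs")
print(f"total: {flops.total() / 1e9:.2f} GFLOPs")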

Could you please clarify:

  1. The exact configuration used for GFLOPs calculation in the paper
  2. The measurement methodology/tools used

This would help ensure fair comparison and proper benchmarking of the model. Thanks in advance!

By the way, the THOP and mmcv FLOP counters may not account for the matrix multiplications in attention.
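
For reference, this is roughly how the mmcv counter is typically invoked (a sketch, assuming mmcv 1.x and a model whose forward takes a bare image tensor, e.g. after the forward_dummy swap shown earlier); since it accumulates per-module hook counts, functional calls such as torch.matmul inside an attention forward are not included:

from mmcv.cnn import get_model_complexity_info

# Hook-based counter: sums costs registered for nn.Module types (Conv2d, Linear,
# ...), so matrix multiplications done with torch.matmul/einsum are not counted.
flops_str, params_str = get_model_complexity_info(
    model, input_shape=(3, 512, 512), as_strings=True, print_per_layer_stat=False)
print(flops_str, params_str)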

yan-hao-tian commented 2 hours ago

Are you using this one? https://github.com/yan-hao-tian/VW/blob/6e4b6b4bfe3b54bd92cda6abc5dadb6b366c9046/mmsegmentation-custom/mmseg/models/decode_heads/vw_head.py#L146

yan-hao-tian commented 2 hours ago

Could you show me a screenshot of your test results?

yan-hao-tian commented 2 hours ago

Could you try switching the head to SegFormer's and measuring again, to see how much it differs from VWFormer?

yan-hao-tian commented 2 hours ago

My understanding is that the matrix multiplications involved in VWFormer are just the attention maps and weighted summations at the three scales. Because they are restricted to local windows, their total cost is even smaller than a single linear projection, i.e. Cx64Hx64Wx8Px8Px3x2 < CxCx64Hx64W, so in theory it should not matter whether they are counted or not. If you think there are other matrix multiplications involved, we can discuss them.

yan-hao-tian commented 1 hour ago

As for the matrix multiplications in the MiT backbone, my understanding is that they also only consist of the attention maps and weighted summations. Since it always downsamples the context to output stride 32, which is 16x16 when the input is 512, its cost is also very small: CxHxWx16x16 < CxCxHxW, so one such computation is again far smaller than a single linear projection.
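
Both points can be checked empirically from the same fvcore run (a sketch; it assumes the `model` and `input` objects from the fvcore snippet above and the default mmseg submodule names `backbone` and `decode_head`):

flops = FlopCountAnalysis(model, input)
# Attribute FLOPs to each submodule and operator, so the matmul share of the
# backbone and of the decode head can be compared against their projections.
per_module = flops.by_module_and_operator()
for name in ('backbone', 'decode_head'):
    for op, n in sorted(per_module.get(name, {}).items(), key=lambda kv: -kv[1]):
        print(f"{name} {op}: {n / 1e9:.2f} GFLOPs")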

yunxiangfu2001 commented 1 hour ago

Thank you for your prompt reply.

Here are the FLOPs I measured: [image]

And here are the FLOPs with the SegFormer MLP head: [image]

> https://github.com/yan-hao-tian/VW/blob/6e4b6b4bfe3b54bd92cda6abc5dadb6b366c9046/mmsegmentation-custom/mmseg/models/decode_heads/vw_head.py#L146
> Are you using this one?

Yes.

> My understanding is that the matrix multiplications involved in VWFormer are just the attention maps and weighted summations at the three scales. Because they are restricted to local windows, their total cost is even smaller than a single linear projection, i.e. Cx64Hx64Wx8Px8Px3x2 < CxCx64Hx64W, so in theory it should not matter whether they are counted or not. If you think there are other matrix multiplications involved, we can discuss them.

It is indeed strange; the FLOPs shouldn't be this high. My FLOP-counting script has been tested on many backbones and heads without problems. I haven't looked closely at the VWHead.py code yet, so I'm not sure where the issue is.

yan-hao-tian commented 1 hour ago

It is probably caused by a problem in the code. I have just updated vw_head.py; please measure again with the new version: https://github.com/yan-hao-tian/VW/commit/9bc1891f60195fbd7f520739edc88f59e7a9e4bb