wzzheng / TPVFormer

[CVPR 2023] An academic alternative to Tesla's occupancy network for autonomous driving.
https://wzzheng.net/TPVFormer/
Apache License 2.0

Discrepancy in Parameter Count and FLOPs in Paper #61

Open npurson opened 1 year ago

npurson commented 1 year ago

Section 4.5 of the paper (page 8) reports the following statistics:

> Specifically, TPVFormer has only 6.0M parameters versus 15.7M for MonoScene, and 128G FLOPS per image versus 500G for MonoScene.

However, these numbers are evidently inaccurate: the backbone of MonoScene, EfficientNet-B7, alone contains 66M parameters, as documented in the paper and verified by our own measurements.

Furthermore, the correct term is "FLOPs" (floating-point operations) rather than "FLOPS", which stands for "floating-point operations per second".
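To make the distinction concrete, here is a minimal sketch of how the two quantities relate: FLOPs is a count, FLOPS is a rate, and dividing one by the other gives an idealized lower bound on runtime. The device throughput below is a hypothetical value chosen for illustration, not a measurement of any real hardware, and the function name is ours.

```python
def ideal_latency_seconds(flop_count: float, device_flops_per_second: float) -> float:
    """Idealized lower bound on runtime: a count (FLOPs) divided by a rate (FLOPS)."""
    return flop_count / device_flops_per_second


# FLOPs: an amount of work. 128G per image is the TPVFormer figure quoted from the paper.
PAPER_FLOPS_PER_IMAGE = 128e9

# FLOPS: a hardware throughput. ASSUMPTION: 10 TFLOPS is a made-up example device.
HYPOTHETICAL_DEVICE_FLOPS_PER_SECOND = 10e12

latency = ideal_latency_seconds(PAPER_FLOPS_PER_IMAGE, HYPOTHETICAL_DEVICE_FLOPS_PER_SECOND)
print(f"ideal latency per image: {latency * 1e3:.1f} ms")
```

Real latency is normally higher than this bound, since memory traffic and kernel overheads are ignored here.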

For reference, here is our measurement of MonoScene, obtained with fvcore:

| module                                         | #parameters or shape   | #flops     |
|:-----------------------------------------------|:-----------------------|:-----------|
| model                                          | 0.149G                 | 0.99T      |
|  encoder                                       |  0.133G                |  0.461T    |
|   encoder.encoder.backbone                     |   63.787M              |   45.949G  |
|    encoder.encoder.backbone.conv_stem          |    1.728K              |    0.195G  |
|    encoder.encoder.backbone.bn1                |    0.128K              |    14.445M |
|    encoder.encoder.backbone.blocks             |    62.142M             |    45.053G |
|    encoder.encoder.backbone.conv_head          |    1.638M              |    0.685G  |
|    encoder.encoder.backbone.bn2                |    5.12K               |    2.14M   |
|   encoder.decoder                              |   68.801M              |   0.415T   |
|    encoder.decoder.conv                        |    6.556M              |    3.408G  |
|    encoder.decoder.resizes                     |    77.056K             |    4.328G  |
|    encoder.decoder.upsamples                   |    62.167M             |    0.407T  |
|  decoder                                       |  16.861M               |  0.53T     |
|   decoder.process_l1                           |   28.896K              |   4.474G   |
|    decoder.process_l1.0.blks                   |    13.824K             |    3.624G  |
|    decoder.process_l1.1.btnk                   |    15.072K             |    0.85G   |
|   decoder.process_l2                           |   0.113M               |   2.178G   |
|    decoder.process_l2.0.blks                   |    53.76K              |    1.762G  |
|    decoder.process_l2.1.btnk                   |    58.816K             |    0.416G  |
|   decoder.CP_mega_voxels                       |   15.346M              |   54.459G  |
|    decoder.CP_mega_voxels.mega_context         |    3.539M              |    1.812G  |
|    decoder.CP_mega_voxels.context_prior_logits |    0.526M              |    2.147G  |
|    decoder.CP_mega_voxels.aspp.blks            |    10.62M              |    43.499G |
|    decoder.CP_mega_voxels.resize               |    0.66M               |    2.705G  |
|   decoder.up_13_l2.up_bn                       |   0.885M               |   3.632G   |
|    decoder.up_13_l2.up_bn.0                    |    0.885M              |    3.624G  |
|    decoder.up_13_l2.up_bn.1                    |    0.256K              |    8.389M  |
|   decoder.up_12_l1.up_bn                       |   0.221M               |   7.281G   |
|    decoder.up_12_l1.up_bn.0                    |    0.221M              |    7.248G  |
|    decoder.up_12_l1.up_bn.1                    |    0.128K              |    33.554M |
|   decoder.up_l1_full.up_bn                     |   55.392K              |   14.63G   |
|    decoder.up_l1_full.up_bn.0                  |    55.328K             |    14.496G |
|    decoder.up_l1_full.up_bn.1                  |    64                  |    0.134G  |
|   decoder.ssc_head                             |   0.211M               |   0.443T   |
|    decoder.ssc_head.conv0                      |    27.68K              |    57.982G |
|    decoder.ssc_head.aspp.blks                  |    0.166M              |    0.349T  |
|    decoder.ssc_head.conv_cls                   |    17.3K               |    36.239G |
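As a sanity check on the table above, the following sketch parses the fvcore-style values and confirms that the top-level totals match the sums of their children to within rounding. It then compares the measured totals with the figures quoted from the paper. It uses only the standard library; the helper names `parse` and `rel_gap` are ours and are not part of fvcore.

```python
# Suffixes as they appear in the fvcore table above.
SUFFIX = {"K": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}


def parse(value: str) -> float:
    """Convert an fvcore-style string such as '63.787M' into a plain number."""
    value = value.strip()
    if value[-1] in SUFFIX:
        return float(value[:-1]) * SUFFIX[value[-1]]
    return float(value)


def rel_gap(total: str, parts: list) -> float:
    """Relative gap between a reported total and the sum of its direct children."""
    reported = parse(total)
    return abs(reported - sum(parse(p) for p in parts)) / reported


# Parameters: encoder = backbone + decoder; model = encoder + decoder.
encoder_params_gap = rel_gap("0.133G", ["63.787M", "68.801M"])
model_params_gap = rel_gap("0.149G", ["0.133G", "16.861M"])

# FLOPs: same hierarchy.
encoder_flops_gap = rel_gap("0.461T", ["45.949G", "0.415T"])
model_flops_gap = rel_gap("0.99T", ["0.461T", "0.53T"])

# Measured MonoScene totals versus the figures quoted from the paper.
params_ratio = parse("0.149G") / parse("15.7M")  # about 9.5x
flops_ratio = parse("0.99T") / parse("500G")     # about 2x

print(f"params: measured / paper = {params_ratio:.2f}x")
print(f"flops:  measured / paper = {flops_ratio:.2f}x")
```

All four gaps come out below 1%, so the hierarchy is self-consistent up to the rounding fvcore applies when printing.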