huawei-noah / Efficient-AI-Backbones

Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.

Why did you exclude EfficientNetB0 from Accuracy-Latency chart? #1

Closed. AlexeyAB closed this issue 4 years ago.

AlexeyAB commented 4 years ago

@iamhankai Hi,

Great work!

  1. Why did you exclude EfficientNet-B0 (0.390 BFLOPs, 76.3% Top-1) from the Accuracy-Latency chart?

  2. Also, what mini_batch_size did you use for training GhostNet?

[Image: flops_latency (accuracy vs. latency chart)]

iamhankai commented 4 years ago

Actually, we have tested the latency of EfficientNet-B0. Its latency is too large (~98 ms) to fit inside the current chart.

iamhankai commented 4 years ago

In addition, we have also tested the latency of MixNet (https://github.com/AlexeyAB/darknet/issues/4503), and its latency is also too large (>85 ms). Using various kernel sizes within the same depthwise conv layer is harmful to inference speed.
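For context, a MixNet-style mixed depthwise convolution splits the channels into groups and convolves each group with a different kernel size before concatenating. The sketch below (my illustration; the split sizes and kernel sizes are assumptions) shows why it maps to several small convolutions instead of one fused kernel, which is what hurts latency:

```python
import torch
import torch.nn as nn

class MixedDepthwiseConvSketch(nn.Module):
    """Illustrative MixNet-style mixed depthwise conv (not the official
    code): channels are split into groups, each group is convolved
    depthwise with a different kernel size, then the results are
    concatenated. Each group is a separate conv call, so the layer
    cannot run as one fused depthwise kernel."""
    def __init__(self, ch, kernel_sizes=(3, 5, 7)):
        super().__init__()
        splits = [ch // len(kernel_sizes)] * len(kernel_sizes)
        splits[0] += ch - sum(splits)  # absorb any remainder in the first group
        self.splits = splits
        self.convs = nn.ModuleList(
            nn.Conv2d(c, c, k, padding=k // 2, groups=c, bias=False)
            for c, k in zip(splits, kernel_sizes)
        )

    def forward(self, x):
        chunks = torch.split(x, self.splits, dim=1)
        return torch.cat([conv(c) for conv, c in zip(self.convs, chunks)], dim=1)
```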

iamhankai commented 4 years ago
  2. We used mini_batch_size=1024 for training on 8 GPUs.
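(Presumably that corresponds to 1024 / 8 = 128 images per GPU per iteration.)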
AlexeyAB commented 4 years ago

@iamhankai Thanks.

So GhostNet looks much more promising.

AlexeyAB commented 4 years ago

As I understand, GhostBlock is just Conv2D + depthwise_conv2d + concat?

iamhankai commented 4 years ago

> As I understand, GhostBlock is just Conv2D + depthwise_conv2d + concat?

Yes. With these efficient operators, GhostNet can be simple yet fast.
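For readers following along, here is a minimal sketch of such a Ghost module: a regular Conv2D produces a few intrinsic feature maps, a cheap depthwise conv generates "ghost" features from them, and the two are concatenated. The ratio=2 split and the kernel sizes are my assumptions, not the repository's exact hyperparameters:

```python
import torch
import torch.nn as nn

class GhostModuleSketch(nn.Module):
    """Minimal sketch of a Ghost module as described above (not the
    repository's exact code): Conv2D + depthwise conv + concat."""
    def __init__(self, in_ch, out_ch, ratio=2, dw_kernel=3):
        super().__init__()
        init_ch = out_ch // ratio       # intrinsic feature maps
        cheap_ch = out_ch - init_ch     # "ghost" feature maps
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, 1, bias=False),
            nn.BatchNorm2d(init_ch),
            nn.ReLU(inplace=True),
        )
        # Depthwise conv (groups == input channels), so it is cheap.
        self.cheap = nn.Sequential(
            nn.Conv2d(init_ch, cheap_ch, dw_kernel, padding=dw_kernel // 2,
                      groups=init_ch, bias=False),
            nn.BatchNorm2d(cheap_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        primary = self.primary(x)
        ghost = self.cheap(primary)
        # Conv2D + depthwise_conv2d + concat, as discussed above.
        return torch.cat([primary, ghost], dim=1)
```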

AlexeyAB commented 4 years ago

@iamhankai Thanks for your answers and SOTA network!

  1. Why do you duplicate the convolution with stride = 2?

  2. Why do you specify the dropout probability several times but never use it?

  3. Why don't you perform ReLU after the shortcut (residual connection)? https://github.com/iamhankai/ghostnet/blob/47ef752446ba761dc5342ce06cbc26537b038289/ghost_net.py#L281

  4. Why do you calculate out_channel but never use it? https://github.com/iamhankai/ghostnet/blob/47ef752446ba761dc5342ce06cbc26537b038289/myconv2d.py#L29

  5. As I see it, the main decrease in BFLOPs (-60 MFLOPs) is achieved by moving the Conv2D (1280 filters) layer after the slim.avg_pool2d layer (https://github.com/iamhankai/ghostnet/blob/47ef752446ba761dc5342ce06cbc26537b038289/ghost_net.py#L218-L234), compared to placing it before pooling (a rough count follows below).
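For point 5, a back-of-the-envelope count shows why moving the 1x1 conv after global average pooling saves roughly 60 MFLOPs. The head dimensions below (960 input channels and a 7x7 final feature map, as in MobileNetV3-style heads) are my assumption, not stated in the thread:

```python
# MACs of a 1x1 conv = H * W * C_in * C_out (assumed head: 960 -> 1280)
before = 7 * 7 * 960 * 1280  # conv applied before pooling: ~60.2M MACs
after = 1 * 1 * 960 * 1280   # conv applied after global avg-pool: ~1.2M MACs
print(before - after)        # ~59M MACs saved, matching the ~60 MFLOPs above
```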
iamhankai commented 4 years ago
  1. Using stride=2 in both the shortcut branch and the main branch results in feature maps of the same size, so they can be added (see the sketch after this list).
  2. The dropout is set with dropout_keep_prob=0.8 in our GhostNet 1.0x.
  3. We follow MobileNetV2.
  4. The code isn't clean enough; sorry for that.
  5. We follow MobileNetV3.
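To make answer 1 concrete, here is a minimal sketch (my illustration, not the repository's code) of a downsampling block where both branches use stride=2 so the outputs can be summed; it also reflects answer 3, since no ReLU follows the addition:

```python
import torch
import torch.nn as nn

class StrideTwoBlockSketch(nn.Module):
    """Illustrative downsampling block: both the main branch and the
    shortcut branch use stride=2, so their outputs have matching
    spatial size and can be added. Channel counts are arbitrary."""
    def __init__(self, ch):
        super().__init__()
        self.main = nn.Sequential(
            nn.Conv2d(ch, ch, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ch),
        )
        # The shortcut must downsample too, otherwise the shapes mismatch.
        self.shortcut = nn.Sequential(
            nn.Conv2d(ch, ch, 1, stride=2, bias=False),
            nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        # Following MobileNetV2 (answer 3), no ReLU after the addition.
        return self.main(x) + self.shortcut(x)
```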