mmaaz60 / EdgeNeXt

[CADL'22, ECCVW] Official repository of paper titled "EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications".
MIT License

About ablation #19

Closed wsy-yjys closed 1 year ago

wsy-yjys commented 1 year ago

Table 7. Ablation on different components of EdgeNeXt and SDTA encoder. The results show the benefits of SDTA encoders and adaptive kernels in our design. Further, adaptive branching and positional encoding (PE) are also required in SDTA module.

[Screenshot of Table 7]

In Table 7, I'm curious why the latency is the same as the base model when there is no adaptive branching. Looking forward to your reply~

mmaaz60 commented 1 year ago

Hi @wsy-yjys,

Thank you for your interest in our work. The inference time you are referring to is measured on a local server with an NVIDIA A100 GPU and an Intel(R) Xeon(R) Gold 5317 CPU @ 3.00GHz using PyTorch. In this setting, the major bottleneck is the channel attention in the SDTA module. Also, in the "w/o Adaptive Branching" variant we replace the adaptive branching component with a standard depth-wise convolution of the corresponding kernel size, so both variants do similar work per channel. This might explain the nearly equal inference time in both cases. Please let me know if you have any questions or any additional comments/insights.
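To illustrate the ablation described above, here is a minimal sketch of the substitution: the multi-branch depth-wise convolutions are swapped for a single depth-wise convolution over all channels. The class names (`AdaptiveBranches`, `PlainDepthwise`) and the fixed four-way split are illustrative assumptions, not the repository's actual SDTA implementation (the real adaptive branching varies the number of splits per stage):

```python
# Hypothetical sketch of the "w/o Adaptive Branching" ablation.
# Class names and the fixed 4-way split are assumptions for illustration;
# they are not the actual SDTA encoder code from the EdgeNeXt repo.
import torch
import torch.nn as nn


class AdaptiveBranches(nn.Module):
    """Split channels into groups; each group gets its own depth-wise conv,
    with a hierarchical residual passed between consecutive branches."""

    def __init__(self, dim, splits=4, kernel_size=3):
        super().__init__()
        self.chunk = dim // splits
        self.convs = nn.ModuleList(
            nn.Conv2d(self.chunk, self.chunk, kernel_size,
                      padding=kernel_size // 2, groups=self.chunk)
            for _ in range(splits)
        )

    def forward(self, x):
        outs = []
        for i, conv in enumerate(self.convs):
            chunk = x[:, i * self.chunk:(i + 1) * self.chunk]
            # feed the previous branch's output into the next branch
            chunk = chunk if i == 0 else chunk + outs[-1]
            outs.append(conv(chunk))
        return torch.cat(outs, dim=1)


class PlainDepthwise(nn.Module):
    """Ablation: one depth-wise conv over all channels, same kernel size."""

    def __init__(self, dim, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, kernel_size,
                              padding=kernel_size // 2, groups=dim)

    def forward(self, x):
        return self.conv(x)


x = torch.randn(1, 64, 56, 56)
# Both variants preserve the tensor shape and apply one depth-wise
# kernel per channel, so their per-channel work is comparable.
assert AdaptiveBranches(64)(x).shape == PlainDepthwise(64)(x).shape == x.shape
```

Since both modules run exactly one depth-wise kernel over each channel and the surrounding channel attention dominates the measured time, the two variants can plausibly show near-identical latency.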

Best Regards, Muhammad Maaz

wsy-yjys commented 1 year ago

Thank you for your reply! I got it.