As the paper says, the different channel groups of a group convolution can extract features of different parts of an object. I wonder whether the performance improvement comes from the group convolution or from the attention module. Did you run an experiment with the group number set to 1 in Fig. 5? A comparison between vanilla ResNet-50 and ResNet-50 with group convolution only would make the results more convincing.
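To make the request concrete, here is a rough sketch of the ablation grid I have in mind. The config keys, the function name `ablation_grid`, and the group count of 8 are my own placeholders, not values taken from the paper or its code; whether attention can be applied with a single group depends on how the module is defined.

```python
from itertools import product

# Hypothetical settings: 8 is a placeholder group count, not the paper's value.
GROUP_SETTINGS = (1, 8)            # 1 = ordinary convolution, >1 = group convolution
ATTENTION_SETTINGS = (False, True)  # attention module off / on


def ablation_grid(group_settings=GROUP_SETTINGS,
                  attention_settings=ATTENTION_SETTINGS):
    """Return one config dict per (groups, attention) combination."""
    return [
        {"backbone": "resnet50", "groups": g, "attention": a}
        for g, a in product(group_settings, attention_settings)
    ]


# Print the four runs that would separate the two effects.
for cfg in ablation_grid():
    print(cfg)
```

The run with `groups=1` and `attention=False` is vanilla ResNet-50, and the run with `groups=8` and `attention=False` is group convolution alone, so comparing the four results would show how much each component contributes.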