Should we implement grouped convolutions in the AlexNet baseline and the attention model, or use only plain (ungrouped) conv layers? If we use groups, how many should we use?
Also, should we consider separable filters or similar variants, or implement only the basic AlexNet architecture (+ attention)?
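For context on the trade-off, here is a rough parameter count for the three options on a single layer. This is only a sketch: it assumes the second conv layer of the original AlexNet (96 input channels, 256 output channels, 5x5 kernels, 2 groups in the original two-GPU setup), and the helper names `conv_params` and `separable_params` are made up for illustration, not taken from any library. "Separable" here is assumed to mean depthwise separable (a depthwise conv followed by a 1x1 pointwise conv).

```python
def conv_params(c_in, c_out, k, groups=1, bias=True):
    """Learnable parameters in a k x k 2D convolution split into `groups` groups."""
    # Both channel counts must divide evenly across the groups.
    assert c_in % groups == 0 and c_out % groups == 0
    # Each output channel only sees c_in / groups input channels.
    weights = c_out * (c_in // groups) * k * k
    return weights + (c_out if bias else 0)


def separable_params(c_in, c_out, k, bias=True):
    """Depthwise k x k conv (groups == c_in) followed by a 1x1 pointwise conv."""
    depthwise = conv_params(c_in, c_in, k, groups=c_in, bias=bias)
    pointwise = conv_params(c_in, c_out, 1, bias=bias)
    return depthwise + pointwise


# Assumed layer shape: AlexNet conv2 (96 -> 256 channels, 5x5 kernels).
C_IN, C_OUT, K = 96, 256, 5

plain = conv_params(C_IN, C_OUT, K)               # groups=1
grouped = conv_params(C_IN, C_OUT, K, groups=2)   # original two-group layout
separable = separable_params(C_IN, C_OUT, K)      # depthwise + pointwise

print(f"plain:     {plain}")
print(f"grouped:   {grouped}")
print(f"separable: {separable}")
```

With 2 groups the weight count of this layer is halved relative to the plain conv, and the depthwise separable variant is far smaller again. Parameter count alone does not say which option gives better accuracy, so this is only meant to frame the question.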