Applicability to fully-convolutional networks

Thank you for this interesting work! I noticed that for the ImageNet task you apply gating only to the fully-connected part of the network and leave the convolutional base unaltered. But most semantic segmentation models are fully convolutional. Because convolutional layers share parameters across spatial positions and are only locally connected, I could see applying gating to them being a bit tricky, so I was wondering if you've looked at this at all?
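To make the question concrete, here is a rough sketch of one option I could imagine: gating whole feature maps (channels) rather than individual units, so that each mask is shared across spatial positions and does not conflict with weight sharing. This is only my guess at a possible approach, not something taken from the paper. The function names, tensor shapes, and the reading of the gating parameter as "fraction of channels silenced per task" are all my own assumptions.

```python
import numpy as np

def make_channel_gates(n_tasks, n_channels, gate_frac=0.8, seed=0):
    """Hypothetical helper: one fixed binary mask per task.

    gate_frac is assumed to be the fraction of channels silenced per task.
    """
    rng = np.random.default_rng(seed)
    n_gated = int(round(gate_frac * n_channels))
    gates = np.ones((n_tasks, n_channels), dtype=np.float32)
    for t in range(n_tasks):
        # Pick the channels to switch off for this task, without repeats.
        off = rng.choice(n_channels, size=n_gated, replace=False)
        gates[t, off] = 0.0
    return gates

def apply_channel_gates(feature_maps, gates, task_id):
    """Hypothetical helper: gate a (batch, channels, height, width) array.

    The same per-channel mask is broadcast over batch and spatial axes.
    """
    return feature_maps * gates[task_id][None, :, None, None]

# Toy usage: 3 tasks, 10 channels, dummy activations of ones.
gates = make_channel_gates(n_tasks=3, n_channels=10)
x = np.ones((2, 10, 4, 4), dtype=np.float32)
y = apply_channel_gates(x, gates, task_id=1)
```

With a mask like this, every spatial location of a gated channel is zeroed for a given task, which seems like the closest analogue of gating a hidden unit in a fully-connected layer, but I don't know whether it preserves the benefits you report.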

Also, you seem to use 80% for the gating parameter. Is this a finicky hyperparameter, or are the results robust to the choice?

NOTE: I have class during the presentation and cannot attend :(

