The paper says "our stage 5 and 6 keeps the same input channels as stage 4(256 input channels for bottleneck block)". However, in your code those two stages are created with an input of c=1024.
I've been stuck on this for quite a while. We know that stage 4 outputs 1024 channels, so how is it possible for stage 5 to take 256 input channels?
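To make the mismatch concrete, here is how I understand the channel flow of a standard ResNet-style bottleneck (1x1 reduce, 3x3, 1x1 expand, expansion factor 4). This is a minimal sketch under that assumption; `bottleneck_channels` is a hypothetical helper for illustration, not a function from your code.

```python
# Hypothetical helper: channel counts through a ResNet-style bottleneck block,
# assuming the standard expansion factor of 4 (1x1 reduce, 3x3, 1x1 expand).
def bottleneck_channels(in_channels, planes, expansion=4):
    """Return (input, after 1x1 reduce, after 3x3, output) channel counts."""
    reduced = planes            # 1x1 conv: in_channels -> planes
    middle = planes             # 3x3 conv keeps the width at planes
    out = planes * expansion    # 1x1 conv: planes -> planes * expansion
    return (in_channels, reduced, middle, out)

# Stage 4 of ResNet-50 uses planes=256, so each of its blocks outputs 1024 channels.
print(bottleneck_channels(1024, 256))  # (1024, 256, 256, 1024)
```

Under this assumption, a block with planes=256 takes and returns 1024 channels, and only its inner convolutions run at a width of 256.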