hyseob / MDNet

Learning Multi-Domain Convolutional Neural Networks for Visual Tracking

question about the input size #9

Open ShadowLau opened 7 years ago

ShadowLau commented 7 years ago

@HyeonseobNam Thank you for generously making your excellent work open. While reading your paper recently, I was confused by your network architecture. The paper says that the input size is "107 = 75 (receptive) + 2 × 16 (stride)". Could you explain how you get the "75" and the "2 × 16"? Thank you very much again.

hyseob commented 7 years ago

@ShadowLau As written in the paper, we designed the input size to produce 3x3 feature maps at conv3. Our network converts a 75x75 input to a 1x1 map at conv3; the stride of conv3 w.r.t. the input is 16 (= 2x2x2x2x1), so a (75+16k)x(75+16k) input produces a (1+k)x(1+k) map at conv3.
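The arithmetic in this reply can be checked layer by layer. Below is a minimal Python sketch under stated assumptions: the kernel sizes, and the assignment of the strides 2, 2, 2, 2, 1 to conv1, pool1, conv2, pool2 and conv3, follow a VGG-M-style layout rather than anything stated in this thread, and `mdnet_conv_layers`, `output_size`, `total_stride` and `receptive_field` are hypothetical helper names. Each unpadded layer maps a size n to floor((n - kernel) / stride) + 1, and the per-layer strides multiply to give conv3's stride w.r.t. the input.

```python
# Minimal sketch of the conv1..conv3 size arithmetic discussed above.
# ASSUMPTION: the (kernel, stride) pairs are a VGG-M-style guess; the thread
# only confirms the stride product 2x2x2x2x1. The pooling kernel is a
# parameter because the thread describes the pools by their stride alone.

def mdnet_conv_layers(pool_kernel=3):
    """Hypothetical (name, kernel, stride) list for conv1..conv3, no padding."""
    return [
        ("conv1", 7, 2),
        ("pool1", pool_kernel, 2),
        ("conv2", 5, 2),
        ("pool2", pool_kernel, 2),
        ("conv3", 3, 1),
    ]


def output_size(n, layers):
    """Spatial size of an n x n input after each unpadded layer in turn."""
    for _name, kernel, stride in layers:
        n = (n - kernel) // stride + 1
    return n


def total_stride(layers):
    """Product of the per-layer strides, i.e. conv3's stride w.r.t. the input."""
    s = 1
    for _name, _kernel, stride in layers:
        s *= stride
    return s


def receptive_field(layers):
    """Input extent seen by one top-layer unit, accumulated from the top down."""
    r = 1
    for _name, kernel, stride in reversed(layers):
        r = (r - 1) * stride + kernel
    return r


if __name__ == "__main__":
    layers = mdnet_conv_layers()
    print("stride:", total_stride(layers))              # 16
    print("receptive field:", receptive_field(layers))  # 75 with 3x3 pooling
    for k in range(3):
        n = 75 + 16 * k
        m = output_size(n, layers)
        print(f"{n}x{n} -> {m}x{m}")
```

Under these assumptions the script reports a stride of 16 and maps 75, 91 and 107 to 1, 2 and 3. The receptive field comes out to exactly 75 only with 3x3 pooling kernels; with 2x2 kernels the same recursion gives 65, although the output sizes for these three inputs are unchanged because of the floor division.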

ShadowLau commented 7 years ago

@HyeonseobNam Thank you. I can follow how 107x107 becomes 3x3 step by step (layer by layer). I just can't understand why the stride is 16. Maybe you mean that an "x2 pool" is equivalent to "stride 2"?

hyseob commented 7 years ago

@ShadowLau Right :) Pooling sizes equal to pooling strides in our network.
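Reading each "x2 pool" as a stride-2 layer, the stride arithmetic from this thread can be written out as below. The thread only gives the product 2x2x2x2x1; attaching each factor to a specific layer is an assumption made for readability.

```latex
S = s_{\text{conv1}}\, s_{\text{pool1}}\, s_{\text{conv2}}\, s_{\text{pool2}}\, s_{\text{conv3}}
  = 2 \cdot 2 \cdot 2 \cdot 2 \cdot 1 = 16,
\qquad
n_{\text{in}} = 75 + 16k \;\longmapsto\; n_{\text{conv3}} = 1 + k,
\qquad
107 = 75 + 2 \cdot 16 \;\longmapsto\; 3 .
```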

ShadowLau commented 7 years ago

@HyeonseobNam Got it :) Thank you very much!