lfz / DSB2017

The solution of team 'grt123' in DSB2017
MIT License

Why are you feeding the proposal region again through the encoder? #73

Open pfjaeger opened 6 years ago

pfjaeger commented 6 years ago

Hi,

I was wondering what the benefit is of feeding the proposed region of the input image through the encoder again, up to the 3rd layer (where x/y = 24/24). These exact features have already been computed in the first pass, so you could crop the corresponding region out of the feature map using the normalized proposal coordinates and go straight to the 2x2x2 max-pooling layer.

Or am I getting something wrong? Thanks for your answer!
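The crop pfjaeger describes could be sketched as follows. This is an illustrative NumPy snippet, not code from the DSB2017 repo; the function name, argument conventions, and channel count are all assumptions:

```python
import numpy as np

def crop_roi_features(feat, center_norm, size_norm):
    """Crop a sub-volume of a 3D feature map around a normalized proposal.

    feat:        (C, D, H, W) feature map from the first encoder pass
    center_norm: (z, y, x) proposal center in [0, 1] image coordinates
    size_norm:   cube side length, also normalized to [0, 1]
    """
    _, D, H, W = feat.shape
    dims = np.array([D, H, W])
    center = np.round(np.array(center_norm) * dims).astype(int)
    half = np.maximum(1, np.round(size_norm * dims / 2).astype(int))
    lo = np.clip(center - half, 0, dims)
    hi = np.clip(center + half, 1, dims)
    return feat[:, lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]

# A 64-channel third-layer map with 24^3 spatial resolution, as in the issue:
feat = np.random.rand(64, 24, 24, 24)
roi = crop_roi_features(feat, (0.5, 0.5, 0.5), 0.25)
print(roi.shape)  # (64, 6, 6, 6)
```

The result could then be fed directly into the 2x2x2 max-pooling layer, skipping the second encoder pass.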

lfz commented 6 years ago

Theoretically, it is absolutely feasible, but consider that the features change from epoch to epoch (since the weights of the conv layers are updated), so you would have to re-extract the features again and again during training. The method you mention is time-consuming.

Instead, I chose to store the coordinates during the first pass and extract features from a small patch (96x96x96) around them, so the computation during feature extraction is very low.
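The store-coordinates-then-recrop idea can be sketched as below. This is a hedged illustration of the approach lfz describes, not the repo's actual code; the function name and border-padding policy are assumptions:

```python
import numpy as np

def extract_patch(volume, center, size=96):
    """Cut a size^3 patch around a stored proposal center, zero-padding
    wherever the patch extends past the volume border.

    volume: (D, H, W) preprocessed CT volume
    center: (z, y, x) voxel coordinate saved during the detector's first pass
    """
    half = size // 2
    lo = np.array(center) - half
    hi = lo + size
    pad_lo = np.maximum(0, -lo)                           # padding below each axis
    pad_hi = np.maximum(0, hi - np.array(volume.shape))   # padding above each axis
    src = volume[max(lo[0], 0):hi[0], max(lo[1], 0):hi[1], max(lo[2], 0):hi[2]]
    return np.pad(src, list(zip(pad_lo, pad_hi)), mode="constant")

vol = np.zeros((200, 300, 300), dtype=np.float32)
patch = extract_patch(vol, (10, 150, 150))  # near the top border in z
print(patch.shape)  # (96, 96, 96)
```

Only this small patch is re-encoded per proposal, which keeps the cost of the second pass low even though the features are recomputed every epoch.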


-- Liao Fangzhou, School of Medicine, Tsinghua University, Beijing 100084, China

pfjaeger commented 6 years ago

Right, so this is due to the alternating training procedure you use. What if you trained the two losses simultaneously? You could then take the features and the proposal coordinates from one single pass, similar to the "approximate joint training" described in the Faster R-CNN paper. Does this make results worse in your case?