Hi,

I'm trying to fine-tune Inception-ResNet-V2 for dense per-pixel prediction. I've successfully used ResNet V1 from Slim for this in the past, but I cannot simply swap it out for Inception-ResNet-V2. In my code (which resembles DeepLab V2) I set output_stride=16 to enable atrous convolutions (this already seems to be the default in Inception-ResNet).
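For reference, this is roughly how I build the ResNet-101 feature extractor that works for me (a minimal sketch; the import path and arguments follow the slim nets in this repo, my real code differs slightly):

```python
import tensorflow as tf
from nets import resnet_v1  # from the slim model library

slim = tf.contrib.slim

images = tf.placeholder(tf.float32, [None, 336, 336, 3])

# Build ResNet-101 as a dense feature extractor: no logits layer,
# no global pooling, and atrous convolutions via output_stride=16.
with slim.arg_scope(resnet_v1.resnet_arg_scope()):
    net, end_points = resnet_v1.resnet_v1_101(
        images,
        num_classes=None,   # keep the raw feature maps
        is_training=False,
        global_pool=False,  # skip global average pooling
        output_stride=16)   # atrous convolutions keep the stride at 16

print(net.shape)  # (?, 21, 21, 2048) for a 336x336 input
```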
For a 336x336 input image the ResNet-101 model produces feature maps with 21x21 spatial dimensions, while Inception-ResNet-V2 produces 9x9 maps. (I also had to manually remove the global pooling by hacking into the code.) In the first case one can verify that the output stride is indeed 336/21 = 16. However, 336/9 = 37.33..., which is fractional and greater than 32.
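A quick back-of-the-envelope check of what I would expect (just the arithmetic, nothing model-specific):

```python
input_size = 336

# With 'SAME' padding a network with a true output stride of 16 yields
# ceil(336 / 16) = 21 -- exactly the 21x21 maps ResNet-101 produces.
print((input_size + 15) // 16)  # 21

# With 'SAME' padding and an output stride of 32 it would still be:
print((input_size + 31) // 32)  # 11

# The 9x9 maps I get from Inception-ResNet-V2 are smaller than even that,
# so on top of an extra stride-2 stage, the 'VALID'-padded convolutions
# in the stem appear to be trimming the borders further.
```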
I see two issues here. First, output_stride=16 doesn't take full effect and the model performs additional downsampling. Second, the resolution is reduced even further by padding issues. So my question is: would it be possible to get an exact stride of 16 with this network?
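For concreteness, this is the kind of call I would like to end up with (just a sketch, not working code on my side; I'm assuming inception_resnet_v2_base exposes output_stride and align_feature_maps arguments, which I may be misreading from the source):

```python
import tensorflow as tf
from nets import inception_resnet_v2

slim = tf.contrib.slim

images = tf.placeholder(tf.float32, [None, 336, 336, 3])

# Build only the convolutional trunk -- no global pooling, no logits --
# with a guaranteed output stride of 16 and 'SAME' padding throughout,
# so that a 336x336 input comes out as exactly 21x21 feature maps.
with slim.arg_scope(inception_resnet_v2.inception_resnet_v2_arg_scope()):
    net, end_points = inception_resnet_v2.inception_resnet_v2_base(
        images,
        output_stride=16,         # atrous convolutions past stride 16
        align_feature_maps=True)  # 'SAME' padding to avoid losing borders

print(net.shape)  # hoping for (?, 21, 21, 1536)
```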
Cheers,
Eldar.
System information
What is the top-level directory of the model you are using: slim
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Debian 8.0 64bit
TensorFlow installed from (source or binary): binary
This question is better asked on Stack Overflow since it is not a bug or feature request. There is also a larger community that reads questions there. Thanks!