sniklaus / 3d-ken-burns

an implementation of 3D Ken Burns Effect from a Single Image using PyTorch

Input dimensions during inference #45

Closed · dfrumkin closed 4 years ago

dfrumkin commented 4 years ago

Hello Simon!

I have a question about image resizing during inference. You write:

Different from existing work, we do not resize the input image to a fixed resolution when providing it to the network and instead resize it such that its largest dimension is 512 pixels while preserving its aspect ratio.

Why is it the larger dimension that is set to 512 pixels and not the smaller one? For example, if I were center-cropping during training, I would be working with the shorter side, so it would seem consistent to do the same during inference.
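In code, I read that resize as something like the sketch below (the helper name and the OpenCV call are mine; only the 512-pixel target is from the paper):

```python
# Minimal sketch of resizing so that the largest dimension becomes 512 pixels
# while preserving the aspect ratio; assumes an HxWxC numpy image and OpenCV.
import cv2

def resize_longest_side(image, target=512):
    height, width = image.shape[:2]
    scale = target / max(height, width)  # largest dimension maps to `target`
    new_size = (round(width * scale), round(height * scale))  # cv2 expects (w, h)
    return cv2.resize(image, new_size, interpolation=cv2.INTER_AREA)
```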

sniklaus commented 4 years ago

Prior work: resize the input to 512 pixels in width and height. Our proposed approach: resize the input such that the larger dimension is 512 pixels while preserving the aspect ratio.

Our own training dataset has samples of 512 pixels in width and height. One can randomly remove a few pixels either from the top and bottom or from the left and right to augment the aspect ratio of this training data. Having the smaller dimension be 512 pixels, as you suggest, would require upsampling after this crop from opposing boundaries and would hence resample the image, which is not desired.
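To illustrate, here is a minimal sketch of that augmentation (this is not the actual training code; the maximum crop amount is an assumption):

```python
# Minimal sketch of the aspect-ratio augmentation described above: start from a
# 512x512 sample and randomly drop a few pixels either from the top and bottom
# or from the left and right, without any resampling.
import random

def augment_aspect_ratio(sample, max_crop=64):
    height, width = sample.shape[:2]  # expected to be 512x512
    crop = random.randint(0, max_crop)
    if random.random() < 0.5:
        return sample[crop // 2 : height - (crop - crop // 2), :]  # crop top/bottom
    return sample[:, crop // 2 : width - (crop - crop // 2)]  # crop left/right
```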

dfrumkin commented 4 years ago

Thank you, Simon! I was only thinking about changing the inference. For some reason I had in mind the typical smallest object that your network can detect. In fact, it is better to focus on the big picture and restore details with a separate refinement network, as you do.

sniklaus commented 4 years ago

I would expect the network to do worse if this is changed at test time, since it has been trained on this type of input.