Custom dimension input to model

nianticlabs / monodepth2

[ICCV 2019] Monocular depth estimation from a single image

Custom dimension input to model #15

Closed (otrivedi closed this issue 5 years ago)

otrivedi commented 5 years ago

Hi,

When I try to feed the model an image with custom dimensions instead of a 1024x320 image, I get the following error:

line 60, in forward
    x = torch.cat(x, 1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 351 and 352 in dimension 2 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:83

My understanding is that each input dimension needs to have 2^5 = 32 as a factor because of the scales. Is there a workaround for this mismatch?

mdfirman commented 5 years ago

Hi, thanks for getting in touch with this.

Can you please be a bit more specific about what you are trying here?

Are you training with the KITTI dataset, or with a different dataset?
Are you changing the input dimensions with --height xx --width yy flags, or via a different means?
And if you are setting them via these flags, what values are you using?

Thanks!

otrivedi commented 5 years ago

Thanks for the response.

I'm only running inference with the simple example code provided, on my 1280x720 images, and I'm overriding feed_height and feed_width in the following line:

input_image = input_image.resize((feed_width, feed_height), Image.LANCZOS)

If I set the values to the native resolution (w=1280, h=720), the encoder generates features with the following sizes:

torch.Size([1, 64, 360, 640])
torch.Size([1, 64, 180, 320])
torch.Size([1, 128, 90, 160])
torch.Size([1, 256, 45, 80])
torch.Size([1, 512, 23, 40])

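The error occurs during upscaling, when a tensor upsampled from an odd-sized feature map is concatenated with the skip feature: the 23-row map doubles to 46 rows, which does not match the 45-row map. The model I'm working with is mono+stereo_1024x320.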
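The size arithmetic behind this mismatch can be traced with a short sketch. It assumes that each of the five encoder stages halves a dimension with rounding up (which is consistent with the feature sizes listed above) and that the decoder doubles a feature map before concatenating it with the next finer one. The function names are hypothetical and are not part of monodepth2.

```python
def encoder_sizes(n, levels=5):
    """Sizes of one spatial dimension after each encoder stage.

    Assumption: every stage halves the size and rounds up,
    i.e. n -> (n - 1) // 2 + 1.
    """
    sizes = []
    for _ in range(levels):
        n = (n - 1) // 2 + 1
        sizes.append(n)
    return sizes


def first_skip_mismatch(n, levels=5):
    """Return (upsampled, skip) for the first decoder step whose sizes differ.

    Walks from the coarsest feature map to the finest one, doubling the
    coarse size and comparing it with the next finer skip feature.
    Returns None when every step lines up.
    """
    sizes = encoder_sizes(n, levels)
    for coarse, fine in zip(reversed(sizes[1:]), reversed(sizes[:-1])):
        if 2 * coarse != fine:
            return (2 * coarse, fine)
    return None


print(encoder_sizes(720))         # [360, 180, 90, 45, 23]
print(first_skip_mismatch(720))   # (46, 45)
print(first_skip_mismatch(1280))  # None
```

Under these assumptions the height of 720 fails at the first decoder step, while the width of 1280 passes because it is divisible by 32.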

mdfirman commented 5 years ago

Ok, thanks for the extra info. I seem to remember that the network only runs if the dimensions are a multiple of 32. Could you maybe try 1280x704? Thanks.
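As a rough illustration of where 1280x704 comes from, the sketch below snaps a dimension to a multiple of 32 before resizing. The helper name `snap_to_multiple` is hypothetical and is not part of the repository; the results would be used as feed_width and feed_height in the resize call quoted earlier.

```python
def snap_to_multiple(value, base=32, mode="down"):
    """Snap a positive size to a multiple of `base`.

    mode="down" rounds towards zero (but never below `base`),
    mode="up" rounds to the next multiple at or above `value`.
    """
    if value <= 0 or base <= 0:
        raise ValueError("value and base must be positive")
    if mode == "down":
        return max(base, (value // base) * base)
    if mode == "up":
        return -(-value // base) * base  # ceiling division
    raise ValueError("mode must be 'down' or 'up'")


feed_width = snap_to_multiple(1280)   # 1280, already a multiple of 32
feed_height = snap_to_multiple(720)   # 704
print(feed_width, feed_height)
```

Resizing to the snapped size changes the aspect ratio slightly (720 rows become 704), which may or may not matter for a given application.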

mdfirman commented 5 years ago

(Ah, yes, I see you identified this in your original issue: it needs to have 2^5 as a factor. I don't think there is any easy way to avoid this. You need to use multiples of 32, or else hack the networks to pad/crop dimensions where needed.)
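One possible reading of the pad/crop suggestion is to pad the input up to the next multiple of 32 and crop the prediction back afterwards, rather than changing the network itself. The sketch below only computes the padding amounts and the crop box; both helper names are hypothetical, and whether padded borders affect the depth predicted near the image edges is not addressed in this thread.

```python
def pad_to_multiple(height, width, base=32):
    """Padding as (left, right, top, bottom) that lifts both sizes to a multiple of `base`.

    The padding is split as evenly as possible between the two sides.
    """
    pad_h = (-height) % base
    pad_w = (-width) % base
    top = pad_h // 2
    left = pad_w // 2
    return (left, pad_w - left, top, pad_h - top)


def crop_box(height, width, padding):
    """Row/column bounds (row0, row1, col0, col1) that undo `padding` on a full-size output."""
    left, _right, top, _bottom = padding
    return (top, top + height, left, left + width)


padding = pad_to_multiple(720, 1280)
print(padding)                       # (0, 0, 8, 8)
print(crop_box(720, 1280, padding))  # (8, 728, 0, 1280)
```

The (left, right, top, bottom) order matches the convention that torch.nn.functional.pad uses for the last two dimensions of a tensor, so the tuple could be passed to it directly; the crop box would then be applied to an output of the same size as the padded input.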