sniklaus / pytorch-pwc

a reimplementation of PWC-Net in PyTorch that matches the official Caffe version
GNU General Public License v3.0

argument 'second' in '_FunctionCorrelation' is not contiguous if batch size bigger than 1 #35

Closed pablopalafox closed 4 years ago

pablopalafox commented 4 years ago

Hi @sniklaus,

thanks a lot for your work! I managed to get the model to overfit to a single sample using a batch size of 1. But when moving on to fine-tuning the pretrained models with a batch size greater than 1, the argument second in _FunctionCorrelation is not contiguous. I guess one option is to call second.contiguous(), but I was curious as to why this is happening. I've roughly traced it to the Backward function, which returns a non-contiguous tensor when the batch size is not 1. Let me know if this has happened to you before, or if it's perhaps an issue on my side. Cheers!

EDIT: in particular, it's this line:

tensorMask = tensorOutput[:, -1:, :, :]

that is causing the output of Backward to be non-contiguous when the batch size is not 1, which makes sense, since slicing returns a view and can therefore produce a non-contiguous tensor (see this thread).
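The batch-size dependence can be reproduced in isolation. The sketch below uses hypothetical shapes, not the repository's actual tensors: slicing off the last channel leaves the batch stride pointing at the full channel count, which PyTorch ignores for a size-1 batch dimension but flags as non-contiguous once the batch is larger.

```python
import torch

x1 = torch.randn(1, 3, 4, 4)  # batch size 1
x2 = torch.randn(2, 3, 4, 4)  # batch size 2

# Slicing returns a view that reuses the original strides.
# With batch size 1 the mismatched batch stride is irrelevant (size-1 dims
# are skipped by the contiguity check), so the view still counts as contiguous.
print(x1[:, :-1, :, :].is_contiguous())  # True
print(x2[:, :-1, :, :].is_contiguous())  # False

# .contiguous() copies the data into a densely packed tensor.
y = x2[:, :-1, :, :].contiguous()
print(y.is_contiguous())  # True
```

This is why the problem only shows up once the batch size exceeds 1, even though the slicing code path is identical in both cases.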

sniklaus commented 4 years ago

Are you triggering the following assert?

https://github.com/sniklaus/pytorch-pwc/blob/90c1ac66d5bc6ba1ce7f60064f128d519e946bea/correlation/correlation.py#L341

sniklaus commented 4 years ago

Closing due to inactivity, feel free to reopen in case this issue persists.

pablopalafox commented 4 years ago

Hi @sniklaus, this assert is being triggered.

I'm making the output of Backward contiguous as follows:

return (tensorOutput[:, :-1, :, :] * tensorMask).contiguous()

That seems to have solved my problem.

sniklaus commented 4 years ago

Thank you for sharing your findings, I would have suggested the same fix. :+1:

pablopalafox commented 4 years ago

Awesome, thanks a lot!

eric-yyjau commented 4 years ago

That's really helpful. Thanks!