kevinzakka / recurrent-visual-attention

A PyTorch Implementation of "Recurrent Models of Visual Attention"
MIT License
468 stars 123 forks

RuntimeError: size mismatch, m1: [32 x 192], m2: [64 x 128] at /pytorch/aten/src/TH/generic/THTensorMath.c:2033 #19

Closed duygusar closed 4 years ago

duygusar commented 6 years ago

Hello, I am trying to use this with my custom dataset. I am using a dataloader (see here https://github.com/kevinzakka/recurrent-visual-attention/issues/18) though even when I cast my image input to Float32 and get rid of that error, I get a mismatch of tensors while training the network.

Traceback (most recent call last):
  File "main.py", line 49, in <module>
    main(config)
  File "main.py", line 40, in main
    trainer.train()
  File "/home/duygu/recurrent-visual-attention-master/trainer.py", line 168, in train
    train_loss, train_acc = self.train_one_epoch(epoch)
  File "/home/duygu/recurrent-visual-attention-master/trainer.py", line 252, in train_one_epoch
    h_t, l_t, b_t, p = self.model(x, l_t, h_t)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/duygu/recurrent-visual-attention-master/model.py", line 101, in forward
    g_t = self.sensor(x, l_t_prev)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/duygu/recurrent-visual-attention-master/modules.py", line 214, in forward
    phi_out = F.relu(self.fc1(phi))
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/functional.py", line 992, in linear
    return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [32 x 192], m2: [64 x 128] at /pytorch/aten/src/TH/generic/THTensorMath.c:2033

I can not figure out what goes wrong. Is it about patches or weights? Any insights could be really helpful. Thanks.

duygusar commented 6 years ago

So it turns out this might be a problem with PyTorch itself; people have reported problems with some image sizes (mine is 240x427 RGB). An easy fix that has been suggested (there seems to be a consensus) is to use an adaptive pooling layer instead of average pooling. For example,

import torch.nn as nn
import torchvision

resnet = torchvision.models.resnet18()
resnet.avgpool = nn.AdaptiveAvgPool2d(1)

is enough to overcome this problem. So I modified the nn parts that are imported and adapted from torch models (in modules.py and model.py), but it didn't work.
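To illustrate why that swap matters (a minimal sketch with illustrative tensor sizes, not the repo's actual code): a fixed pooling kernel produces a spatial size that depends on the input resolution, so the flattened vector reaching the final fc layer changes size with the image, while an adaptive pool always emits the same spatial size.

```python
import torch
import torch.nn as nn

# Conv features of a rectangular image (sizes illustrative): a fixed
# AvgPool2d output depends on the input resolution, an adaptive pool
# always produces the same spatial size, so the downstream fc layer
# sees a fixed number of features.
feat = torch.randn(1, 512, 8, 14)            # e.g. features of a 240x427 image
fixed = nn.AvgPool2d(kernel_size=7)(feat)    # shape depends on input: [1, 512, 1, 2]
adaptive = nn.AdaptiveAvgPool2d(1)(feat)     # always [1, 512, 1, 1]
print(fixed.shape, adaptive.shape)
```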

linzhiqiu commented 6 years ago

So at this point it doesn't support 3 color channels or variable width/height, right? Have you figured out how to do this?

duygusar commented 6 years ago

@linzhiqiu I am not sure about the exact reason; the author "implied" it might be due to using a rectangular size. I don't want to take crops, though, as that defeats the purpose of using an attention model on my dataset. The mismatch error seems more like a batch/channel problem to me. I haven't figured it out yet; let me know if you plan to use this repo with your custom dataset, and how it goes.

linzhiqiu commented 6 years ago

Sure. In the meantime, are you aware of any repo that implements Deep RAM (DRAM) instead of RAM? I think in the paper "Multiple Object Recognition with Visual Attention" they extended RAM to make it more suitable for natural image classification, for example the Google Street View multi-digit classification dataset. That may be a newer/more powerful model than RAM, but I don't think anyone has open-sourced their code yet. Anyway, I thought DRAM might work better on our custom datasets, since another issue mentions that performance is poor on datasets other than Cluttered MNIST.

dearleiii commented 6 years ago

Hey, did you figure out why this error occurs? I ran into the same problem when resuming my own CNN training on the test dataset; the training and test datasets have exactly the same dimensions, so I'm confused about why it reports this error...

 File "load_model_test.py", line 72, in <module>
    outputs = model1(inputs)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 114, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 124, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
    raise output
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 41, in _worker
    output = module(*input, **kwargs)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/project/xtmp/superresoluter/approximator/model1/apxm.py", line 60, in forward
    output = self.regressor(x)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/home2/leichen/.local/lib/python3.5/site-packages/torch/nn/functional.py", line 992, in linear
    return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [7 x 1036320], m2: [1048576 x 256] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:249

m2 is the size of my CNN's last FC layer, but I have no idea where the size m1 comes from.
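For what it's worth, m1 in these errors is [batch_size x flattened_features] — the tensor actually reaching the linear layer — while m2 is the layer's transposed weight, [in_features x out_features]. A minimal sketch (illustrative sizes, not the model above) showing how to locate where m1 comes from:

```python
import torch
import torch.nn as nn

# m1 is [batch x flattened_features] of the tensor fed to the layer;
# m2 is the layer's weight transposed, [in_features x out_features].
fc = nn.Linear(in_features=1024, out_features=256)  # expects 1024 inputs

x = torch.randn(7, 3, 20, 20)    # e.g. a conv feature map
flat = x.view(x.size(0), -1)     # [7, 1200] -- not the expected 1024
print(flat.shape)                # printing here reveals where m1 comes from
# fc(flat) raises a size-mismatch RuntimeError: 1200 != 1024
```

Printing the shape immediately before the failing layer is usually enough to see which dimension (batch, channels, or spatial size) disagrees with the layer's in_features.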

kevinzakka commented 6 years ago

@dearleiii @duygusar Hey guys, I have some free time in the coming week so I'll try and investigate this bug.

zccoder commented 5 years ago

@dearleiii @duygusar Hey guys, I have some free time in the coming week so I'll try and investigate this bug.

Have you solved this problem? Can you give me some advice on how to fix it? Thanks.

limaagabriel commented 5 years ago

Hi! I got the same problem but found a solution. At line 63 of trainer.py there is the definition of how many channels the input image should have. The problem is that this value is hardcoded, and training with color images gives us this error.

Hope I could help!
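A hypothetical sketch of that fix (names and sizes are illustrative, not the repo's actual API): instead of hardcoding a single channel, read the channel count from a batch so color images size the first fc layer correctly.

```python
import torch
import torch.nn as nn

# Illustrative: derive the channel count from the data rather than
# hardcoding it, so the fc layer's in_features matches RGB input.
batch = torch.randn(32, 3, 28, 28)           # an RGB batch from a loader
num_channels = batch.shape[1]                # 3, instead of a hardcoded 1
patch = 8
fc1 = nn.Linear(patch * patch * num_channels, 128)
print(fc1.in_features)                       # 192
```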

annanurov commented 5 years ago

Hello guys.

I tried using PyTorch to construct a CNN from scratch. I have a similar problem: size mismatch, m1: [336 x 112], m2: [336 x 112]

Am I reshaping a wrong matrix?

limaagabriel commented 5 years ago

Hi, @annanurov! Maybe you are doing a matrix multiplication, and for that operation your matrices should be shaped as m1: [A x B], m2: [B x C]. Note that the number of columns of the first matrix must equal the number of rows of the second.
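The shape rule in action (a minimal sketch using the sizes from the error above):

```python
import torch

# Inner dimensions must agree: [A x B] @ [B x C] -> [A x C].
m1 = torch.randn(336, 112)
m2 = torch.randn(112, 336)
print((m1 @ m2).shape)   # torch.Size([336, 336])

# [336 x 112] @ [336 x 112] fails: inner dims 112 and 336 differ,
# which is exactly the mismatch reported above.
```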

vidhit commented 5 years ago

Hi! I got the same problem but found a solution. At line 63 of trainer.py there is the definition of how many channels the input image should have. The problem is that this value is hardcoded, and training with color images gives us this error.

Hi, @itsmealves. I have this mismatch error too. Could you please give a detailed solution to this problem? Input channels are mostly fixed, so what does that have to do with this error?

tharangni commented 5 years ago

Hi, @annanurov! Maybe you are doing a matrix multiplication, and for that operation your matrices should be shaped as m1: [A x B], m2: [B x C]. Note that the number of columns of the first matrix must equal the number of rows of the second.

The error still persists for me even though my dimensions are m1: [A x B], m2: [B x C].

ifgovh commented 5 years ago

When you pass the parameters, set --loc_hidden=192 and the problem is solved. The reason is that the code does not support multiple channels.
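The arithmetic behind that suggestion (a sketch, assuming the layer sizes from the original traceback): the error was m1: [32 x 192], m2: [64 x 128], i.e. the glimpse fc layer was built for 64 single-channel features while an RGB glimpse supplies 3 × 64 = 192.

```python
import torch
import torch.nn as nn

# Sizing the layer for 3 * 64 = 192 inputs makes the shapes agree.
phi = torch.randn(32, 3 * 64)   # flattened RGB glimpses: [32, 192]
fc_rgb = nn.Linear(192, 128)    # instead of Linear(64, 128)
print(fc_rgb(phi).shape)        # torch.Size([32, 128])
```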

zli2014 commented 5 years ago

When you pass the parameters, set --loc_hidden=192 and the problem is solved. The reason is that the code does not support multiple channels.

Thanks. I ran into this problem, then modified the fully connected layer's parameters.

RamyaRaghuraman commented 4 years ago

@dearleiii Were you able to solve your error?

tehreemnaqvi commented 4 years ago

Hi, I am facing the same issue: RuntimeError: size mismatch, m1: [512 x 32], m2: [1024 x 10]. How can I fix this?

vickykhan89 commented 3 years ago

Hello all, I am receiving this kind of error while passing a tensor to an FC network: RuntimeError: size mismatch, m1: [200 x 10], m2: [11 x 64] at ..\aten\src\TH/generic/THTensorMath.cpp:961. Any suggestions and help would be appreciated; thank you in advance.