torch / torch7

http://torch.ch

CUDA error: Invalid arguments: CudaTensor number, expected arguments: *CudaTensor~2D* #962

Closed: tastyminerals closed this issue 7 years ago

tastyminerals commented 7 years ago

I have two models, A and B, where model B is an extension of model A: it has almost the same architecture except for additional nn.Linear and nn.CAddTable layers before nn.LogSoftMax; see the models below:

Language Model A:   
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> output]
  (1): nn.LookupTable
  (2): nn.SplitTable
  (3): nn.Sequencer @ nn.Recursor @ nn.Sequential {
    [input -> (1) -> (2) -> (3) -> output]
    (1): nn.GRU(200 -> 200, 0.00)
    (2): nn.Linear(200 -> 800)
    (3): nn.LogSoftMax
  }
}
Language Model B:   
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> output]
  (1): nn.ParallelTable {
    input
      |`-> (1): nn.Sequential {
      |      [input -> (1) -> (2) -> output]
      |      (1): nn.LookupTable
      |      (2): nn.SplitTable
      |    }
       `-> (2): nn.SplitTable
       ... -> output
  }
  (2): nn.ZipTable
  (3): nn.Sequencer @ nn.Recursor @ nn.Sequential {
    [input -> (1) -> (2) -> (3) -> (4) -> output]
    (1): nn.ParallelTable {
      input
        |`-> (1): nn.Sequential {
        |      [input -> (1) -> output]
        |      (1): nn.GRU(200 -> 200, 0.00)
        |    }
         `-> (2): nn.Linear(54 -> 200)  <--- Triggers the error!
         ... -> output
    }
    (2): nn.CAddTable
    (3): nn.Linear(200 -> 764)
    (4): nn.LogSoftMax
  }
}
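
For reference, this is roughly how model B is assembled with the rnn package (sizes are taken from the printout above; the LookupTable vocabulary size and all variable names are just for illustration):

require 'nn'
require 'rnn'  -- nn.GRU, nn.Sequencer, nn.ZipTable

local hiddenSize, featSize, outSize = 200, 54, 764  -- sizes from the printout above

-- (1) embed and split the word ids, split the extra feature sequence
local front = nn.ParallelTable()
  :add(nn.Sequential()
    :add(nn.LookupTable(outSize, hiddenSize))  -- vocabulary size assumed
    :add(nn.SplitTable(1)))
  :add(nn.SplitTable(1))

-- per-step module applied to every element of the zipped sequence
local step = nn.Sequential()
  :add(nn.ParallelTable()
    :add(nn.GRU(hiddenSize, hiddenSize))
    :add(nn.Linear(featSize, hiddenSize)))  -- the layer that triggers the error
  :add(nn.CAddTable())
  :add(nn.Linear(hiddenSize, outSize))
  :add(nn.LogSoftMax())

local model = nn.Sequential()
  :add(front)
  :add(nn.ZipTable())       -- (2) pair embeddings with features step by step
  :add(nn.Sequencer(step))  -- (3)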

I can successfully run model A with the --cuda parameter. However, when I attempt to run model B (the one above), it crashes with the following error:

./model/Linear.lua:69: invalid arguments: CudaTensor number CudaTensor number DoubleTensor CudaTensor 
expected arguments: *CudaTensor~2D* [CudaTensor~2D] [float] CudaTensor~2D CudaTensor~2D | *CudaTensor~2D* float [CudaTensor~2D] float CudaTensor~2D CudaTensor~2D

Both models handle the --cuda parameter as:

if opt.cuda then
  model:cuda()
  loss:cuda()
  targetmodule:cuda()
end
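
As far as I understand, model:cuda() only converts the module parameters and buffers; the tensors fed to forward/backward have to be CudaTensors themselves. To double-check what actually reaches the model I print the types right before the forward pass (variable names here are just placeholders):

print(torch.type(batch))    -- should be 'torch.CudaTensor' when opt.cuda is set
print(torch.type(targets))  -- same here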

So why does model B crash when model A works fine?

tastyminerals commented 7 years ago

I figured out the issue. If anyone encounters such an error, it means that somewhere during training or validation you explicitly convert your data with :double() or to anything else that is not a CUDA tensor. I found the line in my code where I do batch:double():squeeze(4); this call converts the CUDA tensors back to DoubleTensors and causes the error above. Closing.
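
In my case the fix was simply to drop the :double() conversion (or to convert back to CUDA right after it), roughly:

-- before: silently turns every CudaTensor back into a DoubleTensor
-- batch = batch:double():squeeze(4)

-- after: keeps the CudaTensor type
batch = batch:squeeze(4)

-- or, if a CPU copy is really needed at that point:
-- batch = batch:double():squeeze(4):cuda()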

tastyminerals commented 7 years ago

Though I wonder why nn.SplitTable(1) doesn't convert an IntTensor to a CudaTensor when it passes the data through.
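
As far as I can tell, nn.SplitTable only slices its input along a dimension and never changes the tensor type; :cuda() converts a module's own state, not the data passed through it, so the conversion has to happen on the data itself:

require 'nn'

local x = torch.IntTensor(3, 2):fill(1)
local t = nn.SplitTable(1):forward(x)
print(torch.type(t[1]))  -- 'torch.IntTensor', not 'torch.CudaTensor'
-- the input has to be converted explicitly, e.g. x = x:cuda(), before the forward pass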