torch / cunn


Size mismatch errors ignored: THCudaTensor_pointwiseApply* retvals need checking #115

Open wickedfoo opened 9 years ago

wickedfoo commented 9 years ago

None of the cunn modules that use THCudaTensor_pointwiseApply2 or THCudaTensor_pointwiseApply3 check the retvals, and so size inconsistency errors are silently ignored.

The functions return false if a tensor has too many dimensions, or if the tensor sizes do not match.

I think there was a reason at the time why the pointwise functions did not throw Lua errors; as I recall, it was because the caller had more contextual information about what the error should be, rather than a non-descriptive size mismatch. Either the retvals should be checked in cunn, or we should add Lua errors in cutorch's pointwiseApply*.

https://github.com/torch/cunn/search?utf8=%E2%9C%93&q=THCudaTensor_pointwiseApply2&type=Code https://github.com/torch/cunn/search?utf8=%E2%9C%93&q=THCudaTensor_pointwiseApply3&type=Code

> require 'cunn'
true
                                                                      [2.1727s]
> cpuModule = nn.ReLU(false)
                                                                      [0.0000s]
> gpuModule = nn.ReLU(false):cuda()
                                                                      [0.0002s]
> cpuInput = torch.DoubleTensor(8):fill(-1)
                                                                      [0.0000s]
> cpuGradOutput = torch.DoubleTensor(9):uniform()
                                                                      [0.0000s]
> gpuInput = cpuInput:cuda()
                                                                      [0.1500s]
> gpuGradOutput = cpuGradOutput:cuda()
                                                                      [0.0003s]
> cpuModule:updateOutput(cpuInput)                                                                                                          
 0
 0
 0
 0
 0
 0
 0
 0
[torch.DoubleTensor of size 8]

> cpuModule:updateGradInput(cpuInput, cpuGradOutput)
...plearning/torch/cuth.llar.linktree/_lua/nn/Threshold.lua:26: inconsistent tensor size at torch/oss/nn/generic/Threshold.c:48
stack traceback:
        ...learning/torch/cuth.llar.linktree/_lua/fb/util/error.lua:76: in function <...learning/torch/cuth.llar.linktree/_lua/fb/util/error
.lua:72>
        [C]: in function 'Threshold_updateGradInput'
        ...plearning/torch/cuth.llar.linktree/_lua/nn/Threshold.lua:26: in function 'updateGradInput'
        [string "_RESULT={cpuModule:updateGradInput(cpuInput, ..."]:1: in function 'inner_func'
        ...learning/torch/cuth.llar.linktree/_lua/fb/trepl/init.lua:492: in function <...learning/torch/cuth.llar.linktree/_lua/fb/trepl/ini
t.lua:492>
...

> gpuModule:updateOutput(gpuInput)
 0
 0
 0
 0
 0
 0
 0
 0
[torch.CudaTensor of size 8]

                                                                      [0.0006s]
> gpuModule:updateGradInput(gpuInput, gpuGradOutput)
 0
 0
 0
 0
 0
 0
 0
 0
[torch.CudaTensor of size 8]

                                                                      [0.0004s]
hughperkins commented 9 years ago

Lua errors are more consistent and easier to maintain. Since they provide a stack trace, they're fairly easy to diagnose.

dominikgrewe commented 9 years ago

I feel like in practice the caller of pointwiseApply simply reports an error with CUTORCH_DIM_WARNING, so pointwiseApply itself may actually have more information about what went wrong. Ideally we'd combine the two, but I guess we don't have a good mechanism for that.

So I'd also be in favour of Lua errors.

wickedfoo commented 9 years ago

Ah, actually, from what I recall when writing it, the intent was that the API shouldn't depend on running in a Lua interpreter. But that battle (having a tensor-manipulation API that could be used outside of a Lua environment) was lost in Torch a long time ago.

The only specific information the caller has is the arguments passed to the function and their positions. But that's quite ambiguous when there are 2+ arguments; the DIM_WARNING error says nothing at all about size mismatches; and returning richer error status to indicate exactly which argument it didn't like, and why, would be just as fragile as the current situation.

I think I'll just add the Lua error throwing in the pointwiseApply* functions for specific failure cases, and remove the retval checking in cutorch itself.

dominikgrewe commented 9 years ago

I suppose you can use THError to not be Lua-specific.