soumith / cudnn.torch

Torch-7 FFI bindings for NVIDIA CuDNN
BSD 2-Clause "Simplified" License

R3: Non-contiguous tensors #50

Open mys007 opened 9 years ago

mys007 commented 9 years ago

The requirements for contiguous tensors seem to be too strict with R3. Actually, cudnn supports operations on non-contiguous data well; there are just some constraints, typically that input and gradInput, and output and gradOutput, need to have the same strides. Thus, I was able to extend Pointwise (ReLU) to work with non-contiguous tensors. Basically one just needs to create a separate descriptor for every tensor instead of sharing them.
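To make the per-tensor-descriptor idea concrete, here is a minimal sketch against the cuDNN v3-era C API (the era this issue discusses; later releases replace the activation-mode argument with an activation descriptor). The sizes, strides, and function `relu_forward` are illustrative, not part of cudnn.torch.

```c
#include <cudnn.h>

/* Build a descriptor that carries the tensor's own strides instead of
   assuming a packed NCHW layout. */
static cudnnTensorDescriptor_t describe(int n, int c, int h, int w,
                                        int sn, int sc, int sh, int sw) {
    cudnnTensorDescriptor_t d;
    cudnnCreateTensorDescriptor(&d);
    cudnnSetTensor4dDescriptorEx(d, CUDNN_DATA_FLOAT,
                                 n, c, h, w, sn, sc, sh, sw);
    return d;
}

/* Hypothetical helper: pointwise ReLU on possibly non-contiguous data.
   Input and output get their own descriptors but must share strides. */
void relu_forward(cudnnHandle_t handle,
                  const float *input, float *output,
                  int n, int c, int h, int w,
                  int sn, int sc, int sh, int sw) {
    cudnnTensorDescriptor_t iDesc = describe(n, c, h, w, sn, sc, sh, sw);
    cudnnTensorDescriptor_t oDesc = describe(n, c, h, w, sn, sc, sh, sw);

    const float alpha = 1.0f, beta = 0.0f;
    cudnnActivationForward(handle, CUDNN_ACTIVATION_RELU,
                           &alpha, iDesc, input,
                           &beta,  oDesc, output);

    cudnnDestroyTensorDescriptor(iDesc);
    cudnnDestroyTensorDescriptor(oDesc);
}
```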

soumith commented 9 years ago

this is true. I think I can relax this constraint more.

soumith commented 8 years ago

i will fix it when I get time. In the meanwhile, pull requests are welcome :)

soumith commented 8 years ago

From NVIDIA: 3D support has been added for all layers in CuDNN v3 RC. The story with non-contiguous tensors is somewhat complicated. The short answer is that cuDNN will return CUDNN_STATUS_NOT_SUPPORTED if you attempt to call a routine with a tensor format that it does not support.
The support matrix for padding/transposition is as follows:

                 SUPPORT               OPTIMIZED
FORWARD
  Algo0          all                   NCHW, W-packed
  Algo1          all                   NCHW, W-packed
  Algo2          all                   ?
  FFT            NCHW, HW-packed       NCHW, HW-packed

WGRAD
  Algo0          NCHW, CHW-packed
  Algo1          NCHW, CHW-packed
  FFT            NCHW, HW-packed

DGRAD
  Algo0          NCHW, CHW-packed
  Algo1          NCHW, CHW-packed
  FFT            NCHW, HW-packed

Meaning that to get the best performance on GEMM-based forward propagation you want an NCHW-contiguous tensor. Transposition/padding is supported, but performance is not guaranteed. FFT for both forward and backprop supports padding in the C and N dimensions, but no transpositions. CHW-packed means that you cannot have transpositions or padding in the C, H, W dimensions, but can have padding in the N (outermost) dimension. Non-convolutional operators should support any strides for input and output; please file a bug if they do not.
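As a rough illustration of the packing terminology (all sizes here are made up, not from the thread): a "CHW-packed" tensor has fully dense W, H, and C strides, while the N stride is allowed to be larger than c*h*w, for example when each sample is a slice of a bigger per-sample buffer.

```c
#include <cudnn.h>

/* Sketch: descriptor for a CHW-packed tensor with padding only in N.
   The values are illustrative. */
void chw_packed_descriptor(cudnnTensorDescriptor_t desc) {
    int n = 16, c = 3, h = 224, w = 224;
    int padded_sample = 4 * 224 * 224;   /* per-sample allocation larger than c*h*w */

    int wStride = 1;                     /* W densely packed */
    int hStride = w;                     /* H densely packed */
    int cStride = h * w;                 /* C densely packed */
    int nStride = padded_sample;         /* > c*h*w: padding in the outermost dim only */

    cudnnSetTensor4dDescriptorEx(desc, CUDNN_DATA_FLOAT,
                                 n, c, h, w,
                                 nStride, cStride, hStride, wStride);
}
```

An HW-packed layout, by contrast, only requires dense W and H strides, so padding in C (and N) remains possible, which matches the FFT rows in the matrix above.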