microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Dimension Padding problem in reduction_ops.cc #13654

Open chengchen666 opened 1 year ago

chengchen666 commented 1 year ago

Describe the issue

Per https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnSetTensorNdDescriptor, the documentation says "Tensors are restricted to having at least 4 dimensions", but ONNX Runtime only pads one dimension when the rank is 2, so the tensor descriptor becomes 3D. cuDNN does accept 3D input, so this still works.

My question concerns https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cuda/reduction/reduction_ops.cc#L201: if the original rank is 2, ONNX Runtime pads one dimension at the end. But a rank-2 tensor normally denotes Height and Width, and a 4D tensor in cuDNN denotes N, C, H, W. So if we pad, shouldn't we pad at the beginning of the dimensions? For example, in the test case CudaKernelTest.SoftmaxCrossEntropy_TinySizeTensor the original 2D shape is {8, 2}; it seems more reasonable for it to become {1, 8, 2} after padding, but instead it becomes {8, 2, 1}. With front padding, the padded dimension would act as C when padding once (and as N and C when padding twice), and cuDNN would then view the result as a 4D tensor in NCHW format.

In CudaKernelTest.SoftmaxCrossEntropy_TinySizeTensor the padding position does not matter, because the reduction is over all dimensions, so it does not change the computed results.
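To make the contrast concrete, here is a minimal standalone sketch of the two padding strategies. This is not the actual reduction_ops.cc code; the helper names PadTrailing, PadLeading, and PackedStrides are hypothetical, and only cudnnSetTensorNdDescriptor and its signature are taken from cuDNN:

```cpp
#include <vector>
#include <cudnn.h>

// Pad with trailing 1s, mirroring what reduction_ops.cc does today:
// {8, 2} -> {8, 2, 1}.
std::vector<int> PadTrailing(std::vector<int> dims, size_t min_rank) {
  while (dims.size() < min_rank) dims.push_back(1);
  return dims;
}

// Pad with leading 1s, which is what this issue suggests would match the
// NCHW interpretation: {8, 2} -> {1, 8, 2}.
std::vector<int> PadLeading(std::vector<int> dims, size_t min_rank) {
  if (dims.size() < min_rank)
    dims.insert(dims.begin(), min_rank - dims.size(), 1);
  return dims;
}

// Packed (contiguous, row-major) strides for a given dim vector.
std::vector<int> PackedStrides(const std::vector<int>& dims) {
  std::vector<int> strides(dims.size(), 1);
  for (int i = static_cast<int>(dims.size()) - 2; i >= 0; --i)
    strides[i] = strides[i + 1] * dims[i + 1];
  return strides;
}

// Hand either padded shape to cuDNN; cudnnSetTensorNdDescriptor is the real
// cuDNN call, everything else above is illustrative.
cudnnStatus_t SetDescriptor(cudnnTensorDescriptor_t desc,
                            const std::vector<int>& dims) {
  const std::vector<int> strides = PackedStrides(dims);
  return cudnnSetTensorNdDescriptor(desc, CUDNN_DATA_FLOAT,
                                    static_cast<int>(dims.size()),
                                    dims.data(), strides.data());
}
```

For the {8, 2} case, PadTrailing({8, 2}, 3) yields {8, 2, 1} (the current behavior), while PadLeading({8, 2}, 3) yields {1, 8, 2} (the padding position this issue is asking about).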

To reproduce

Run ./onnxruntime_test_all --gtest_filter=CudaKernelTest.SoftmaxCrossEntropy_TinySizeTensor under the build directory.

Urgency

No response

Platform

Linux

OS Version

CentOS 7

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.13.1

ONNX Runtime API

C++

Architecture

X86

Execution Provider

CUDA

Execution Provider Library Version

CUDA 10.2

chengchen666 commented 1 year ago

I'm not sure I understand the logic correctly. Please correct me if I'm wrong.