Thanks for sharing the code, which is a good example of how to implement and index 4D tensors (e.g., of shape [N, C, H, W]) in a low-level CUDA kernel (e.g., idx = blockIdx.x * blockDim.x + threadIdx.x).
I would like to share the following:
Uncomment the macro definitions CHECK_CUDA, CHECK_CONTIGUOUS, and CHECK_INPUT. Otherwise you will probably get incorrect results when you run test/test.py, because tensors produced by, e.g., im0 = torch.FloatTensor(im0).permute(0, 3, 1, 2) are not contiguous. Use im0 = torch.FloatTensor(im0).permute(0, 3, 1, 2).contiguous() instead.
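To see why the uncommented checks matter: permute only changes the tensor's strides, not its memory layout, so the permuted tensor is a non-contiguous view that the kernel's flat index math cannot handle. A minimal sketch (using a random NHWC tensor in place of an actual image):

```python
import torch

im0 = torch.rand(1, 4, 4, 3)        # NHWC, standing in for a loaded image
nchw = im0.permute(0, 3, 1, 2)      # a view: strides change, memory does not
print(nchw.is_contiguous())         # False: CHECK_CONTIGUOUS would reject this

fixed = nchw.contiguous()           # copies the data into NCHW memory order
print(fixed.is_contiguous())        # True: safe to pass to the CUDA kernel
```

With the checks enabled, passing the non-contiguous view raises an error instead of silently producing wrong results.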
To fix the compilation deprecation warnings:
a) In the cuda/forward_warp_cuda.cpp file, change "#define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), ..." to "#define CHECK_CUDA(x) TORCH_CHECK(x.is_cuda(), ...".
b) In the cuda/forward_warp_cuda_kernel.cu file, change "AT_DISPATCH_FLOATING_TYPES(im0.type(), ..." to "AT_DISPATCH_FLOATING_TYPES(im0.scalar_type(), ...", and im0.data<scalar_t>() to im0.data_ptr<scalar_t>().
Now I can compile the CUDA code and get the correct results when running test/test.py.
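As an aside, the NCHW index arithmetic such a kernel relies on (and which only holds for contiguous tensors) can be checked from Python. This is a sketch, not the repository's code; nchw_flat_index is a hypothetical helper name:

```python
import torch

def nchw_flat_index(n, c, h, w, C, H, W):
    # Flat offset into a contiguous [N, C, H, W] tensor: the same
    # arithmetic a CUDA kernel performs on its computed thread index.
    return ((n * C + c) * H + h) * W + w

N, C, H, W = 2, 3, 4, 5
x = torch.arange(N * C * H * W).reshape(N, C, H, W)
# The flat offset and the 4D index address the same element.
assert x.flatten()[nchw_flat_index(1, 2, 3, 4, C, H, W)] == x[1, 2, 3, 4]
```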