sniklaus / pytorch-pwc

a reimplementation of PWC-Net in PyTorch that matches the official Caffe version
GNU General Public License v3.0

Regarding CPU implementation of correlation function. #39

Closed: Somdyuti2 closed this issue 4 years ago

Somdyuti2 commented 4 years ago

Hi, thanks for the implementation. In my use case, I need to run inference on the CPU. From inspecting correlation.py, I gather that this means calling the extern "C" functions ourselves instead of having CuPy launch them for us. In your code, each C function is called like this: `cupy_launch('kernel_Correlation_rearrange', cupy_kernel('kernel_Correlation_rearrange', { 'input': second, 'output': rbot1 }))( grid=tuple([ int((n + 16 - 1) / 16), second.shape[1], second.shape[0] ]), block=tuple([ 16, 1, 1 ]), args=[ n, second.data_ptr(), rbot1.data_ptr() ] )`

I am not familiar with CuPy, so it would help if you could explain these function calls a bit and give some pointers on how to do the equivalent on the CPU. I understand that `args` in each call holds the arguments passed to the C function, but I am not sure what `grid` and `block` signify here. Presumably they are not needed when CuPy is not used. Since I only need to run on the CPU at test time, I guess I can ignore the updateGrad functions.

I will appreciate your help/suggestion regarding this.

sniklaus commented 4 years ago

To make inference on CPUs work, you will have to convert the following CUDA code to something that runs on CPUs instead.

https://github.com/sniklaus/pytorch-pwc/blob/cf0d2f2cbef4bcb0f6cbf09011960a4899d77dec/correlation/correlation.py#L8-L33

https://github.com/sniklaus/pytorch-pwc/blob/cf0d2f2cbef4bcb0f6cbf09011960a4899d77dec/correlation/correlation.py#L35-L103

You do not really need to be familiar with CuPy; the grid and block arguments come from CUDA. I would recommend looking into the fundamentals of CUDA. Once you know the basics, it should not take much to understand what the code is doing. Good luck!
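For readers coming from the CPU side, here is a minimal sketch of what `grid` and `block` mean, written as plain Python loops. It is not the repository's kernel: `launch`, `ceil_div`, and `double_kernel` are hypothetical names for illustration, and the mapping of the second and third grid dimensions to channel and sample is an assumption based on the launch call quoted earlier in the thread. On a GPU these iterations run in parallel; here they run one after another.

```python
def ceil_div(a, b):
    """Smallest number of blocks of size b needed to cover a elements."""
    return (a + b - 1) // b


def launch(kernel, grid, block, args):
    """Sequential stand-in for a CUDA launch with a 1-D block.

    grid  = number of blocks along x, y, z
    block = number of threads per block (only x is used here)
    """
    for bz in range(grid[2]):
        for by in range(grid[1]):
            for bx in range(grid[0]):
                for tx in range(block[0]):
                    kernel((bx, by, bz), tx, block[0], *args)


def double_kernel(block_idx, thread_x, block_dim_x, n, src, dst):
    """Toy kernel: each 'thread' doubles one element of one channel."""
    i = block_idx[0] * block_dim_x + thread_x  # position within the channel
    if i >= n:  # the last block may overshoot n, so guard against it
        return
    channel, sample = block_idx[1], block_idx[2]
    dst[sample][channel][i] = 2.0 * src[sample][channel][i]


# Same launch shape as in the thread: ceil(n / 16) blocks of 16 threads,
# one grid row per channel and one grid layer per sample.
n, channels, samples = 37, 2, 1
src = [[[float(i + c) for i in range(n)] for c in range(channels)]
       for _ in range(samples)]
dst = [[[0.0] * n for _ in range(channels)] for _ in range(samples)]
launch(double_kernel,
       grid=(ceil_div(n, 16), channels, samples),
       block=(16, 1, 1),
       args=(n, src, dst))
```

With n = 37 and 16 threads per block, three blocks are launched along x, so 48 thread indices are generated and the guard discards the last 11. On the CPU, the grid and block therefore collapse into ordinary loops over sample, channel, and position.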

duongpaKikai commented 3 years ago

I get a problem in the code above when tracing the model from PyTorch to TorchScript:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\DIFRINT-v4\lib\site-packages\cupy\cuda\compiler.py", line 449, in compile
    nvrtc.compileProgram(self.ptr, options)
  File "cupy\cuda\nvrtc.pyx", line 101, in cupy.cuda.nvrtc.compileProgram
  File "cupy\cuda\nvrtc.pyx", line 111, in cupy.cuda.nvrtc.compileProgram
  File "cupy\cuda\nvrtc.pyx", line 56, in cupy.cuda.nvrtc.check_status
cupy.cuda.nvrtc.NVRTCError: NVRTC_ERROR_COMPILATION (6)

During handling of the above exception, another exception occurred:

  File "D:\project\test_1.0.0.pytorch\DIFRINT\models\correlation\correlation.py", line 143, in cupy_launch
    return cupy.cuda.compile_with_cache(strKernel).get_function(strFunction)
  File "C:\ProgramData\Anaconda3\envs\DIFRINT-v4\lib\site-packages\cupy\cuda\compiler.py", line 297, in compile_with_cache
    return _compile_with_cache_cuda(source, options, arch, cache_dir,
  File "C:\ProgramData\Anaconda3\envs\DIFRINT-v4\lib\site-packages\cupy\cuda\compiler.py", line 350, in _compile_with_cache_cuda
    ptx = compile_using_nvrtc(source, options, arch, name + '.cu')
  File "C:\ProgramData\Anaconda3\envs\DIFRINT-v4\lib\site-packages\cupy\cuda\compiler.py", line 158, in compile_using_nvrtc
    ptx = prog.compile(options)
  File "C:\ProgramData\Anaconda3\envs\DIFRINT-v4\lib\site-packages\cupy\cuda\compiler.py", line 453, in compile
    raise CompileException(log, self.src, self.name, options, 'nvrtc')
cupy.cuda.compiler.CompileException: C:\Users\Admin\AppData\Local\Temp\tmpg65_9di1\8f8f5ff490d72ae331cc951d3896a60d_2.cubin.cu(16): error: identifier "tensor" is undefined

1 error detected in the compilation of "C:\Users\Admin\AppData\Local\Temp\tmpg65_9di1\8f8f5ff490d72ae331cc951d3896a60d_2.cubin.cu".

jiaweiHu-XDU commented 1 year ago

Is there a pure PyTorch implementation of this CUDA C code? I have not managed to port it to Python myself. If you have done this work before, I hope you can help me solve this problem.
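One possible pure-PyTorch version is sketched below. It is not the repository's code, and it rests on assumptions about the linked CUDA kernel that should be checked against the GPU output before relying on it: a maximum displacement of 4 pixels (9 x 9 = 81 output channels), zero padding at the borders, the horizontal displacement varying fastest across output channels, and each dot product being divided by the number of input channels. The name `correlation_cpu` and the `max_disp` parameter are hypothetical.

```python
import torch
import torch.nn.functional as F


def correlation_cpu(first, second, max_disp=4):
    """Cost volume between two feature maps of shape [N, C, H, W].

    Returns [N, (2 * max_disp + 1) ** 2, H, W]. Assumed layout: output
    channel dy * (2 * max_disp + 1) + dx holds the displacement
    (dy - max_disp, dx - max_disp) applied to `second`.
    """
    _, _, height, width = first.shape
    # Zero-pad the last two dimensions: (left, right, top, bottom).
    padded = F.pad(second, [max_disp, max_disp, max_disp, max_disp])
    planes = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = padded[:, :, dy:dy + height, dx:dx + width]
            # Dot product over channels, normalised by the channel count.
            planes.append((first * shifted).mean(dim=1, keepdim=True))
    return torch.cat(planes, dim=1)


first = torch.randn(1, 8, 16, 16)
second = torch.randn(1, 8, 16, 16)
volume = correlation_cpu(first, second)
print(volume.shape)  # torch.Size([1, 81, 16, 16])
```

The sketch uses only standard tensor operations, so it runs on the CPU and autograd provides the backward pass. It trades speed for simplicity: the 81 slices are computed in a Python loop, which is much slower than the CUDA kernel.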