Closed — Somdyuti2 closed this issue 4 years ago.
To make inference on CPUs work, you will have to convert the following CUDA code to something that runs on CPUs instead. There is nothing you really need to be familiar with in terms of CuPy; the `grid` and `block` arguments come from CUDA. I would recommend you look into the fundamentals of CUDA — it shouldn't take much to understand what the code is doing once you know the basics. Good luck!
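Conceptually, a CUDA launch with `grid` and `block` just enumerates a lattice of threads, each identified by a block index and a thread index; on CPU the same indexing can be reproduced with plain (sequential) nested loops. A rough Python sketch of that idea — illustrative only, `cpu_launch` and `copy_kernel` are hypothetical names, not part of `correlation.py`:

```python
def cpu_launch(kernel, grid, block, args):
    """Emulate a CUDA kernel launch on CPU: the grid x block thread
    lattice becomes nested loops, executed one thread at a time."""
    for bz in range(grid[2]):
        for by in range(grid[1]):
            for bx in range(grid[0]):
                for tz in range(block[2]):
                    for ty in range(block[1]):
                        for tx in range(block[0]):
                            kernel((bx, by, bz), (tx, ty, tz), *args)

def copy_kernel(block_idx, thread_idx, n, src, dst):
    # CUDA equivalent: int i = blockIdx.x * blockDim.x + threadIdx.x;
    i = block_idx[0] * 16 + thread_idx[0]  # block size 16, matching the launch below
    if i < n:  # bounds guard, as in the CUDA kernel
        dst[i] = src[i]

n = 20
src = list(range(n))
dst = [0] * n
# Mirrors the launch pattern: grid=(int((n + 16 - 1) / 16), 1, 1), block=(16, 1, 1)
cpu_launch(copy_kernel, grid=((n + 16 - 1) // 16, 1, 1), block=(16, 1, 1),
           args=[n, src, dst])
# dst now equals src
```

The key point is that `grid` and `block` only control how many times the kernel body runs and with which indices; once you rewrite the kernel body as ordinary array code, they disappear entirely.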
I get a problem in the code above when tracing the model from PyTorch to TorchScript:
```
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\DIFRINT-v4\lib\site-packages\cupy\cuda\compiler.py", line 449, in compile
    nvrtc.compileProgram(self.ptr, options)
  File "cupy\cuda\nvrtc.pyx", line 101, in cupy.cuda.nvrtc.compileProgram
  File "cupy\cuda\nvrtc.pyx", line 111, in cupy.cuda.nvrtc.compileProgram
  File "cupy\cuda\nvrtc.pyx", line 56, in cupy.cuda.nvrtc.check_status
cupy.cuda.nvrtc.NVRTCError: NVRTC_ERROR_COMPILATION (6)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\project\test_1.0.0.pytorch\DIFRINT\models\correlation\correlation.py", line 143, in cupy_launch
    return cupy.cuda.compile_with_cache(strKernel).get_function(strFunction)
  File "C:\ProgramData\Anaconda3\envs\DIFRINT-v4\lib\site-packages\cupy\cuda\compiler.py", line 297, in compile_with_cache
    return _compile_with_cache_cuda(source, options, arch, cache_dir,
  File "C:\ProgramData\Anaconda3\envs\DIFRINT-v4\lib\site-packages\cupy\cuda\compiler.py", line 350, in _compile_with_cache_cuda
    ptx = compile_using_nvrtc(source, options, arch, name + '.cu')
  File "C:\ProgramData\Anaconda3\envs\DIFRINT-v4\lib\site-packages\cupy\cuda\compiler.py", line 158, in compile_using_nvrtc
    ptx = prog.compile(options)
  File "C:\ProgramData\Anaconda3\envs\DIFRINT-v4\lib\site-packages\cupy\cuda\compiler.py", line 453, in compile
    raise CompileException(log, self.src, self.name, options, 'nvrtc')
cupy.cuda.compiler.CompileException: C:\Users\Admin\AppData\Local\Temp\tmpg65_9di1\8f8f5ff490d72ae331cc951d3896a60d_2.cubin.cu(16): error: identifier "tensor" is undefined
1 error detected in the compilation of "C:\Users\Admin\AppData\Local\Temp\tmpg65_9di1\8f8f5ff490d72ae331cc951d3896a60d_2.cubin.cu".
```
I would like to ask: is there a pure-PyTorch implementation of this CUDA C code? I really haven't changed the Python side. If you have done this work before, I hope you can help me solve this problem.
Hi, thanks for the implementation. In my use case, I need to perform inference on CPU. Inspecting your code in `correlation.py`, I gather that for that we need to call the `extern "C"` functions ourselves instead of invoking CuPy functions to do it for us. In your code, each C function is called like this:

```python
cupy_launch('kernel_Correlation_rearrange', cupy_kernel('kernel_Correlation_rearrange', {
    'input': second,
    'output': rbot1
}))(
    grid=tuple([ int((n + 16 - 1) / 16), second.shape[1], second.shape[0] ]),
    block=tuple([ 16, 1, 1 ]),
    args=[ n, second.data_ptr(), rbot1.data_ptr() ]
)
```

I am not familiar with CuPy, so it would be helpful if you could explain these function calls a bit and give any clue about how to do the equivalent on CPU. I understand that `args` in each call holds the arguments passed to the C function, but I am not sure what `grid` and `block` signify here. Probably they are not needed when CuPy is not used. As I only need to run on CPU at test time, I guess I don't need to care about the `updateGrad` functions. I will appreciate your help/suggestions regarding this.
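For what it's worth, the forward pass of a correlation layer can be emulated on CPU without CuPy at all. A minimal NumPy sketch, assuming the common FlowNet/PWC-Net-style formulation (channel-wise mean of the product of the first feature map with shifted copies of the second); the exact displacement range and normalization in `correlation.py` may differ, so treat this as a starting point, not a drop-in replacement:

```python
import numpy as np

def correlation_cpu(first, second, md=4):
    """Correlation volume between two (b, c, h, w) feature maps.
    For each displacement (dy, dx) in [-md, md]^2, output one channel:
    the mean over c of first * second shifted by (dy, dx)."""
    b, c, h, w = first.shape
    # Zero-pad the second map so shifted windows stay in bounds
    padded = np.pad(second, ((0, 0), (0, 0), (md, md), (md, md)))
    out = np.empty((b, (2 * md + 1) ** 2, h, w), dtype=first.dtype)
    k = 0
    for dy in range(2 * md + 1):
        for dx in range(2 * md + 1):
            shifted = padded[:, :, dy:dy + h, dx:dx + w]
            out[:, k] = (first * shifted).mean(axis=1)
            k += 1
    return out

feat = np.arange(2 * 3 * 4 * 4, dtype=np.float64).reshape(2, 3, 4, 4)
vol = correlation_cpu(feat, feat, md=1)  # shape (2, 9, 4, 4)
```

With `md=1` the zero-displacement channel is index 4 (the center of the 3x3 displacement grid), and correlating a map with itself there gives the channel-wise mean of its square, which is a handy sanity check against the CUDA output.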