class CFuncDef:
    [...]
    def __call__(self, arg_datas, arg_types, dev_id):
        if dev_id is None:
            ctx = 'cpu'
        else:
            set_device(dev_id)
            ctx = gpu_ctx_name
        # function loader
        func = self.loader(self, arg_types, ctx, **self.loader_kwargs)
        return func(*arg_datas)
The Python interpreter calls the set_device function, which invokes cudaSetDevice in C, and then calls the C kernel function. However, the deep learning framework may change the current device between these two calls, even though cudaSetDevice itself is thread-safe.
So I merge cudaSetDevice and the kernel call into a single C function by adding two macros, namely KERNEL_RUN_BEGIN and KERNEL_RUN_END. The argument list of the kernel wrapper function becomes MOBULA_DLL void function_name(const int device_id, xxx).
(The snippet at the top is the old code in mobula/func.py.)