Closed gbxu closed 2 years ago
We define GatherGrad here: https://github.com/microsoft/nnfusion/blob/31f688f3b0393207dc09389debb943394d5acf81/src/nnfusion/engine/pass/graph/autodiff/gather_v2.cpp#L38-L40
However, the input `zero` is not used in the GatherGrad kernel: https://github.com/microsoft/nnfusion/blob/31f688f3b0393207dc09389debb943394d5acf81/src/nnfusion/core/kernels/cuda_gpu/kernels/gather_1d.cpp#L51-L98
The `zero` op calls the `cudaMemset` API, which blocks other kernels. Is this a wrong op definition? If so, how can I fix it?
Hi @gbxu, the `zero` input provides the zero initialization that the atomicAdd in the GatherGrad kernel emitter needs.
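To illustrate why an accumulating gradient needs a zeroed buffer, here is a minimal CPU sketch of a 1-D gather gradient. It is not NNFusion code: the function name `gather_grad_1d` and its signature are hypothetical, and the plain `+=` only stands in for the atomicAdd mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for a 1-D gather gradient (not NNFusion code).
// Forward:  y[i] = x[indices[i]]
// Backward: x_grad[indices[i]] += y_grad[i]
// x_grad plays the role of the `zero` input: the loop only accumulates into
// it and never overwrites it, so it has to start at 0.
std::vector<float> gather_grad_1d(const std::vector<int>& indices,
                                  const std::vector<float>& y_grad,
                                  std::size_t x_size)
{
    assert(indices.size() == y_grad.size());
    std::vector<float> x_grad(x_size, 0.0f); // zero initialization
    for (std::size_t i = 0; i < indices.size(); ++i)
    {
        assert(indices[i] >= 0 &&
               static_cast<std::size_t>(indices[i]) < x_size);
        // Stand-in for atomicAdd: repeated indices add up in the same slot.
        x_grad[indices[i]] += y_grad[i];
    }
    return x_grad;
}
```

Because repeated indices accumulate into the same slot, any leftover values in the buffer would be added to the result, which is why the buffer is cleared before the accumulation runs.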