microsoft / nnfusion

A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
MIT License
959 stars, 163 forks

[BUG] useless input of GatherGrad #352

Closed: gbxu closed this issue 2 years ago

gbxu commented 2 years ago

GatherGrad is defined here, with a zero tensor as one of its inputs: https://github.com/microsoft/nnfusion/blob/31f688f3b0393207dc09389debb943394d5acf81/src/nnfusion/engine/pass/graph/autodiff/gather_v2.cpp#L38-L40

However, that zero input is not used in the GatherGrad kernel: https://github.com/microsoft/nnfusion/blob/31f688f3b0393207dc09389debb943394d5acf81/src/nnfusion/core/kernels/cuda_gpu/kernels/gather_1d.cpp#L51-L98

The zero op calls the cudaMemset API, which blocks other kernels. Is the op definition wrong? If so, how can I fix it?

xysmlx commented 2 years ago

Hi @gbxu, the zero input is used: it provides the zero initialization that atomicAdd needs in the GatherGrad kernel emitter.
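
To illustrate why an accumulating kernel needs a zeroed buffer, here is a minimal CPU sketch in C++. It is not the nnfusion kernel: the function names gather_grad_1d and gather_grad_zero_init are hypothetical, the 1-D layout is an assumption, and the plain += stands in for what would be an atomicAdd on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for a 1-D GatherGrad.
// Forward pass (assumed): y[i] = x[indices[i]].
// Backward pass: every grad_y[i] is added into grad_x[indices[i]].
// On a GPU each += below would be an atomicAdd, because several
// threads may hit the same index. The function only accumulates;
// it never overwrites, so grad_x must already hold zeros.
void gather_grad_1d(const std::vector<float>& grad_y,
                    const std::vector<int>& indices,
                    std::vector<float>& grad_x)
{
    assert(grad_y.size() == indices.size());
    for (std::size_t i = 0; i < indices.size(); ++i)
    {
        assert(indices[i] >= 0 &&
               static_cast<std::size_t>(indices[i]) < grad_x.size());
        grad_x[indices[i]] += grad_y[i];
    }
}

// Hypothetical helper: create the zero-filled accumulator (the role
// the zero input plays in the graph), then scatter-add into it.
std::vector<float> gather_grad_zero_init(const std::vector<float>& grad_y,
                                         const std::vector<int>& indices,
                                         std::size_t x_size)
{
    std::vector<float> grad_x(x_size, 0.0f);
    gather_grad_1d(grad_y, indices, grad_x);
    return grad_x;
}
```

With grad_y = {1, 2, 3} and indices = {0, 2, 0}, the zero-initialized version accumulates 1 + 3 into slot 0 and 2 into slot 2. Running gather_grad_1d on a buffer that still holds old values would add those stale values into the result, which is why the zero fill cannot simply be dropped even though the kernel body never reads the zero tensor as a separate operand.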