Closed Achhhe closed 2 years ago
你好,计算attention时需要argmax,而argmax是不可导的,请问反向传播时,如果将梯度回传给LTE啊
argmax indeed cannot backpropagate gradients. The patches chosen by the argmax can backpropagate gradients
你好,计算attention时需要argmax,而argmax是不可导的,请问反向传播时,如果将梯度回传给LTE啊