The function convert_to_bwd_base_ops of class FeedForwardOp(Op) only calculates the gradient for updating the weights. May I know if there is a reason this function ignores the input gradient that would be propagated to the next layer? The implementation of convert_to_bwd_base_ops is shown below:
```python
def convert_to_bwd_base_ops(self):
    """Convert operation to backward base operations"""
    self.bwd_base_ops = []
    if not self.fwd_base_ops: self.convert_to_fwd_base_ops()
    # Incoming gradients are assumed to be in the activation buffer
    del_f_size = (self.input_size[0], self.input_size[1], self.ff_weight_size[2])
    # Get weight update matrix (del_W = x_[i-1].T * del_i)
    ff_op = MatrixMultOp(f'{self.op_name}_f[wgt]', self.config, [], Op.transpose_size(self.input_size), del_f_size, mode='bwd')
    self.bwd_base_ops.append(ff_op)
    assert self.ff_weight_size == ff_op.output_size()
    self.bwd_base_ops.append(MemoryStoreOp(f'{self.op_name}_f[wgt]-s', self.config, self.ff_weight_size, 'weight', overwrite=True))
```
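For reference, here is a minimal NumPy sketch of the two gradients of a linear layer: the weight-update gradient the snippet computes, and the input gradient that would be handed to the next layer in the backward pass, which is the term I am asking about. The shapes and names (x, W, del_i) are illustrative only, not taken from the codebase.

```python
import numpy as np

# Illustrative shapes only: batch B, sequence S, input dim D_in, output dim D_out.
B, S, D_in, D_out = 2, 4, 8, 16
x = np.random.randn(B, S, D_in)        # layer input x_[i-1]
W = np.random.randn(D_in, D_out)       # feed-forward weight
del_i = np.random.randn(B, S, D_out)   # incoming gradient (the activation buffer)

# Weight-update gradient, as in the snippet: del_W = x_[i-1].T * del_i
del_W = np.einsum('bsi,bso->io', x, del_i)
assert del_W.shape == W.shape

# Input gradient that would be propagated onward in the backward pass:
# del_x = del_i * W.T  -- this is the gradient the function seems to omit.
del_x = del_i @ W.T
assert del_x.shape == x.shape
```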