jha-lab / acceltran

[TCAD'23] AccelTran: A Sparsity-Aware Accelerator for Transformers
BSD 3-Clause "New" or "Revised" License

Lack of gradient calculation for Feedforward layer in ops.py #11

Open FishSeeker opened 7 months ago

FishSeeker commented 7 months ago

The function `convert_to_bwd_base_ops` of class `FeedForwardOp(Op)` only calculates the gradient for the weight update. May I know if there is a reason this function ignores the gradient with respect to the input, i.e. the gradient that has to be passed on to the next layer in the backward pass? The implementation of `convert_to_bwd_base_ops` is shown below:

```python
def convert_to_bwd_base_ops(self):
    """Convert operation to backward base operations"""
    self.bwd_base_ops = []

    if not self.fwd_base_ops: self.convert_to_fwd_base_ops()

    # Incoming gradients are assumed to be in the activation buffer
    del_f_size = (self.input_size[0], self.input_size[1], self.ff_weight_size[2])

    # Get weight update matrix (del_W = x_[i-1].T * del_i)
    ff_op = MatrixMultOp(f'{self.op_name}_f[wgt]', self.config, [], Op.transpose_size(self.input_size), del_f_size, mode='bwd')
    self.bwd_base_ops.append(ff_op)
    assert self.ff_weight_size == ff_op.output_size()

    self.bwd_base_ops.append(MemoryStoreOp(f'{self.op_name}_f[wgt]-s', self.config, self.ff_weight_size, 'weight', overwrite=True))
```
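For reference, I would have expected the backward ops to also contain something along these lines right after the weight-update store, mirroring the pattern above (the `_f[inp]` op names and the `'activation'` store target are only my guesses at the intended convention, not code from the repo):

```python
    # (sketch, not in the repo) Propagate the gradient to the next layer in the
    # backward pass: del_x = del_i * W.T
    # Op names and the 'activation' buffer target below are assumptions.
    inp_op = MatrixMultOp(f'{self.op_name}_f[inp]', self.config, [], del_f_size, Op.transpose_size(self.ff_weight_size), mode='bwd')
    self.bwd_base_ops.append(inp_op)

    # The product has shape input_size and would be written back to the activation buffer
    self.bwd_base_ops.append(MemoryStoreOp(f'{self.op_name}_f[inp]-s', self.config, self.input_size, 'activation', overwrite=True))
```

Without these ops, the simulated backward pass for stacked feed-forward layers would not account for the cost of propagating gradients through the layer, only for computing del_W. Is this omission intentional?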