Open HenryZ5734 opened 1 year ago
Thank you for pointing out my problem. To treat a batch of data as a single sample, the cross-entropy loss is averaged over the batch, so the gradient also has to be divided by the batch size during back-propagation.
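To make the batch-size point concrete, here is a minimal NumPy sketch, not the assignment's actual layer. It assumes `labels` holds integer class indices (the real code may use one-hot labels), and the function names `softmax`, `mean_cross_entropy`, and `softmax_cross_entropy_grad` are hypothetical. Because the loss is a mean over the batch, the gradient with respect to the logits is `(softmax - one_hot) / batch_size`.

```python
import numpy as np


def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def mean_cross_entropy(logits, labels):
    # Cross-entropy averaged over the batch (labels are class indices: an assumption).
    probs = softmax(logits)
    batch_size = logits.shape[0]
    return -np.log(probs[np.arange(batch_size), labels]).mean()


def softmax_cross_entropy_grad(logits, labels):
    # Gradient of the batch-averaged loss with respect to the logits.
    batch_size = logits.shape[0]
    delta = softmax(logits)
    delta[np.arange(batch_size), labels] -= 1.0  # softmax - one_hot
    return delta / batch_size  # the mean in the loss produces this division
```

Without the final division, the gradient would correspond to a summed loss rather than an averaged one, so its scale would grow with the batch size.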
My implementation of softmax.backward is below:

    def backward(self, label):
        self.delta = self.softmax.copy()
        # start your code