karpathy / nn-zero-to-hero

Neural Networks: Zero to Hero
MIT License

A new way to backprop through C[Xb] instead of a for loop #55

Open haduoken opened 3 weeks ago

haduoken commented 3 weeks ago

@karpathy While watching your zero_to_hero series on YouTube (which I think is awesome), I came up with a new way to backprop through C[Xb]. (I also posted this as a reply on YouTube.)

The original method looks like this:

    dC = torch.zeros_like(C)
    for k in range(Xb.shape[0]):
        for j in range(Xb.shape[1]):
            ix = Xb[k,j]
            dC[ix] += demb[k,j]

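My method looks like this:

    dC = (F.one_hot(Xb, num_classes=C.shape[0]).float().transpose(1, 2) @ demb).sum(0)

Passing num_classes explicitly keeps the result the same shape as C even when the batch happens not to contain the largest index; without it, F.one_hot infers the number of classes from the largest value in Xb.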

I checked that the resulting gradient matches the one from the loop.

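This works because indexing C with Xb is equivalent to multiplying a one-hot encoding of Xb by C with @, so we can just apply the usual backprop rule for matrix multiplication: the gradient with respect to C is the transposed one-hot matrix times demb, summed over the batch.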
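For anyone who wants to reproduce the check, here is a minimal sketch that compares the loop, the one-hot version, and autograd on random data. The sizes (27 characters, 10-dimensional embeddings, batch of 32, block size 3) and the random demb are assumptions made only for illustration; they stand in for the real values from the video notebook.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes, chosen only for this illustration.
vocab_size, n_embd, batch_size, block_size = 27, 10, 32, 3

C = torch.randn(vocab_size, n_embd, requires_grad=True)
Xb = torch.randint(0, vocab_size, (batch_size, block_size))

emb = C[Xb]                   # forward pass: (batch, block, n_embd)
demb = torch.randn_like(emb)  # stand-in for the upstream gradient
emb.backward(demb)            # autograd reference, stored in C.grad

# Loop version from the post: scatter-add each row of demb into dC.
dC_loop = torch.zeros_like(C)
for k in range(Xb.shape[0]):
    for j in range(Xb.shape[1]):
        ix = Xb[k, j]
        dC_loop[ix] += demb[k, j]

# One-hot version: the lookup C[Xb] equals onehot @ C, so the usual
# matmul rule gives dC = onehot^T @ demb, summed over the batch.
onehot = F.one_hot(Xb, num_classes=vocab_size).float()  # (batch, block, vocab)
assert torch.allclose(onehot @ C.detach(), emb.detach())
dC_onehot = (onehot.transpose(1, 2) @ demb).sum(0)      # (vocab, n_embd)

print(torch.allclose(dC_onehot, dC_loop, atol=1e-5))
print(torch.allclose(dC_onehot, C.grad, atol=1e-5))
```

One trade-off to keep in mind: the one-hot tensor has batch * block * vocab entries, so this trades the Python loop for extra memory, which is fine at this scale but may matter for large vocabularies.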