Hello Andrej, in your notebook for makemore part 3 you built a custom BatchNorm1d, and I think there may be something wrong in `xvar = self.running_var`. If the input has batch size = 1, the `var` function uses the unbiased estimator to calculate the variance; since 1 - 1 = 0, this results in a division by zero.
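To illustrate the division by zero concretely, here is a quick standalone check (my own snippet, not from the notebook):

```python
import torch

x = torch.randn(1, 10)                         # a "batch" containing a single example
print(x.var(0, keepdim=True))                  # unbiased by default: divides by n - 1 = 0, so all nan
print(x.var(0, keepdim=True, unbiased=False))  # biased estimator: all zeros, no nan
```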
In fact, when I ran your sampling code:
```python
# sample from the model
g = torch.Generator().manual_seed(2147483647 + 10)

for _ in range(20):
    out = []
    context = [0] * block_size # initialize with all ...
    while True:
        # forward pass the neural net
        emb = C[torch.tensor([context])] # (1, block_size, n_embd)
        x = emb.view(emb.shape[0], -1) # concatenate the vectors
        for layer in layers:
            x = layer(x)
        logits = x
        probs = F.softmax(logits, dim=1)
        # sample from the distribution
        ix = torch.multinomial(probs, num_samples=1, generator=g).item()
        # shift the context window and track the samples
        context = context[1:] + [ix]
        out.append(ix)
        # if we sample the special '.' token, break
        if ix == 0:
            break
    print(''.join(itos[i] for i in out)) # decode and print the generated word
```
I got an error at the BatchNorm1d layer: since the input `x` has batch_size = 1, the calculated variance is all `nan`.
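If I understand your class correctly, the problem goes away if the BatchNorm layers are switched to inference mode before sampling, so that they normalize with the running statistics instead of computing batch statistics over a single example. Something like this sketch (assuming the `training` flag from your implementation):

```python
# sketch: put the hand-written BatchNorm1d layers into inference mode before sampling
for layer in layers:
    if isinstance(layer, BatchNorm1d):
        layer.training = False  # use running_mean / running_var instead of batch statistics
```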
By the way, when I used PyTorch's own `torch.nn.BatchNorm1d` implementation instead of the custom one, everything was fine.
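Concretely, this is roughly the kind of swap I mean (my own sketch, not your code; the hidden width of 100 is just an example value), with the module put into eval mode for sampling:

```python
import torch
import torch.nn as nn

n_hidden = 100                      # example width; the notebook's hidden size would go here
bn = nn.BatchNorm1d(n_hidden)       # PyTorch's built-in batch norm
bn.eval()                           # eval mode: normalize with the running statistics
out = bn(torch.randn(1, n_hidden))  # a batch of size 1 goes through without producing nan
print(out.shape)                    # torch.Size([1, 100])
```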
I know your notebook has been run and tested, so I think there is something I missed in the video or notebook?