zhoupingjay opened this issue 1 year ago
More analysis: I think the root cause is here:
```python
def generate(self, idx, max_new_tokens):
    # idx is (B, T) array of indices in the current context
    for _ in range(max_new_tokens):
        logits, loss = self(idx)
        # focus only on the last time step
        logits = logits[:, -1, :]  # becomes (B, C)
        # apply softmax to get probabilities
        probs = F.softmax(logits, dim=-1)  # (B, C)
        # sample from the distribution
        idx_next = torch.multinomial(probs, num_samples=1)  # (B, 1)
```
We use the output of the embedding table directly as the "logits", which implies that each dimension of the embedding is the score of one "class" (token). This effectively requires the embedding dimension to equal the number of classes (`vocab_size`). If we set the embedding dimension larger than `vocab_size` (e.g. 128), the sampled next-token index (`idx_next`) can be `>= vocab_size`, resulting in an "index out of range" error on the next embedding lookup.
https://github.com/karpathy/ng-video-lecture/blob/52201428ed7b46804849dea0b3ccf0de9df1a5c3/bigram.py#L66
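A minimal sketch of the mismatch (the `vocab_size = 65` value and the `lm_head` fix at the end are my assumptions, borrowed from how the full GPT model in the lecture handles it, not something bigram.py itself does):

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

vocab_size = 65   # assumed size of the character vocabulary from the lecture
n_embd = 128      # embedding dimension deliberately different from vocab_size

# What happens when the 2nd argument is changed: the embedding output is treated
# as logits, so there are effectively n_embd "classes" instead of vocab_size.
token_embedding_table = nn.Embedding(vocab_size, n_embd)

idx = torch.zeros((1, 1), dtype=torch.long)         # (B, T)
logits = token_embedding_table(idx)[:, -1, :]       # (1, 128) instead of (1, 65)
probs = F.softmax(logits, dim=-1)
idx_next = torch.multinomial(probs, num_samples=1)  # sampled from [0, 127]
# Looking idx_next up in token_embedding_table on the next step fails with
# "IndexError: index out of range in self" whenever idx_next >= vocab_size.

# One way out (my assumption, mirroring the lm_head used later in the lecture's
# GPT model): project the n_embd features back down to vocab_size logits.
lm_head = nn.Linear(n_embd, vocab_size)
logits = lm_head(token_embedding_table(idx))[:, -1, :]  # (1, 65): sampled indices stay valid
```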
If I change the 2nd parameter (the embedding dimension) to something other than `vocab_size` (e.g. 128), I get an "index out of range" error in `generate()`. To replicate the error, just change that embedding line in the notebook (the 2nd argument of `nn.Embedding`) and then rerun the generation cell.
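For concreteness, here is a stripped-down reconstruction of what I mean (my assumption of how the notebook cell looks, based on bigram.py; the training code is omitted since the error already shows up at generation time):

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

vocab_size = 65  # assumed, as in the lecture's character-level dataset

class BigramLanguageModel(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        # 2nd parameter changed from vocab_size to 128
        self.token_embedding_table = nn.Embedding(vocab_size, 128)

    def forward(self, idx, targets=None):
        logits = self.token_embedding_table(idx)  # (B, T, 128)
        return logits, None

    def generate(self, idx, max_new_tokens):
        for _ in range(max_new_tokens):
            logits, loss = self(idx)
            logits = logits[:, -1, :]
            probs = F.softmax(logits, dim=-1)
            idx_next = torch.multinomial(probs, num_samples=1)
            idx = torch.cat((idx, idx_next), dim=1)
        return idx

m = BigramLanguageModel(vocab_size)
# Rerunning the generation cell now fails with
# "IndexError: index out of range in self" as soon as a sampled index >= vocab_size
# is fed back through the embedding table on the next step.
m.generate(torch.zeros((1, 1), dtype=torch.long), max_new_tokens=100)
```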