Open dewijones92 opened 1 month ago
The loss calculation in the code was causing a shape mismatch error due to
inconsistent tensor shapes. The error occurred because the entire Y tensor
was being used to index the prob tensor, which had a different shape.
The original line of code:
loss = -prob[torch.arange(32), Y].log().mean()
was causing the issue because:
torch.arange(32) creates a tensor of indices from 0 to 31, assuming a fixed
batch size of 32. However, the actual batch size might differ.
Y refers to the entire label tensor, which has a shape of (num_samples,),
where num_samples is the total number of samples in the dataset.
Using the entire Y tensor to index prob resulted in a shape mismatch because
prob has a shape of (batch_size, num_classes), where batch_size is the number
of samples in the current minibatch and num_classes is the number of possible
output classes.
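The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not the notebook's actual code: the sizes (100 samples, 27 classes, batch of 32) and the random logits standing in for the model output are assumptions chosen only for illustration.

```python
import torch

torch.manual_seed(0)

# Illustrative sizes only (assumed, not taken from the notebook).
num_samples, num_classes, batch_size = 100, 27, 32

Y = torch.randint(0, num_classes, (num_samples,))  # labels for the whole dataset
ix = torch.randint(0, num_samples, (batch_size,))  # row indices of one minibatch
logits = torch.randn(batch_size, num_classes)      # stand-in for the model output
prob = torch.softmax(logits, dim=1)                # shape (batch_size, num_classes)

try:
    # Index tensors of shape (32,) and (100,) cannot be broadcast together.
    prob[torch.arange(32), Y]
    mismatch_raised = False
except (IndexError, RuntimeError) as err:
    mismatch_raised = True
    print("indexing failed:", err)
```

The two index tensors must have the same shape (or be broadcastable) because PyTorch pairs them element by element to pick one entry of prob per pair.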
To fix this issue, the line was modified to:
loss = -prob[torch.arange(prob.shape[0]), Y[ix]].log().mean()
The changes made:
torch.arange(prob.shape[0]) creates a tensor of indices from 0 to batch_size-1,
dynamically adapting to the actual batch size of prob.
Y[ix] retrieves the labels corresponding to the current minibatch indices ix,
ensuring that the labels align correctly with the predicted probabilities in prob.
By using Y[ix] instead of Y, both indexing tensors have shape (batch_size,), so
each row of prob is paired with its own label and the shape mismatch error is
resolved. The model can now be trained and evaluated correctly on the given dataset.
These changes were necessary to ensure the correct calculation of the loss for each
minibatch, enabling the model to learn from the appropriate labels and improve its
performance.
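As a quick sanity check of the corrected line, here is a hedged sketch that evaluates it for several batch sizes and compares the result against torch.nn.functional.cross_entropy on the same logits. The helper minibatch_loss, the dataset sizes, and the random logits are hypothetical and exist only for this illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative sizes only (assumed, not taken from the notebook).
num_samples, num_classes = 100, 27
Y = torch.randint(0, num_classes, (num_samples,))  # labels for the whole dataset


def minibatch_loss(batch_size):
    """Hypothetical helper: compute the corrected loss for one random minibatch."""
    ix = torch.randint(0, num_samples, (batch_size,))  # minibatch row indices
    logits = torch.randn(batch_size, num_classes)      # stand-in for the model output
    prob = torch.softmax(logits, dim=1)
    # Corrected line: both index tensors have shape (batch_size,).
    loss = -prob[torch.arange(prob.shape[0]), Y[ix]].log().mean()
    # Reference value computed directly from the logits.
    reference = F.cross_entropy(logits, Y[ix])
    return loss, reference


for batch_size in (8, 32, 64):
    loss, reference = minibatch_loss(batch_size)
    print(batch_size, loss.item(), reference.item())
```

The two values should agree up to floating-point error for every batch size, since the corrected line no longer depends on a hard-coded 32.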
Fixes https://github.com/karpathy/nn-zero-to-hero/issues/50