hertz-pj opened 6 months ago
In the loss calculation for the AR model, why isn't the mask applied?
total_loss = F.cross_entropy(logits, targets, reduction=reduction)
Normally, shouldn't it be:
total_loss = F.cross_entropy(logits[y_mask], targets[y_mask], reduction=reduction)
What's the reason for not considering the mask?
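A minimal sketch of what the mask changes. It is written in plain Python rather than PyTorch so it runs anywhere. It assumes logits of shape (B, T, V) given as nested lists, integer targets of shape (B, T), and a boolean `y_mask` that is True at real tokens and False at padding. That convention, the toy numbers, and the helper names `cross_entropy` and `weighted_masked_mean` are hypothetical and not taken from the repository. `cross_entropy` only mimics the `reduction` options of `F.cross_entropy` for this layout.

```python
import math


def token_nll(scores, target):
    """Negative log-likelihood of one target id under softmax(scores)."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[target]


def cross_entropy(logits, targets, mask=None, reduction="mean"):
    """Token-level cross-entropy over a (B, T, V) batch of nested lists.

    mask (hypothetical convention): B x T booleans, True = real token.
    Masked-out positions are skipped entirely, like boolean indexing.
    """
    losses = []
    for b, row in enumerate(logits):
        for t, scores in enumerate(row):
            if mask is not None and not mask[b][t]:
                continue  # padded position: contributes nothing
            losses.append(token_nll(scores, targets[b][t]))
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        return sum(losses) / len(losses)
    return losses  # reduction == "none": flat, row-major per-token losses


def weighted_masked_mean(per_token, mask):
    """Alternative: zero out padded per-token losses, divide by #real tokens."""
    weights = [1.0 if m else 0.0 for row in mask for m in row]
    return sum(l * w for l, w in zip(per_token, weights)) / sum(weights)


# Toy batch: sequence lengths 3 and 1, padded to T = 3, vocabulary V = 3.
# The padded targets hold a hypothetical pad id 0; their logits are uniform.
logits = [
    [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [0.1, 0.2, 2.5]],
    [[0.3, 2.2, 0.1], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
]
targets = [[0, 1, 2], [1, 0, 0]]
y_mask = [[True, True, True], [True, False, False]]

unmasked = cross_entropy(logits, targets)              # mean over 6 positions
masked = cross_entropy(logits, targets, mask=y_mask)   # mean over 4 real tokens
per_token = cross_entropy(logits, targets, reduction="none")

print("unmasked mean:", unmasked)
print("masked mean:  ", masked)
print("weighted mean:", weighted_masked_mean(per_token, y_mask))
```

With `reduction="mean"`, the unmasked call averages over all six positions, including the two padded ones, while the masked call averages over the four real tokens only. `weighted_masked_mean` shows the other common pattern: compute per-token losses with `reduction="none"`, zero out padded positions, and divide by the number of real tokens. In PyTorch the same masked mean can also be obtained by setting padded target ids to `ignore_index` (default `-100`) in `F.cross_entropy`. Whether leaving the mask out is a bug depends on what the padded target positions contain: if they hold a meaningful token, the unmasked loss also trains the model to predict that token at padded positions.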
Hi, did you figure out why there is no mask in the AR loss?