Open rajeevbaalwan opened 4 years ago
I can understand masking is necessary for decoder during training to hide future tokens from decoder but why do we need masking during inference in decoder ?
I can understand masking is necessary for decoder during training to hide future tokens from decoder but why do we need masking during inference in decoder ?