mask_logits function

Hi @shwetgarg, I use the mask_logits function to prevent a wrong softmax output, and therefore a wrong gradient, caused by differences in training sample length. Training samples have different lengths (both paragraphs and questions), so padding is unavoidable when training in batches. If the padded positions go straight into an exponential function such as softmax, they receive a nonzero share of the probability and distort the output. mask_logits prevents this by adding a large negative value to the logits at the padded positions, so their softmax weight becomes effectively zero. I hope that explains it. For more explanation, check out here! Thanks
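To make the effect concrete, here is a minimal NumPy sketch of the idea, not the repository's actual TensorFlow code. The `mask_value` of -1e30, the `softmax` helper, and the toy logits are assumptions chosen for illustration; the mask is assumed to be 1 for real tokens and 0 for padding.

```python
import numpy as np

def mask_logits(logits, mask, mask_value=-1e30):
    """Add a large negative value to padded positions (mask == 0).

    mask_value is an assumed constant for this sketch.
    """
    mask = mask.astype(logits.dtype)
    return logits + mask_value * (1.0 - mask)

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

# One sample of length 2, padded to length 4 (last two positions are padding).
logits = np.array([[2.0, 1.0, 0.0, 0.0]])
mask = np.array([[1, 1, 0, 0]])

unmasked = softmax(logits)                   # padding takes part of the probability
masked = softmax(mask_logits(logits, mask))  # padding gets zero weight

print(unmasked)
print(masked)
```

Without the mask, the two padded positions take a share of the probability mass, which also changes the gradient for the real tokens. With the mask, the result matches a softmax computed over the two real tokens only.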

localminimum / QANet

mask_logits function #36