localminimum / QANet

A Tensorflow implementation of QANet for machine reading comprehension
MIT License

mask_logits function #36

Closed: shwetgarg closed this issue 6 years ago

shwetgarg commented 6 years ago

I don't understand the purpose of the "mask_logits" function, which is called before "softmax" in various places. Can someone please explain?

localminimum commented 6 years ago

Hi @shwetgarg, I use the mask_logits function to prevent a wrong softmax output, and therefore a wrong gradient, caused by differences in training sample length. Training samples have different lengths (both paragraphs and questions), so padding is unavoidable when training in batches. If the padded positions go straight into an exponential function such as softmax, they receive nonzero probability and distort the output. mask_logits prevents this by adding a large negative value to the logits at the padded positions, so their softmax weight is effectively zero. I hope that explains it. For more explanations check out here! Thanks
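To make the effect concrete, here is a minimal pure-Python sketch of the idea. It is an illustration, not the repository's TensorFlow code: the default `mask_value` of -1e30, the list-based signature, the `softmax` helper, and the sample numbers are all assumptions chosen for this example. The mask is 1.0 for real tokens and 0.0 for padding.

```python
import math

def mask_logits(logits, mask, mask_value=-1e30):
    # Add a large negative value where mask == 0 (padding);
    # positions with mask == 1 are left unchanged.
    return [x + mask_value * (1.0 - m) for x, m in zip(logits, mask)]

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical sequence of length 2, padded to length 4.
logits = [2.0, 1.0, 0.0, 0.0]
mask = [1.0, 1.0, 0.0, 0.0]

unmasked = softmax(logits)                   # padding takes some probability
masked = softmax(mask_logits(logits, mask))  # padding gets zero probability

print("unmasked:", unmasked)
print("masked:  ", masked)
```

Without the mask, the two padded positions absorb part of the probability mass, so the distribution over the real tokens depends on how much padding the batch happened to add. With the mask, the result matches a softmax computed over the two real tokens alone.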