openai / finetune-transformer-lm

Code and model for the paper "Improving Language Understanding by Generative Pre-Training"
https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf
MIT License

Why do we need to apply the mask while fine-tuning? #43

Open pranoy-k opened 5 years ago

pranoy-k commented 5 years ago

In the attention class, you have the following code for masking. I understand the logic for pre-training, but in fine-tuning, if we don't include the language-modeling loss, there should be a check here that skips the mask. Do we always have to apply the mask because the model was trained that way? Is there an intuitive reason for this? Experimentally, I don't see a necessity for it.
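For readers without the repo open: below is a minimal NumPy sketch of the kind of causal masking the question refers to. The function name `mask_attn_weights` and the exact masking arithmetic are assumptions modeled on the decoder-only Transformer described in the paper, not a verbatim copy of the repo's TensorFlow code.

```python
import numpy as np

def mask_attn_weights(w):
    """Apply a causal (lower-triangular) mask to attention logits.

    w: attention logits of shape (batch, heads, n, n), where
    w[..., i, j] scores how much position i attends to position j.
    (Sketch only; the repo's actual implementation may differ.)
    """
    n = w.shape[-1]
    # 1s on and below the diagonal: position i may attend to j <= i only
    b = np.tril(np.ones((n, n))).reshape(1, 1, n, n)
    # Masked positions get a large negative value so softmax sends them to ~0
    return w * b + -1e9 * (1 - b)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# With the mask, attention over future positions comes out ~0
logits = np.random.randn(1, 1, 4, 4)
probs = softmax(mask_attn_weights(logits))
assert np.allclose(np.triu(probs[0, 0], k=1), 0.0)
```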