microsoft / ProphetNet

A research project for natural language generation, containing the official implementations by MSRA NLC team.
MIT License

What's the difference between using bi-gram directly and the proposed loss function? #18

Open nickcom007 opened 4 years ago

nickcom007 commented 4 years ago

When n = 2, why not use a bigram loss directly? It would save a lot of computation. And what is the difference if all the weights α_n are set to 0?
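For reference, here is a minimal sketch of the weighted loss in question. It assumes the loss has the form L = -Σ_j α_j Σ_t log p(y_{t+j} | y_{<t}, x), which is our reading of the ProphetNet paper, not the repository's code. The function name `future_ngram_loss` and the input layout are hypothetical: `stream_log_probs[j][t]` is taken to be the log-probability that prediction stream j assigns to the token j steps ahead of position t. With n = 2 and α = (1, 0), the sketch reduces to ordinary next-token cross-entropy.

```python
import math
from typing import Sequence


def future_ngram_loss(
    stream_log_probs: Sequence[Sequence[float]],
    alphas: Sequence[float],
) -> float:
    """Weighted sum of per-stream negative log-likelihoods (hypothetical sketch).

    Assumes stream_log_probs[j][t] = log p(y_{t+j} | y_{<t}, x), i.e. stream j
    predicts the token j steps ahead using only the prefix before position t.
    """
    if len(stream_log_probs) != len(alphas):
        raise ValueError("need exactly one weight alpha_j per prediction stream")
    loss = 0.0
    for alpha, log_probs in zip(alphas, stream_log_probs):
        # Negative log-likelihood of stream j summed over positions, scaled by alpha_j.
        loss -= alpha * sum(log_probs)
    return loss


# Toy log-probabilities for n = 2 streams over 3 target positions (made-up numbers).
next_token = [math.log(0.5), math.log(0.25), math.log(0.5)]
next_next_token = [math.log(0.25), math.log(0.25), math.log(0.5)]
streams = [next_token, next_next_token]

# alpha = (1, 0): only the next-token stream contributes, mathematically log(16) here.
print(future_ngram_loss(streams, [1.0, 0.0]))
# alpha = (0.5, 0.5): both streams contribute equally.
print(future_ngram_loss(streams, [0.5, 0.5]))
```

Setting every α_j to 0 makes the loss identically zero, so at least one weight must stay positive for any training signal to remain.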