Closed: hscspring closed this issue 4 years ago.
As the title says, I'm not sure whether we need to mask the future tokens, the way the Transformer decoder does.
I couldn't find an answer in the paper or the code. Does anyone know? Thanks.
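For reference, "masking the future tokens" in a Transformer decoder usually means a causal (look-ahead) mask: position `i` may only attend to positions `j <= i`, so scores for later positions are set to `-inf` before the softmax and receive zero weight. Below is a minimal sketch in plain Python, assuming a single attention head and a square matrix of precomputed attention scores. The names `causal_mask` and `masked_softmax` are hypothetical helpers for illustration, not taken from the paper or its code.

```python
import math


def causal_mask(n):
    """Return an n x n mask: mask[i][j] is True if position i may attend to j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask):
    """Row-wise softmax over attention scores, with disallowed (future) positions set to -inf first."""
    out = []
    for row, mrow in zip(scores, mask):
        # Replace scores for future positions with -inf so they get zero weight.
        masked = [s if allowed else float("-inf") for s, allowed in zip(row, mrow)]
        # The diagonal is always allowed, so the row max is finite.
        mx = max(masked)
        # math.exp(-inf) == 0.0, so masked positions contribute nothing.
        exps = [math.exp(s - mx) for s in masked]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


if __name__ == "__main__":
    n = 3
    scores = [[0.0] * n for _ in range(n)]  # uniform scores, for illustration only
    for row in masked_softmax(scores, causal_mask(n)):
        print(row)
```

With uniform scores, the first token attends only to itself, the second splits its weight between the first two positions, and the last token attends to all three. An encoder-style (bidirectional) model would simply skip the mask and let every position attend to every other one.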