microsoft / DeBERTa

The implementation of DeBERTa
MIT License
1.99k stars 228 forks

Is the Decoder like the Transformer Decoder, or just a layer? #10

Closed (hscspring closed this issue 4 years ago)

hscspring commented 4 years ago

As the title says, I'm not sure whether we need to mask future tokens in the decoder, the way the Transformer does in its decoder.

I couldn't find an answer in the paper or the code. Does anyone know? Thanks.
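To make the question concrete, here is the kind of future-token (causal) masking I mean, as used in the original Transformer decoder. This is only a plain-Python sketch for illustration: `causal_mask` and `apply_mask` are hypothetical helper names, not anything from the DeBERTa code, and it says nothing about what DeBERTa actually does.

```python
def causal_mask(seq_len):
    """Build a seq_len x seq_len boolean matrix.

    Entry [i][j] is True when query position i is allowed to attend to
    key position j, i.e. when j is not in the future (j <= i).
    """
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def apply_mask(scores, mask):
    """Set disallowed attention scores to -inf.

    After a softmax, positions set to -inf receive zero attention weight.
    """
    neg_inf = float("-inf")
    return [
        [score if allowed else neg_inf for score, allowed in zip(score_row, mask_row)]
        for score_row, mask_row in zip(scores, mask)
    ]


if __name__ == "__main__":
    # Show which positions each token may attend to for a length-4 sequence.
    for row in causal_mask(4):
        print(" ".join("1" if allowed else "0" for allowed in row))
```

Each row of the mask corresponds to one query position, so row `i` only allows attention to positions `0..i`. The question is whether the DeBERTa decoder needs a mask of this shape.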