How does it save computational cost in EMD?

microsoft / DeBERTa

The implementation of DeBERTa

MIT License

1.97k stars 224 forks source link

How does it save computational cost in EMD? #49

Open zhouxincheng opened 3 years ago

zhouxincheng commented 3 years ago

As is showed in the code, EMD just reuses the last layer of the encoder twice, how to get the conclusion of "the additional cost is 0.15 * 2L" in the paper?