microsoft / DeBERTa

The implementation of DeBERTa
MIT License

Where is the Gradient-Disentangled Embedding Sharing (GDES) part in the code? #111

Closed: Cakeyan closed this issue 1 year ago

Cakeyan commented 1 year ago

Hi, this is really great work! But I have a question. I can't find the GDES part in the existing code, which I think is an important piece of the DeBERTaV3 paper. Could you tell me where this part is in the code, or how to implement it? Thanks!

stefan-it commented 1 year ago

See ongoing discussion here: https://github.com/microsoft/DeBERTa/issues/93#issuecomment-1173101061

Spoiler: we are all waiting for it 😅

Cakeyan commented 1 year ago

@stefan-it Thanks a lot. 🤣 So we keep waiting...

BigBird01 commented 1 year ago

updated.
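
For anyone landing here with the same question, below is a minimal PyTorch sketch of the GDES idea as the DeBERTaV3 paper describes it: the discriminator embeds tokens with `sg(E_G) + E_Δ`, where `sg` is stop-gradient, `E_G` is the generator's embedding table, and `E_Δ` is a residual table initialized to zero. This is not the repository's code. The class name `GDESEmbedding` and its attributes are made up for illustration, and using `detach()` for the stop-gradient is an assumption about how one might write it.

```python
import torch
import torch.nn as nn


class GDESEmbedding(nn.Module):
    """Hypothetical discriminator-side embedding: sg(E_G) + E_delta."""

    def __init__(self, generator_embedding: nn.Embedding):
        super().__init__()
        # E_G: the table shared with (and trained only by) the generator.
        self.generator_embedding = generator_embedding
        # E_delta: residual table trained by the discriminator, zero at start.
        self.delta = nn.Embedding(
            generator_embedding.num_embeddings,
            generator_embedding.embedding_dim,
        )
        nn.init.zeros_(self.delta.weight)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # detach() plays the role of stop-gradient: the discriminator loss
        # cannot push gradients back into the generator's table.
        shared = self.generator_embedding(input_ids).detach()
        return shared + self.delta(input_ids)


if __name__ == "__main__":
    gen_emb = nn.Embedding(10, 4)
    disc_emb = GDESEmbedding(gen_emb)
    ids = torch.tensor([[1, 2, 3]])
    disc_emb(ids).sum().backward()
    print(gen_emb.weight.grad)         # no gradient reached E_G
    print(disc_emb.delta.weight.grad)  # gradient lands on E_delta only
```

In this sketch the generator's own loss (MLM) is the only thing that updates `E_G`, while the discriminator's loss (RTD) only updates `E_Δ`. Note that `generator_embedding` is registered as a submodule here, so an optimizer built from the discriminator's `parameters()` would still list it even though it never receives a gradient from this path.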