microsoft / DeBERTa

The implementation of DeBERTa
MIT License

Sharing DeBERTa-v3 discriminator and generator with task-specific heads? #89

Closed MoritzLaurer closed 1 year ago

MoritzLaurer commented 2 years ago

I really like your v3 models; they deliver great performance at a reasonable size.

Are the discriminator and generator versions of the v3 models available somewhere? The v3 models on the HF hub provide only the last hidden state as output, not the ELECTRA-style predictions over each token. For Google's original ELECTRA, I can download the discriminator and generator directly (https://huggingface.co/google/electra-small-discriminator and https://huggingface.co/google/electra-small-generator), each with its task-specific head, which provides the token-level predictions.
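
For context, here is a minimal sketch of the ELECTRA behaviour described above, using the Hugging Face `transformers` classes `ElectraForPreTraining` and `ElectraTokenizerFast`. The helper names `logits_to_flags` and `rtd_predictions` are illustrative assumptions, not part of any library, and the heavy imports are deferred so that the pure helper can be used without `torch` installed.

```python
from typing import List, Tuple


def logits_to_flags(logits: List[float]) -> List[bool]:
    """Map per-token RTD logits to 'replaced' flags (a positive logit means replaced)."""
    return [logit > 0.0 for logit in logits]


def rtd_predictions(
    text: str,
    model_name: str = "google/electra-small-discriminator",
) -> List[Tuple[str, bool]]:
    """Return (token, predicted_replaced) pairs from an ELECTRA discriminator.

    The checkpoint is downloaded from the HF hub on first use.
    """
    # Imported lazily: torch and transformers are only needed by this function.
    import torch
    from transformers import ElectraForPreTraining, ElectraTokenizerFast

    tokenizer = ElectraTokenizerFast.from_pretrained(model_name)
    model = ElectraForPreTraining.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # logits has shape (batch, seq_len): one RTD score per input token.
        logits = model(**inputs).logits[0].tolist()

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return list(zip(tokens, logits_to_flags(logits)))
```

Calling `rtd_predictions("The quick brown fox fake over the lazy dog")` would return one flag per token, including the special tokens. This per-token output is what the released DeBERTa-v3 checkpoints on the hub do not expose.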

See also this post with code to reproduce the issue: https://discuss.huggingface.co/t/deberta-v3-how-to-keep-electra-style-task-head/14622

I'm doing research on the RTD prediction task itself and it would be great if you could share these model versions with the respective task-heads via the HF hub too!
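
For readers unfamiliar with the RTD (replaced token detection) task, the sketch below shows how the discriminator's training labels are derived: a position is labelled 1 when the corrupted token differs from the original and 0 otherwise. The `corrupt_tokens` function is only a random stand-in for the generator (the real generator is a small masked language model), and all names here are hypothetical.

```python
import random
from typing import List, Sequence


def corrupt_tokens(
    tokens: Sequence[str],
    vocab: Sequence[str],
    replace_prob: float = 0.15,
    seed: int = 0,
) -> List[str]:
    """Stand-in for the generator: resample some positions uniformly from vocab."""
    rng = random.Random(seed)
    return [
        rng.choice(vocab) if rng.random() < replace_prob else token
        for token in tokens
    ]


def rtd_labels(original: Sequence[str], corrupted: Sequence[str]) -> List[int]:
    """Label each position 1 if the token was replaced, 0 if it matches the original.

    A sampled token that happens to equal the original counts as 'original'.
    """
    if len(original) != len(corrupted):
        raise ValueError("original and corrupted must have the same length")
    return [int(o != c) for o, c in zip(original, corrupted)]


original = ["the", "chef", "cooked", "the", "meal"]
corrupted = ["the", "chef", "ate", "the", "meal"]
labels = rtd_labels(original, corrupted)  # only position 2 differs
```

The discriminator head is then trained as a per-token binary classifier against these labels.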

MoritzLaurer commented 2 years ago

Hey, it would be really great to have both the raw discriminator and generator models uploaded to the HF hub. There is a lot of great research that could be done with the raw models. Could you please upload them?

Opdoop commented 2 years ago

Hi, @MoritzLaurer. It's been a long time since you opened this issue, and these heads still haven't been released. Did you find an alternative? Are there other pre-trained discriminators and generators available?

MoritzLaurer commented 2 years ago

No, I haven't found a solution yet, unfortunately. Alternative discriminators and generators are Google's ELECTRA models: https://huggingface.co/models?sort=downloads&search=google%2Felectra. They are just not as good or as new as DeBERTaV3.

Opdoop commented 2 years ago

☺ Thanks for the quick response! I also found ELECTRA. Does ELECTRA only have a TensorFlow implementation?

MoritzLaurer commented 2 years ago

I think via huggingface you can use it both in TensorFlow and PyTorch

Opdoop commented 2 years ago

Oh! I found that deberta-v3-xsmall has a pytorch_model.generator.bin file. Maybe we can use this one?
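
One way to check what such a checkpoint file contains is to list its parameter names and group them by prefix. This is a rough sketch that assumes the file has already been downloaded locally and is a plain PyTorch state dict. `summarize_keys` and `load_state_dict_keys` are hypothetical helpers, and the key names in the example are made up for illustration, not taken from the actual checkpoint.

```python
from collections import Counter
from typing import Dict, Iterable, List


def summarize_keys(keys: Iterable[str], depth: int = 1) -> Dict[str, int]:
    """Count parameter names by their first `depth` dot-separated components."""
    counts = Counter(".".join(key.split(".")[:depth]) for key in keys)
    return dict(counts)


def load_state_dict_keys(path: str) -> List[str]:
    """Load a checkpoint on CPU and return its parameter names."""
    # Imported lazily so summarize_keys works without torch installed.
    import torch

    state_dict = torch.load(path, map_location="cpu")
    return list(state_dict.keys())


# Illustrative key names only; a real checkpoint may use different ones.
example_keys = [
    "deberta.embeddings.word_embeddings.weight",
    "deberta.encoder.layer.0.attention.self.query_proj.weight",
    "lm_predictions.lm_head.bias",
]
summary = summarize_keys(example_keys)
```

With a real file, `summarize_keys(load_state_dict_keys("pytorch_model.generator.bin"))` would show whether the checkpoint has any top-level prefixes besides the encoder, such as a prediction head.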

BigBird01 commented 1 year ago

Thanks for your interest in our work. We have updated our code with pre-training and continued training of DeBERTaV3. Please check the document for details.

MoritzLaurer commented 1 year ago

@BigBird01, great, thank you very much for adding this! In the new document, I see that you've uploaded the generators for the xsmall, small and large models. Could you also add the generators and training code for -base and mDeBERTa-base? These two base models are arguably the most useful ones.

chg0901 commented 8 months ago

Can I ask how to use this generator? I have no idea how to use it with the old DeBERTaV3.

ddofer commented 8 months ago

Is the generator/RTD supported in Hugging Face? The model pages there don't mention it (the sample code only covers MLM), and HF pipelines lack an RTD task.