microsoft / DeBERTa

The implementation of DeBERTa

Fine-tune DeBERTa v3 language model, worthwhile endeavour? #151

Open shensmobile opened 3 months ago

shensmobile commented 3 months ago

Hey everyone, I've been using RoBERTa for the past year or so but have been looking into DeBERTa as well. My typical workflow with RoBERTa is to fine-tune the MLM on ~3 million medical reports to domain-adapt the model before training on downstream tasks. I've found that this greatly improves the performance of the downstream models.
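For context, the MLM step in that workflow looks roughly like the sketch below, using the Hugging Face transformers Trainer; the file path and hyperparameters here are placeholders, not my actual settings:

```python
# Sketch of MLM domain adaptation with Hugging Face transformers.
# The data file and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# One report per line in a plain-text file (hypothetical path).
dataset = load_dataset("text", data_files={"train": "medical_reports.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking: 15% of tokens are masked in each batch.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="roberta-medical-mlm",
    per_device_train_batch_size=16,
    num_train_epochs=1,
    learning_rate=5e-5,
    save_strategy="epoch",
)

Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```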

With DeBERTa, I presume that I can't use my existing code for fine-tuning the MLM, since DeBERTa v3 doesn't use MLM; it uses replaced token detection (RTD). The pre-training scripts here seem to be for training a model from scratch (and I don't think I have enough data or compute power/time to do that efficiently).

I presume that if I wanted to fine-tune the RTD language model, I would use the "deberta-v3-X-continue" option in rtd.sh? If so, do you think this would be worth my time, or should I just fine-tune my downstream tasks on the supplied pre-trained models?
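For comparison, the "just use the supplied models" route would look roughly like this with the released Hugging Face checkpoints (the label count and input are illustrative):

```python
# Sketch of downstream fine-tuning on the released DeBERTa-v3 checkpoint
# via Hugging Face transformers; num_labels and the input are illustrative.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "microsoft/deberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# From here the training loop is the same as for any other encoder:
# tokenize the labelled reports and train with Trainer or a custom loop.
inputs = tokenizer("Example medical report text.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (1, num_labels)
```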

StephennFernandes commented 3 months ago

Given that you have a fairly large amount of training data, I believe this could be a really good endeavour, as the DeBERTa-v3 architecture and training procedure are very strong. A good hyperparameter search and careful continual pretraining should give great results. Do let me know how it goes.

shensmobile commented 3 months ago

Would I use the deberta-v3-X-continue option in rtd.sh, or pretrain a model from scratch on my dataset?

StephennFernandes commented 3 months ago

Do continual pretraining, i.e. use the deberta-v3-X-continue option. All medical-domain LMs are the result of continual pretraining.

priamai commented 2 months ago

Hi all, I am in the exact same boat here. What is the rtd.sh that is mentioned? I know it is a bash file, but where is it? It would be nice to see a Python script that shows how the domain adaptation should be run and how to save the model.
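Something along these lines is what I have in mind, as a sketch against the Hugging Face API since I could not find an example here (the directory name is made up):

```python
# Hypothetical end of a domain-adaptation run: persist the adapted encoder
# and tokenizer, then reload them for a downstream task.
from transformers import AutoModel, AutoModelForSequenceClassification, AutoTokenizer

out_dir = "deberta-v3-medical"  # made-up output directory

# ... after continued pretraining, save both the model and the tokenizer:
model = AutoModel.from_pretrained("microsoft/deberta-v3-base")
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model.save_pretrained(out_dir)
tokenizer.save_pretrained(out_dir)

# Downstream: load the adapted encoder with a freshly initialized
# classification head on top.
clf = AutoModelForSequenceClassification.from_pretrained(out_dir, num_labels=2)
```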

fmobrj commented 1 month ago

> Do continual pretraining, i.e. use the deberta-v3-X-continue option. All medical-domain LMs are the result of continual pretraining.

Hi, @StephennFernandes. How are you doing? Have you managed to successfully pretrain or continue pretraining a DeBERTa v3 model in another language? Back when we were talking, my discriminator couldn't get any better.

Best regards, Fabio.