Closed robinsongh381 closed 2 years ago
I am also waiting for the v3 pretraining code and would absolutely LOVE to see it integrated into Hugging Face Flax!
It's for v1 & v2. We are still working on the v3 code and will release it once it has passed internal processing.
I do not get it. Why work on these models when they just give random predictions? You would not need a model at all; it would be much faster to just replace [MASK] with random(vocabulary), and you would get the same results with the same extremely bad accuracy.
You could at least build a unigram model of the data, and it would be much more accurate than this random model.
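For what it's worth, the unigram baseline mentioned above can be sketched in a few lines. This is a minimal illustration only; the toy corpus, the whitespace tokenization, and the function name `unigram_fill_mask` are made up for the example:

```python
from collections import Counter

# Toy corpus standing in for real training data (hypothetical example).
corpus = (
    "the capital of france is paris . "
    "the capital of italy is rome . "
    "paris is a city in france ."
).split()

# A unigram "model" ignores context entirely: it just counts tokens
# and predicts the globally most frequent ones for every [MASK].
unigram_counts = Counter(corpus)

def unigram_fill_mask(top_k=3):
    """Return the top_k most frequent tokens, regardless of context."""
    return [tok for tok, _ in unigram_counts.most_common(top_k)]

print(unigram_fill_mask())
```

Even this context-free predictor assigns sensible mass to common tokens, which is the sense in which it would beat uniformly random sampling from the vocabulary.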
from transformers import pipeline

# Note: the checkpoint on the Hugging Face Hub is namespaced as 'microsoft/deberta-base'
unmasker = pipeline('fill-mask', model='microsoft/deberta-base')
the_out = unmasker("The capital of France is [MASK].")
print("the_out", the_out)
As you can see, the DeBERTa results are completely wrong; there must be some big error in the port to transformers.
the_out [{'score': 0.001861382625065744, 'token': 18929, 'token_str': 'ABC', 'sequence': 'The capital of France isABC.'}, {'score': 0.0012871784856542945, 'token': 15804, 'token_str': ' plunge', 'sequence': 'The capital of France is plunge.'}, {'score': 0.001228992477990687, 'token': 47366, 'token_str': 'amaru', 'sequence': 'The capital of France isamaru.'}, {'score': 0.0010126306442543864, 'token': 46703, 'token_str': 'bians', 'sequence': 'The capital of France isbians.'}, {'score': 0.0008897537481971085, 'token': 43107, 'token_str': 'insured', 'sequence': 'The capital of France isinsured.'}]
@BigBird01
Hello! Thank you for sharing this great piece of work.
I was wondering whether the MLM pre-training code is for training DeBERTa v3 or v2? (or v1)
Regards