microsoft / DeBERTa

The implementation of DeBERTa
MIT License

Load deberta-v3-large but got deberta-v2 model #132

Open ChengsongLu opened 1 year ago

ChengsongLu commented 1 year ago

Hi,

from transformers import AutoTokenizer, AutoModel

model_name = 'microsoft/deberta-v3-large'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

When I load the v3 model, it returns a v2 model instead. How can I use the v3 model and tokenizer correctly?

rashmibanthia commented 1 year ago

What you are doing is correct. You are actually getting the v3 weights; there is no DebertaV3Model class on Hugging Face yet.
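This can be verified without downloading any weights. The v3 checkpoints on the Hub declare "model_type": "deberta-v2" in their config.json, so the Auto* classes resolve them to the DebertaV2* implementations; DeBERTaV3 changed the pretraining objective (ELECTRA-style replaced-token detection), not the architecture class. A minimal sketch, assuming the transformers library is installed:

```python
# Sketch: inspect transformers' model-type registry directly.
# The "deberta-v2" model type maps to the DebertaV2* classes, and
# v3 checkpoints declare model_type "deberta-v2" in their config.json,
# so AutoModel('microsoft/deberta-v3-large') yields a DebertaV2Model.
from transformers import CONFIG_MAPPING

config_cls = CONFIG_MAPPING["deberta-v2"]
print(config_cls.__name__)  # DebertaV2Config
```

So the DebertaV2Model class name is expected: the same implementation serves both v2 and v3 checkpoints, and only the loaded weights differ.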

Nov05 commented 9 months ago

It seems v3 uses the same architecture as v2?

DebertaV2Config {
  "_name_or_path": "microsoft/deberta-v3-base",
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
...

I successfully ran the following code.

import sentencepiece  # must be installed for the DeBERTa v3 SentencePiece tokenizer
from transformers import DebertaV2Model, DebertaV2Config, DebertaV2Tokenizer

MODEL_NAME = 'microsoft/deberta-v3-base'
model = DebertaV2Model.from_pretrained(MODEL_NAME)
config = DebertaV2Config.from_pretrained(MODEL_NAME)
tokenizer = DebertaV2Tokenizer.from_pretrained(MODEL_NAME)

Output:

Downloading spm.model: 100%
2.46M/2.46M [00:00<00:00, 22.7MB/s]
Downloading (…)okenizer_config.json: 100%
52.0/52.0 [00:00<00:00, 2.33kB/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.