Closed: savasy closed this issue 4 years ago
Hi @savasy ,
for mask filling you need a special head (a language modeling head) on top of the BERT model:
https://huggingface.co/transformers/model_doc/bert.html#bertformaskedlm
When using the Auto* classes, it is the AutoModelWithLMHead class. Here's a full working example:
from transformers import AutoModelWithLMHead, AutoTokenizer, pipeline
model_name = "dbmdz/bert-base-turkish-cased"
model = AutoModelWithLMHead.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
nlp = pipeline("fill-mask", model=model, tokenizer=tokenizer)
nlp("merhaba ben [MASK] iyiyim")
The masking token for BERT is [MASK]; this can be checked with tokenizer.mask_token :)
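Since the mask token differs between model families (e.g. RoBERTa-style tokenizers use <mask> instead of [MASK]), it is safer to build the input from tokenizer.mask_token rather than hard-coding it. A minimal sketch, using a plain helper function (the function name and template placeholder are my own, not part of the transformers API):

```python
# Sketch: build the masked input from the tokenizer's mask token instead
# of hard-coding "[MASK]". The "{mask}" placeholder convention here is an
# assumption for illustration, not a transformers feature.
def build_masked_input(template: str, mask_token: str) -> str:
    # substitute the model-specific mask token into the template
    return template.format(mask=mask_token)

# For a BERT tokenizer, tokenizer.mask_token == "[MASK]":
print(build_masked_input("merhaba ben {mask} iyiyim", "[MASK]"))
# merhaba ben [MASK] iyiyim
```

With the real tokenizer loaded, you would pass tokenizer.mask_token as the second argument, so the same code works unchanged for models with a different mask token.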
Then the example should work:
[{"sequence": "[CLS] merhaba ben çok iyiyim [SEP]",
"score": 0.4122845530509949,
"token": 2140},
{"sequence": "[CLS] merhaba ben daha iyiyim [SEP]",
"score": 0.13173197209835052,
"token": 2171},
{"sequence": "[CLS] merhaba ben gayet iyiyim [SEP]",
"score": 0.12043964117765427,
"token": 7982},
{"sequence": "[CLS] merhaba ben oldukça iyiyim [SEP]",
"score": 0.03267306089401245,
"token": 3523},
{"sequence": "[CLS] merhaba ben gerçekten iyiyim [SEP]",
"score": 0.03199344128370285,
"token": 4036}]
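Each entry in the pipeline output pairs a score with a vocabulary token id; the token string itself can be recovered with tokenizer.convert_ids_to_tokens(token_id). A small sketch of picking the top prediction from such a result list, using a toy id-to-token mapping in place of the real Turkish BERT vocabulary (an assumption for illustration):

```python
# Sketch: select the highest-scoring prediction from fill-mask results.
# toy_vocab stands in for tokenizer.convert_ids_to_tokens on the real model.
toy_vocab = {2140: "çok", 2171: "daha", 7982: "gayet"}

def top_prediction(results, vocab):
    # results: list of dicts with "score" and "token" keys,
    # shaped like the fill-mask pipeline output above
    best = max(results, key=lambda r: r["score"])
    return vocab[best["token"]], best["score"]

results = [
    {"score": 0.41, "token": 2140},
    {"score": 0.13, "token": 2171},
]
print(top_prediction(results, toy_vocab))  # ('çok', 0.41)
```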
Ahh sorry, I used the wrong model and the wrong mask token, shame on me. Thank you @stefan-it, I appreciate it!
When I apply fill-mask with bert-base-turkish as follows, I get the following error:
ValueError Traceback (most recent call last)