smanjil / bert-mask

Mask token prediction with BERT

Mask Does Not Work #1

Open BigSalmon2 opened 4 years ago

BigSalmon2 commented 4 years ago

It only predicts the second and eighth tokens, not the tokens of my choice. You can use the following in its place, but I do not have the experience to implement it with the HTML and Flask.

import torch
from transformers import BertForMaskedLM, BertTokenizer

model = BertForMaskedLM.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model.eval()  # inference only; disables dropout

def predict_masked_text(model, tokenizer, masked_text):
    input_ids = tokenizer.encode(masked_text, return_tensors='pt')
    device = next(model.parameters()).device
    with torch.no_grad():
        token_logits = model(input_ids.to(device))[0]
    # Greedy prediction at every position; drop the [CLS] and [SEP]
    # tokens the tokenizer added.
    pred_tokens = token_logits.argmax(dim=-1)[0, 1:-1]
    return tokenizer.decode(pred_tokens)

masked_text = """In order to [MASK] the [MASK] of"""

predict_masked_text(model, tokenizer, masked_text)
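
Note that this decodes a greedy prediction for every position, not just the masked ones. If you only want candidates at the [MASK] positions, something like the following sketch reads the logits at the mask indices only (top-5 is an arbitrary choice):

input_ids = tokenizer.encode(masked_text, return_tensors='pt')
with torch.no_grad():
    logits = model(input_ids)[0]
# Indices of the [MASK] tokens in the input
mask_positions = (input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
for pos in mask_positions:
    top_ids = logits[0, pos].topk(5).indices.tolist()
    print(pos.item(), tokenizer.convert_ids_to_tokens(top_ids))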
smanjil commented 4 years ago

Hey, for the HTML and Flask implementation, you first have to go through the basics of building a web app in Flask.
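
As a rough illustration (not this repo's actual app), the usual shape is a form that posts the masked text and a view that renders the prediction back, reusing model, tokenizer, and predict_masked_text from the snippet above; the template name index.html and the form field masked_text are assumptions:

from flask import Flask, render_template, request

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def index():
    prediction = None
    if request.method == 'POST':
        # 'masked_text' is an assumed form field name
        prediction = predict_masked_text(model, tokenizer, request.form['masked_text'])
    return render_template('index.html', prediction=prediction)

if __name__ == '__main__':
    app.run(debug=True)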

Regarding the token of choice, you can randomly pick an index to mask, or set it manually as you have done there.
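
For example, a minimal sketch of both options, using the predict_masked_text function from above (the sentence and index are arbitrary):

import random

tokens = tokenizer.tokenize("In order to understand the cause of")
idx = random.randrange(len(tokens))  # or pick it manually, e.g. idx = 3
tokens[idx] = tokenizer.mask_token   # replace the chosen token with [MASK]
masked_text = tokenizer.convert_tokens_to_string(tokens)
predict_masked_text(model, tokenizer, masked_text)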

Hope this helps!

BigSalmon2 commented 4 years ago

Thank you. If I wanted to mask a token of my choice, could I do this:

text_sentence["<mask>"] = tokenizer.mask_token

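For what it's worth, Python strings are immutable, so that exact line would raise a TypeError. A sketch of one way to do the substitution with str.replace, swapping the placeholder for BERT's actual mask token (the example sentence is illustrative):

text_sentence = "In order to <mask> the cause of"  # illustrative input
text_sentence = text_sentence.replace("<mask>", tokenizer.mask_token)
predict_masked_text(model, tokenizer, text_sentence)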