Support multitoken masked segments

writer / fitbert

Use BERT to Fill in the Blanks

Apache License 2.0


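I agree. BART is better suited to this, as long as you set match_source_len=False so that the generated text is not constrained to the length of the input and each <mask> can expand to more than one token.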

Load the BART base model:

import torch

bart = torch.hub.load('pytorch/fairseq', 'bart.base')  # takes around two minutes
bart.eval()  # switch to evaluation mode (disables dropout)
bart.cuda()  # move the model to the GPU; omit this line to run on CPU

Use it:

sentences = ['The <mask> is on the <mask> in front of <mask>.']
bart.fill_mask(sentences, topk=3, beam=10, match_source_len=False)

This gives the following results:

[[('', tensor(-1.5974e-05, device='cuda:0')),
  ('�The photo is on the right in front of the building.',
   tensor(-0.6064, device='cuda:0')),
  ('�The photo is on the right in front of the house.',
   tensor(-0.6113, device='cuda:0'))]]
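
Note that the top-ranked candidate above is an empty string, so in practice the results need a little post-processing. The following is only a minimal sketch of one way to do that, assuming the return shape shown above (one list of (text, score) pairs per input sentence). The helper name best_fills and its min_chars parameter are hypothetical and not part of fairseq or fitbert.

```python
from typing import List, Optional, Sequence, Tuple


def best_fills(
    results: Sequence[Sequence[Tuple[str, float]]],
    min_chars: int = 1,
) -> List[Optional[Tuple[str, float]]]:
    """Return the highest-scoring non-empty candidate for each input sentence.

    Hypothetical helper: `results` is assumed to have the shape shown above,
    i.e. one list of (text, score) pairs per input sentence. Scores may be
    plain floats or 0-dim tensors; float() handles both.
    """
    best: List[Optional[Tuple[str, float]]] = []
    for candidates in results:
        # Drop candidates that are empty (or whitespace only) after stripping.
        kept = [
            (text.strip(), float(score))
            for text, score in candidates
            if len(text.strip()) >= min_chars
        ]
        # Higher (less negative) score is better; None if nothing survives.
        best.append(max(kept, key=lambda c: c[1]) if kept else None)
    return best


# Scores copied from the output above as plain floats
# (the leading replacement character is omitted from the strings).
sample = [[
    ('', -1.5974e-05),
    ('The photo is on the right in front of the building.', -0.6064),
    ('The photo is on the right in front of the house.', -0.6113),
]]

print(best_fills(sample))
```

With the real model, the list returned by bart.fill_mask(...) would be passed in place of sample.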

Support multitoken masked segments #19