jbrry / Irish-BERT

Repository to store helper scripts for creating an Irish BERT model.

Support multiple MASK tokens in LM inspector #63

Open jowagner opened 3 years ago

jowagner commented 3 years ago

inspect_lm_huggingface.py now has an option to repeat [MASK] tokens, but this does not work because of the limitation described in https://github.com/huggingface/transformers/issues/3609

We could either implement our own solution using AutoModelWithLMHead, following the suggestions in my comment on the above transformers issue, or implement a solution inside the transformers library and open a PR.
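
A minimal sketch of one way such an own solution could decode several [MASK] tokens: fill the single most confident mask, re-score the remaining ones, and repeat. This is only an illustration and not necessarily the approach suggested in the linked comment. The names `fill_masks_greedy`, `PredictFn` and `toy_predict` are hypothetical; in practice `predict` would wrap a forward pass of the masked language model, and that wiring is left out here.

```python
from typing import Callable, Dict, List, Tuple

MASK = "[MASK]"

# Hypothetical interface (an assumption, not part of any library): given the
# current tokens, return for each position still holding MASK a mapping from
# candidate token to probability.
PredictFn = Callable[[List[str]], Dict[int, Dict[str, float]]]


def fill_masks_greedy(tokens: List[str], predict: PredictFn) -> Tuple[List[str], float]:
    """Fill one mask per round, always taking the most confident prediction."""
    tokens = list(tokens)  # work on a copy, leave the caller's list untouched
    joint = 1.0            # product of the probabilities of the chosen tokens
    while MASK in tokens:
        predictions = predict(tokens)
        # Best (position, token, probability) triple over all open masks.
        pos, tok, prob = max(
            ((p, t, pr) for p, dist in predictions.items() for t, pr in dist.items()),
            key=lambda item: item[2],
        )
        if tokens[pos] != MASK:
            raise ValueError("predict() returned a position that is not masked")
        tokens[pos] = tok
        joint *= prob
    return tokens, joint


def toy_predict(tokens: List[str]) -> Dict[int, Dict[str, float]]:
    """Stand-in for a real model so that the sketch runs without downloads."""
    out: Dict[int, Dict[str, float]] = {}
    for i, tok in enumerate(tokens):
        if tok != MASK:
            continue
        left = tokens[i - 1] if i > 0 else None
        if left == "New":
            out[i] = {"York": 0.9, "Delhi": 0.1}
        elif left == MASK:
            out[i] = {"York": 0.3, "City": 0.2}  # uncertain until the left mask is filled
        else:
            out[i] = {"New": 0.6, "Old": 0.4}
    return out


if __name__ == "__main__":
    filled, joint = fill_masks_greedy(["I", "live", "in", MASK, MASK, "."], toy_predict)
    print(" ".join(filled), joint)
```

Greedy filling is cheap but can commit to a poor first choice; keeping the k best partial fillings per round (beam search) would be the obvious extension.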

Also look at FitBERT, SpanBERT and other tools that may already have implemented this.

Meng et al. (2022), "Rewire-then-Probe: A Contrastive Recipe for Probing Biomedical Knowledge of Pre-trained Language Models", propose a workaround for obtaining multi-token answers from BERT.
