Hi! I've been trying to measure MLM perplexity for the TinyBERT model (in particular, tinybert6l), and I keep getting inconsistent results. It looks like the MLM head for TinyBERT is not loaded properly when the model is loaded via AutoModelForMaskedLM or BertForMaskedLM.
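For anyone who wants to check this, here is a rough sketch of how I would confirm whether the head weights are actually present in the checkpoint. It relies on `from_pretrained(..., output_loading_info=True)`, which returns the list of parameters that were not found in the checkpoint and were therefore freshly initialized. The helper names `missing_mlm_head_keys` and `check_checkpoint` are made up for illustration, and the `cls.` prefix is how BertForMaskedLM names its MLM head parameters; other architectures may use a different prefix.

```python
from typing import Iterable, List

# In BertForMaskedLM the MLM head parameters are registered under "cls."
# (e.g. "cls.predictions.bias"). Treat this prefix as an assumption for
# any other architecture.
MLM_HEAD_PREFIX = "cls."


def missing_mlm_head_keys(missing_keys: Iterable[str]) -> List[str]:
    """Return the missing parameter names that belong to the MLM head."""
    return [key for key in missing_keys if key.startswith(MLM_HEAD_PREFIX)]


def check_checkpoint(model_id: str) -> List[str]:
    """Load a checkpoint and list MLM head weights absent from it.

    Hypothetical helper: model_id is whatever checkpoint name or local
    path you are loading. Requires the transformers package.
    """
    # Imported lazily so the pure helper above works without transformers.
    from transformers import AutoModelForMaskedLM

    _model, info = AutoModelForMaskedLM.from_pretrained(
        model_id, output_loading_info=True
    )
    # "missing_keys" lists parameters that were randomly initialized
    # because the checkpoint did not contain them.
    return missing_mlm_head_keys(info["missing_keys"])
```

If `check_checkpoint` returns a non-empty list for the checkpoint in question, the head was randomly initialized on each load, which would explain perplexity values that change from run to run.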