mosaicml / examples

Fast and flexible reference benchmarks
Apache License 2.0

config class for bert is not consistent #438

Open DanielWit opened 10 months ago

DanielWit commented 10 months ago

Hey, I am trying to pull the model from the Hugging Face repo using AutoModelForMaskedLM.from_pretrained('mosaicml/mosaic-bert-base-seqlen-2048', trust_remote_code=True, revision='b7a0389'). Both with and without the revision param I get the same error: ValueError: The model class you are passing has a config_class attribute that is not consistent with the config class you passed (model has <class 'transformers.models.bert.configuration_bert.BertConfig'> and you passed <class 'transformers_modules.mosaicml.mosaic-bert-base-seqlen-2048.b7a0389deadf7a7261a3e5e7ea0680d8ba12232f.configuration_bert.BertConfig'>). Fix one of those so they match! Do you have any suggestion as to why this might be the case?
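For context, here is a toy illustration of the kind of consistency check that produces this ValueError. This is not the actual transformers source; the class names below are stand-ins chosen to mirror the two config classes named in the error message:

```python
# Stand-in for transformers' built-in BertConfig
class BertConfig:
    pass

# Stand-in for the BertConfig shipped as remote code with the checkpoint
class RemoteBertConfig:
    pass

# Stand-in for the model class, which declares which config class it expects
class BertForMaskedLM:
    config_class = BertConfig

def check_config(model_cls, config):
    """Raise if the config's class is not the one the model class declares."""
    if type(config) is not model_cls.config_class:
        raise ValueError(
            f"The model class you are passing has a config_class attribute "
            f"({model_cls.config_class}) that is not consistent with the "
            f"config class you passed ({type(config)}). Fix one of those so they match!"
        )

check_config(BertForMaskedLM, BertConfig())        # passes silently
# check_config(BertForMaskedLM, RemoteBertConfig())  # would raise ValueError
```

The point is that the check compares the config's *class*, not its contents, so a remote-code config that is field-for-field identical to the built-in one still fails.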

When I do BertModel.from_pretrained('mosaicml/mosaic-bert-base-seqlen-2048') instead, it seems to work correctly, although I am not sure whether flash attention will work correctly, given this statement in the model card: "This model requires that trust_remote_code=True be passed to the from_pretrained method. This is because we train using FlashAttention (Dao et al. 2022), which is not part of the transformers library and depends on Triton and some custom PyTorch code." The BertModel class doesn't take a trust_remote_code parameter.

dakinggg commented 10 months ago

Looks like Hugging Face added some stricter checking at some point. If you go back to the transformers version this model was trained on (4.25.1), the Auto classes should work as expected. Otherwise you can load with BertModel as you've done and it should work (assuming you imported BertModel from this repo, not from transformers). I'll also try to get this fixed to work with later transformers versions.

jacobfulano commented 9 months ago

A quick fix here is to get the config first and then pass it in to AutoModelForMaskedLM.from_pretrained:

import transformers
from transformers import AutoModelForMaskedLM, BertTokenizer, pipeline

# MosaicBERT uses the standard BERT tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# The config needs to be fetched explicitly and passed in
config = transformers.BertConfig.from_pretrained('mosaicml/mosaic-bert-base-seqlen-2048')
mosaicbert = AutoModelForMaskedLM.from_pretrained(
    'mosaicml/mosaic-bert-base-seqlen-2048', config=config, trust_remote_code=True
)

# To use this model directly for masked language modeling
mosaicbert_classifier = pipeline('fill-mask', model=mosaicbert, tokenizer=tokenizer, device="cpu")
mosaicbert_classifier("I [MASK] to the store yesterday.")

We're updating the documentation in https://huggingface.co/mosaicml/mosaic-bert-base