i-machine-think / diagNNose

diagNNose is a Python library that facilitates a broad set of tools for analysing hidden activations of neural models.
https://diagnnose.readthedocs.io
MIT License

How to use pretrained models from AutoModelForSeq2SeqLM (google/flan-t5-xxl) #89

Open DavidAdamczyk opened 1 year ago

DavidAdamczyk commented 1 year ago

I would like to ask how I can use this model from HF:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xxl")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xxl")

I installed the latest diagNNose from git (version 1.20), and it is unclear to me how to use that model:

from diagnnose.models import LanguageModel, import_model
config_dict["model"]["transformer_type"] = "google/flan-t5-xxl"
model = import_model(**config_dict["model"])

I read diagnnose/models/huggingface_lm.py, and it seems to me that seq2seq models are not supported. Can you suggest how to load this model?

The error message is:

File ~/miniconda3/lib/python3.9/site-packages/diagnnose/models/huggingface_lm.py:40, in HuggingfaceLM.load_model(self, transformer_type, mode, cache_dir)
     36 auto_model = mode_to_auto_model.get(mode, AutoModel)
     38 self.is_causal = mode == "causal_lm"
---> 40 return auto_model.from_pretrained(transformer_type, cache_dir=cache_dir)

File ~/miniconda3/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py:470, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    466     model_class = _get_model_class(config, cls._model_mapping)
    467     return model_class.from_pretrained(
    468         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    469     )
--> 470 raise ValueError(
    471     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    472     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
    473 )

ValueError: Unrecognized configuration class <class 'transformers.models.t5.configuration_t5.T5Config'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, CodeGenConfig, CpmAntConfig, CTRLConfig, Data2VecTextConfig, ElectraConfig, ErnieConfig, GitConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, LlamaConfig, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MvpConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, Speech2Text2Config, TransfoXLConfig, TrOCRConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig.
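For context on the failure mode: each `transformers` AutoModel class keeps an internal mapping from config classes to concrete model classes, and loading fails with exactly this `ValueError` when the checkpoint's config (here `T5Config`) has no entry in that mapping. A minimal stand-alone sketch of that dispatch (class names are borrowed from transformers, but the stub classes and mapping below are illustrative, not the real ones):

```python
# Illustrative stand-in classes; the real mapping in transformers covers
# dozens of architectures (see the list in the error message above).
class T5Config: ...
class GPT2Config: ...
class GPT2LMHeadModel: ...


class AutoModelForCausalLM:
    # Only configs with an entry here can be loaded; T5Config is absent,
    # which is why flan-t5 fails under diagNNose's "causal_lm" mode.
    _model_mapping = {GPT2Config: GPT2LMHeadModel}

    @classmethod
    def from_config(cls, config):
        model_cls = cls._model_mapping.get(type(config))
        if model_cls is None:
            raise ValueError(
                f"Unrecognized configuration class {type(config).__name__} "
                f"for this kind of AutoModel: {cls.__name__}."
            )
        return model_cls()
```

In this sketch, `from_config(GPT2Config())` succeeds, while `from_config(T5Config())` raises the same kind of `ValueError` shown in the traceback.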
jumelet commented 1 year ago

Hi David,

Cool that you're planning to use diagnnose! During its development a few years back I never really focused on seq2seq models, mainly on causal LMs. But it shouldn't be hard to incorporate that. Which diagnnose utility were you planning to use?

jumelet commented 1 year ago

If you only intend to use the activation extraction utility, you may also want to look into minicons: https://github.com/kanishkamisra/minicons
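If it's just the hidden activations that are needed, one generic workaround while seq2seq support is missing is to register PyTorch forward hooks directly. This is plain framework-level PyTorch, not diagnnose's or minicons' API, shown here on a toy model (the same pattern applies to any `nn.Module` submodule, e.g. a T5 encoder block):

```python
import torch
from torch import nn

# Toy model standing in for a real HF module.
model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 2))
activations = {}

def save_as(name):
    # Forward hooks receive (module, inputs, output) after each forward pass.
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Attach one hook per submodule so every intermediate output is captured.
for idx, layer in enumerate(model):
    layer.register_forward_hook(save_as(f"layer{idx}"))

_ = model(torch.randn(1, 4))
# activations now holds one tensor per layer, e.g. activations["layer0"]
# with shape (1, 8).
```

The same loop over `model.named_modules()` on a loaded HF model would capture its per-layer hidden states without any library-specific support.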

DavidAdamczyk commented 1 year ago

Thank you for the response @jumelet 👍🏻
I decided to use a Llama model instead of flan-t5. May I ask how I can determine which Llama model from HF transformers is supported by diagNNose? Or can you suggest a particular Llama model? Or maybe this is a question for @dieuwkehupkes or @LorianColtof?