[Open] Bachstelze opened this issue 9 months ago
Hi @Bachstelze, thanks for raising an issue!
The EncoderDecoder models are composite models which use AutoModel to load the encoder and decoder respectively. As per the BertGeneration docs, you can load the model using:
```python
from transformers import EncoderDecoderModel

model = EncoderDecoderModel.from_pretrained("Bachstelze/instructionRoberta-base", output_attentions=True)
```
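As a quick sanity check (a sketch, assuming the repo's tokenizer also loads with AutoTokenizer), the model loaded this way can then be run with attentions enabled:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Bachstelze/instructionRoberta-base")
inputs = tokenizer("Write a short instruction.", return_tensors="pt")

# feed the same ids as decoder inputs just to exercise the forward pass
outputs = model(**inputs, decoder_input_ids=inputs["input_ids"])
print(len(outputs.encoder_attentions))  # one attention tensor per encoder layer
```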
@amyeroberts Yes, it is possible to load it as an EncoderDecoderModel, but many libraries load generic models simply with AutoModel, so EncoderDecoderModel checkpoints raise an error.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Given that EncoderDecoderModel already uses AutoModel internally, it should be possible to load it as an AutoModel as well. Or isn't that possible? @amyeroberts
Hi @Bachstelze, it doesn't make sense to load it this way. AutoModel is used to load individual models defined in MODEL_MAPPING_NAMES in modeling_auto.py. EncoderDecoder is a composite model that lets you combine encoder and decoder models, but it isn't a defined model within our library the way e.g. BERT is.
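As a rough illustration (not the exact internal code path), the mapping can be inspected directly:

```python
from transformers import AutoConfig
from transformers.models.auto.modeling_auto import MODEL_MAPPING_NAMES

# AutoModel resolves the concrete class from the config's model_type
config = AutoConfig.from_pretrained("bert-base-uncased")
print(config.model_type)                       # "bert"
print(MODEL_MAPPING_NAMES[config.model_type])  # "BertModel"

# the composite encoder-decoder model has no entry of its own here,
# which is why AutoModel cannot instantiate it
print("encoder-decoder" in MODEL_MAPPING_NAMES)  # False
```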
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@amyeroberts Isn't it possible to add the EncoderDecoder into modeling_auto.py? Otherwise, all the other libraries that use Hugging Face would have to be rewritten to support this model type.
@Bachstelze Adding the model to modeling_auto.py would mean adding an AutoEncoderDecoder class, which would be equivalent in terms of code to using the EncoderDecoderModel class.
Perhaps you can elaborate a bit more on why this is needed? One thing that might be useful to know is that the architecture needed to load the model can be found in the model's config.
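For example (assuming the checkpoint ships a standard EncoderDecoderConfig), the config alone tells you which class to instantiate:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Bachstelze/instructionRoberta-base")
print(config.model_type)     # "encoder-decoder"
print(config.architectures)  # e.g. ["EncoderDecoderModel"], if set on the checkpoint
```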
@amyeroberts thanks for the reply!
Many external libraries just use AutoModel to initialise the model class. They would all need to be extended to support EncoderDecoderModel, or we allow loading as AutoModel once in the Hugging Face library.
@amyeroberts One concrete example is the 🤗 Open LLM Leaderboard. It requires the model to be loadable with AutoClasses: "Make sure you can load your model and tokenizer using AutoClasses." How can this be achieved, or should the leaderboard be extended?
@Bachstelze OK, I see. Thanks for providing an example.
So, it doesn't make sense to load a custom encoder-decoder with the AutoModel API, as the default model created is a causal LM. This is actually a good thing regarding the leaderboard, as it uses AutoModelForCausalLM to load the models (rather than AutoModel).
However, we still shouldn't put the encoder-decoder model into the auto mapping because it's ultimately too flexible: it's possible to pass in a decoder with any task head that you want. Models which can be loaded in an auto class, e.g. AutoModelForCausalLM, should all perform the same task, take the same inputs and return the same outputs.
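To illustrate the flexibility, here is a minimal sketch using the documented from_encoder_decoder_pretrained helper, which can pair arbitrary checkpoints:

```python
from transformers import EncoderDecoderModel

# Any encoder can be combined with any decoder; the cross-attention layers are
# newly initialised, so there is no single task signature an auto class could guarantee.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
```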
If you'd like to have your model evaluated according to the Open LLM leaderboard, you can use the lighteval library to get the same results.
@amyeroberts This seems theoretically possible, yet the lighteval library is still unstable and buggy: https://github.com/huggingface/lighteval/issues/183
If you need to evaluate a model following the same steps as the leaderboard, you can actually use the lm_eval harness from Eleuther, following the steps in the Reproducibility section of the About tab. lighteval is still in alpha, so we don't guarantee exhaustiveness or stability for now, but we're doing our best to fix issues as they arise.
@clefourrier which "steps in the Reproducibility section of the About tab" do you mean? I couldn't run lm-evaluation-harness with the description in the README.
How can an encoder-decoder model be loaded as AutoModelForCausalLM or AutoModelForSeq2SeqLM?
> If you'd like to have your model evaluated according to the Open LLM leaderboard, ...
If you go to the Open LLM Leaderboard page, there is a tab called "About" which gives all the steps needed to reproduce the results (for AutoModelForCausalLM models only, however).
System Info

transformers version: 4.35.0

Who can help?

@ArthurZucker and @younesbelkada

Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Bachstelze/instructionRoberta-base")
model = AutoModel.from_pretrained("Bachstelze/instructionRoberta-base", output_attentions=True)
```
Expected behavior
Load the EncoderDecoderModel as AutoModel. "BertGenerationConfig" is supported, though this seems outdated.