huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

How to build and evaluate a vanilla transformer? #31453

Open Bachstelze opened 3 weeks ago

Bachstelze commented 3 weeks ago

Model description

"Attention Is All You Need" is a landmark 2017 research paper authored by eight scientists working at Google, responsible for expanding 2014 attention mechanisms proposed by Bahdanau et al. into a new deep learning architecture known as the transformer with an encoder, cross-attention, and a decoder.

Open source status

Provide useful links for the implementation

EncoderDecoderModels are supported via the Hugging Face API, but it doesn't seem possible to evaluate them properly: https://github.com/huggingface/transformers/issues/28721. How can a vanilla transformer with an encoder, cross-attention, and a decoder be built and evaluated in huggingface?
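
For context, here is a minimal sketch of how I imagine such a model could be assembled from the existing `EncoderDecoderConfig`/`EncoderDecoderModel` classes (the layer sizes are arbitrary and the weights are randomly initialised):

```python
from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

# Small, arbitrary configs purely for illustration.
encoder_config = BertConfig(hidden_size=256, num_hidden_layers=4, num_attention_heads=4)
decoder_config = BertConfig(hidden_size=256, num_hidden_layers=4, num_attention_heads=4)
decoder_config.is_decoder = True
decoder_config.add_cross_attention = True  # decoder attends to the encoder outputs

# Combine the two configs into a single encoder-decoder config and build the model.
config = EncoderDecoderConfig.from_encoder_decoder_configs(encoder_config, decoder_config)
model = EncoderDecoderModel(config=config)  # randomly initialised vanilla transformer
print(model.num_parameters())
```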

amyeroberts commented 3 weeks ago

Hi @Bachstelze, thanks for raising an issue!

This is a question best placed in our forums. We try to reserve the github issues for feature requests and bug reports.

Though it isn't possible to evaluate them properly: https://github.com/huggingface/transformers/issues/28721

This isn't quite right - it's not possible to load them through the AutoModelForCausalLM API and hence submit them to the Open LLM Leaderboard, but evaluation can still be done manually. If the decoder is loaded with AutoModelForCausalLM, which is done by default, you already have the task-specific head. To evaluate the model, you then just need inputs, labels, and a metric.
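
For example, a rough sketch of a manual evaluation loop (the checkpoints, example sentences, and metric below are just placeholders; the cross-attention of a freshly combined encoder-decoder is randomly initialised, so the scores are only meaningful after fine-tuning):

```python
import evaluate
import torch
from transformers import AutoTokenizer, EncoderDecoderModel

# Placeholder checkpoints: two BERTs tied together as encoder and decoder.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "google-bert/bert-base-uncased", "google-bert/bert-base-uncased"
)
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")

# generate() needs to know how to start and pad decoder sequences.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

sources = ["a longer toy input sentence to condense"]  # inputs
references = ["a toy target sentence"]                 # labels

inputs = tokenizer(sources, return_tensors="pt", padding=True)
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=32)
predictions = tokenizer.batch_decode(generated, skip_special_tokens=True)

# Any text metric works here; ROUGE from the `evaluate` library as an example.
rouge = evaluate.load("rouge")
print(rouge.compute(predictions=predictions, references=references))
```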