huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Add DistilBERTGeneration comparable to BertGeneration #7397

Open jsilter opened 4 years ago

jsilter commented 4 years ago

🚀 Feature request

I noticed the new BertGeneration class, which uses BERT-style models as both encoder and decoder, as well as the more general EncoderDecoder class. This is all great stuff! It would also be great to be able to use distilled models. I believe this is already possible for the encoder, but for the decoder a language modeling head must be added.

Since DistilBert is implemented as its own model, rather than as a BertModel, I don't think it's possible (or at least not easy) for the end user to do this themselves. At least not with pretrained models, since any pretrained decoder needs to be a model class supported by AutoModelForCausalLM.
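
For concreteness, here's a minimal sketch of the gap (model names are the standard Hub checkpoints; the commented-out call is what I'd like to be able to do):

```python
from transformers import EncoderDecoderModel

# This works: BertLMHeadModel supplies the causal-LM head needed on the
# decoder side, and it is registered with AutoModelForCausalLM.
bert2bert = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)

# No DistilBERT equivalent today: there is no DistilBertForCausalLM
# registered with AutoModelForCausalLM, so this fails when loading the decoder.
# distil2distil = EncoderDecoderModel.from_encoder_decoder_pretrained(
#     "distilbert-base-uncased", "distilbert-base-uncased"
# )
```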

Motivation

Same motivation as for using distilled models in general: comparable results at higher speed, this time applied to an EncoderDecoder model.

Your contribution

Happy to be an alpha tester for this feature

patrickvonplaten commented 3 years ago

Hey @jsilter - yes, we could definitely add a DistilBertForCausalLM model. Instead of doing something similar to BertGeneration, I think it would be easier to just add a DistilBertForCausalLM to modeling_distilbert.py, similar to BertLMHeadModel or RobertaForCausalLM. This could actually be an interesting Good Second Issue. If someone is interested in opening a PR, I'd be more than happy to provide some guidance :-)
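
Something roughly like this, following the RobertaForCausalLM pattern - a sketch only: the head layout, loss, and import path are illustrative, weight initialization/tying is omitted, and a real version would also need causal self-attention masking (and cross-attention for encoder-decoder use), which BertLMHeadModel gets via config.is_decoder:

```python
import torch.nn as nn
from torch.nn import CrossEntropyLoss

# Import path as in recent transformers versions.
from transformers.models.distilbert.modeling_distilbert import (
    DistilBertModel,
    DistilBertPreTrainedModel,
)


class DistilBertForCausalLM(DistilBertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.distilbert = DistilBertModel(config)
        # LM head projecting hidden states back onto the vocabulary,
        # analogous to the heads in BertLMHeadModel / RobertaForCausalLM.
        self.lm_head = nn.Linear(config.dim, config.vocab_size)

    def forward(self, input_ids=None, attention_mask=None, labels=None):
        outputs = self.distilbert(input_ids=input_ids, attention_mask=attention_mask)
        logits = self.lm_head(outputs[0])  # (batch, seq_len, vocab_size)
        loss = None
        if labels is not None:
            # Standard causal-LM loss: tokens < n predict token n.
            shift_logits = logits[:, :-1, :].contiguous()
            shift_labels = labels[:, 1:].contiguous()
            loss = CrossEntropyLoss()(
                shift_logits.view(-1, self.config.vocab_size),
                shift_labels.view(-1),
            )
        return (loss, logits) if loss is not None else (logits,)
```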

KMFODA commented 3 years ago

Hi @patrickvonplaten, I would love to work on this, if that's still possible.

patrickvonplaten commented 3 years ago

Hey @KMFODA - yes, absolutely :-) Do you want to open a PR? I think we can add a DistilBertForCausalLM model to modeling_distilbert.py, analogous to BertLMHeadModel.

KMFODA commented 3 years ago

Great! I'll open a PR, start adding a DistilBertForCausalLM model to modeling_distilbert.py, and get back to you if I have any issues :)

KMFODA commented 3 years ago

Hi @patrickvonplaten, I've built the DistilBertForCausalLM class into modeling_distilbert.py and can run it on the examples used for both BertLMHeadModel and RobertaForCausalLM, and the outputs look fine. Other than these examples, are there any other tests I can run to check that it's working as expected?
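
For reference, the check I ran looks roughly like this, adapted from the RobertaForCausalLM docstring example (DistilBertForCausalLM here is the class from my branch, not the released library, and the is_decoder flag is carried over from the BERT/RoBERTa examples):

```python
from transformers import DistilBertConfig, DistilBertTokenizer
# DistilBertForCausalLM comes from the in-progress branch, e.g.:
# from transformers import DistilBertForCausalLM

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
config = DistilBertConfig.from_pretrained("distilbert-base-uncased")
config.is_decoder = True  # mirrors the BertLMHeadModel / RobertaForCausalLM examples

model = DistilBertForCausalLM.from_pretrained("distilbert-base-uncased", config=config)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)
prediction_logits = outputs[0]  # (batch, seq_len, vocab_size)
```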

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.