jsilter opened this issue 4 years ago
Hey @jsilter - yes, we could definitely add a DistilBertForCausalLM model. I think instead of doing something similar to BertGeneration, it would be easier to just add a DistilBertForCausalLM to modeling_distilbert.py, similar to BertLMHeadModel or RobertaForCausalLM. This could actually be an interesting Good Second Issue. If someone is interested in opening a PR - I'd be more than happy to provide some guidance :-)
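For reference, a rough sketch of what such a class could look like, loosely modeled on RobertaForCausalLM. Everything below is illustrative rather than a final implementation: attribute names like lm_head are assumptions, and a real version would also need is_decoder / cross-attention handling plus prepare_inputs_for_generation so that generate() works.

```python
import torch.nn as nn
from transformers import DistilBertModel, DistilBertPreTrainedModel
from transformers.modeling_outputs import CausalLMOutput


class DistilBertForCausalLM(DistilBertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.distilbert = DistilBertModel(config)
        # project hidden states (config.dim) onto the vocabulary
        self.lm_head = nn.Linear(config.dim, config.vocab_size)
        self.post_init()  # weight initialization, as in the other model heads

    def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
        outputs = self.distilbert(input_ids=input_ids, attention_mask=attention_mask, **kwargs)
        hidden_states = outputs[0]            # (batch, seq_len, dim)
        logits = self.lm_head(hidden_states)  # (batch, seq_len, vocab_size)

        loss = None
        if labels is not None:
            # shift so that tokens < n predict token n (standard causal-LM loss)
            shift_logits = logits[:, :-1, :].contiguous()
            shift_labels = labels[:, 1:].contiguous()
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(
                shift_logits.view(-1, self.config.vocab_size), shift_labels.view(-1)
            )

        return CausalLMOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
```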
Hi @patrickvonplaten, I would love to work on this if it's still available?
Hey @KMFODA - yes, absolutely :-) Do you want to open a PR? I think, very analogously to BertLMHeadModel, we can add a DistilBertForCausalLM model in modeling_distilbert.py.
Great! Will open up a PR and start adding a DistilBertForCausalLM model to modeling_distilbert.py, and get back to you if I have any issues :)
Hi @patrickvonplaten, I've built the DistilBertForCausalLM class into modeling_distilbert.py and can run it on the example used in both BertLMHeadModel and RobertaForCausalLM, and the outputs look fine. Other than this example, are there any other tests I can run to check it's working as expected?
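One further smoke test I had in mind looks roughly like the sketch below (it assumes the class ends up exposed as transformers.DistilBertForCausalLM - that import path is an assumption, since the class is only being proposed in this issue). Beyond that, I assume the common ModelTesterMixin tests in the existing DistilBERT test file would also need to cover the new class.

```python
# Rough smoke test; DistilBertForCausalLM is the class proposed in this issue
# and does not exist in transformers yet, so this import is an assumption.
import torch
from transformers import AutoTokenizer, DistilBertForCausalLM

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertForCausalLM.from_pretrained("distilbert-base-uncased")
model.eval()

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

# 1. a forward pass with labels should produce a finite scalar LM loss
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])
assert outputs.logits.shape[-1] == model.config.vocab_size
assert torch.isfinite(outputs.loss)

# 2. the new head should also plug into generate() without errors
generated = model.generate(inputs["input_ids"], max_length=20, do_sample=False)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```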
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hello @patrickvonplaten! Would this still be a valued contribution? I'm currently a graduate student at CU Boulder studying NLP, Formal Methods and Program Synthesis, and this issue sounds interesting to me.
🚀 Feature request
I noticed the new BertGeneration class, which uses BERT-style models as both encoder and decoder, as well as the more general EncoderDecoder class. This is all great stuff! It would also be great to be able to use distilled models. I believe this is possible for the encoder, but for the decoder a language-modeling head must be added. Since DistilBert is implemented as its own model, and not as a BertModel, I don't think it's possible (or at least not easy) for the end user to do this, at least not with pretrained models, since any pretrained model needs to be a type approved by AutoModelForCausalLM.
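Concretely, the workflow I have in mind would be something like the sketch below. It is purely illustrative: it assumes a DistilBertForCausalLM gets added and registered with AutoModelForCausalLM, which is exactly what doesn't exist today, so building the decoder currently fails.

```python
# Illustrative only: today this fails because DistilBERT has no registered
# causal-LM class; a DistilBertForCausalLM would make it work.
from transformers import AutoTokenizer, EncoderDecoderModel

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# distilled encoder + distilled decoder (the decoder gets the new LM head)
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "distilbert-base-uncased", "distilbert-base-uncased"
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

inputs = tokenizer("Distilled encoder-decoder models should be faster.", return_tensors="pt")
output_ids = model.generate(inputs.input_ids, max_length=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```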
Motivation
Same motivation as using distilled models in general: same results at higher speed, this time applied to an EncoderDecoder model.
Your contribution
Happy to be an alpha tester for this feature