huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

How to make some structural changes to the EncoderDecoderModel? #7979

Closed · yhznb closed 3 years ago

yhznb commented 3 years ago

❓ Questions & Help

Details

Hey, I use EncoderDecoderModel for abstractive summarization. I load the bert2bert model like this: `model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased')`

And I want to make some structural changes to the output layer of the decoder model.

For example, in one decoder step the output hidden state of the BERT decoder is a vector (s). From another network I get a vector (w) that should make the summarization more accurate. I want to concatenate the two vectors in the output layer and use the combined vector to generate a word from the vocabulary.

How can I do this?

patrickvonplaten commented 3 years ago

Hey @yhznb,

We try to mainly use the GitHub issues for bugs in the library. For more customized questions, it would be great if you could use https://discuss.huggingface.co/ instead.

Regarding your question, I would just add a layer to BertLMHeadModel wherever you want and then build your EncoderDecoderModel from BertModel (encoder) and your use-case specific BertLMHeadModel (decoder).
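
For illustration, a minimal sketch of that suggestion. The `side_net` and `fuse` modules below are hypothetical placeholders for whatever network produces the extra vector w, and the forward signature is heavily simplified; a real implementation would also need to handle labels, caching, and the other BertLMHeadModel arguments:

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel, BertLMHeadModel, EncoderDecoderModel


class CustomBertDecoder(BertLMHeadModel):
    def __init__(self, config):
        super().__init__(config)
        # Hypothetical side network producing the extra vector w.
        self.side_net = nn.Linear(config.hidden_size, config.hidden_size)
        # Maps the concatenation [s; w] back to hidden_size so the
        # pretrained LM head (self.cls) can be reused unchanged.
        self.fuse = nn.Linear(2 * config.hidden_size, config.hidden_size)

    def forward(self, input_ids=None, attention_mask=None,
                encoder_hidden_states=None, encoder_attention_mask=None, **kwargs):
        # Simplified forward: run the BERT decoder stack to get the
        # per-step hidden states s, then fuse them with w before the head.
        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            encoder_hidden_states=encoder_hidden_states,
            encoder_attention_mask=encoder_attention_mask,
        )
        s = outputs[0]                       # (batch, seq_len, hidden_size)
        w = self.side_net(s)                 # stand-in for the real side network
        logits = self.cls(self.fuse(torch.cat([s, w], dim=-1)))
        return (logits,) + outputs[1:]


# The decoder config needs is_decoder=True and add_cross_attention=True
# so the decoder attends to the encoder's hidden states.
encoder = BertModel.from_pretrained("bert-base-uncased")
decoder_config = BertConfig.from_pretrained(
    "bert-base-uncased", is_decoder=True, add_cross_attention=True
)
decoder = CustomBertDecoder.from_pretrained("bert-base-uncased", config=decoder_config)
model = EncoderDecoderModel(encoder=encoder, decoder=decoder)
```

Projecting [s; w] back to hidden_size before the pretrained `self.cls` head keeps the vocabulary projection weights intact while still letting the extra vector influence every generated token.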

AI678 commented 3 years ago

Hey @patrickvonplaten, I have the same question. Can you provide an example of building the EncoderDecoderModel from BertModel (encoder) and a use-case specific BertLMHeadModel? I can't find this in the official documentation. Thank you very much.

AI678 commented 3 years ago

I think the EncoderDecoderModel outputs all the hidden states at once, and I want to control it step by step. For example, I want to change the LM head of the decoder by concatenating another vector. The problem is that the decoder outputs all the hidden states at once, while I want step-by-step decoding: use the concatenated vector as the hidden state for generation, and feed the generated word vector back as the next step's input. How can I change the model or call the interface properly? Is this possible within the huggingface framework? Thank you very much! @patrickvonplaten
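
For step-by-step control, one rough sketch (not an official interface) is to bypass generate() and run a manual greedy loop: encode the source once, then repeatedly call the decoder on the tokens produced so far and feed the argmax token back in. The checkpoints match the first post, and the spot where the logits could be fused with another vector is marked in a comment:

```python
import torch
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
model.eval()

enc = tokenizer("some long article to summarize", return_tensors="pt")

with torch.no_grad():
    # Encode the source once; reuse the encoder states at every step.
    encoder_outputs = model.encoder(**enc)
    # Start decoding from [CLS], the decoder start token in bert2bert.
    decoder_input_ids = torch.tensor([[tokenizer.cls_token_id]])
    for _ in range(64):  # maximum summary length
        logits = model.decoder(
            input_ids=decoder_input_ids,
            encoder_hidden_states=encoder_outputs[0],
            encoder_attention_mask=enc["attention_mask"],
        )[0]
        # Here the last-step logits (or the hidden state that produced them)
        # could be replaced or fused with another vector before sampling.
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        decoder_input_ids = torch.cat([decoder_input_ids, next_token], dim=-1)
        if next_token.item() == tokenizer.sep_token_id:
            break

print(tokenizer.decode(decoder_input_ids[0], skip_special_tokens=True))
```

Note that calling the decoder once per token without caching past key/values is slow (a point raised later in this thread); the sketch only shows where the per-step hook sits.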

AI678 commented 3 years ago

I also raised this on the forum. Does this issue need to be closed? The link is here: https://discuss.huggingface.co/t/control-encoderdecodermodel-to-generate-tokens-step-by-step/1756

yhznb commented 3 years ago

Thank you very much! @patrickvonplaten

yhznb commented 3 years ago

Have you solved your problem, @AI678? I think it is all about changing the LM head and the calculation of the logits, but I don't know how to change it.
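
For illustration, one way to change exactly that without subclassing is to swap the decoder's LM head for a wrapper module after loading. `FusedLMHead` and its `fuse` layer are made-up names for this sketch, and the zero vector is a stand-in for the real w:

```python
import torch
import torch.nn as nn
from transformers import EncoderDecoderModel

model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)


class FusedLMHead(nn.Module):
    """Wraps the pretrained vocabulary projection: fuses the decoder hidden
    state s with an extra vector w before the logits are computed."""

    def __init__(self, pretrained_head, hidden_size):
        super().__init__()
        self.fuse = nn.Linear(2 * hidden_size, hidden_size)  # hypothetical fusion layer
        self.pretrained_head = pretrained_head

    def forward(self, s):
        w = torch.zeros_like(s)  # stand-in: plug the real side network in here
        return self.pretrained_head(self.fuse(torch.cat([s, w], dim=-1)))


# model.decoder is a BertLMHeadModel; `cls` is the module that maps hidden
# states to vocabulary logits, so swapping it changes the logits calculation.
model.decoder.cls = FusedLMHead(model.decoder.cls, model.decoder.config.hidden_size)
```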

AI678 commented 3 years ago

Yes, you are right. @yhznb

AI678 commented 3 years ago

> Hey @yhznb,
>
> We try to mainly use the GitHub issues for bugs in the library. For more customized questions, it would be great if you could use https://discuss.huggingface.co/ instead.
>
> Regarding your question, I would just add a layer to BertLMHeadModel wherever you want and then build your EncoderDecoderModel from BertModel (encoder) and your use-case specific BertLMHeadModel (decoder).

Sorry, I misunderstood what you meant. So this is a feature that still has to be developed. How long would it take to develop? Thank you for your response.

nlpLover123 commented 3 years ago

Hey, I have a similar need. I think vanilla bert2bert or roberta2roberta alone is not sufficient for abstractive summarization. For fluency and information richness, we should consider changing the top layer of the decoder for further learning.

nlpLover123 commented 3 years ago

Hey @patrickvonplaten, when do you plan to release that?

AI678 commented 3 years ago

@nlpLover123, you can control it step by step, but I think that is too slow for a large dataset like CNN/DailyMail. I also want to ask when you plan to release that, @patrickvonplaten. If it would take too much time, maybe I will write an encoder-decoder model from scratch, because I don't have much time to wait. Thank you very much.

ghost commented 3 years ago

That is too difficult, @AI678. Maybe it would be even slower than step-by-step generation.

AI678 commented 3 years ago

So I just want to make a specific change at the LM head layer. @moonlightarc

patrickvonplaten commented 3 years ago

@AI678, I don't think we are planning to release such a feature in the library. It's a very specific request, and I'd suggest that you fork the repo and make the changes according to your needs.