huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Using XLNet or Transformer-XL as an EncoderDecoder #8778

Closed · gulnazaki closed this issue 3 years ago

gulnazaki commented 3 years ago

I want to train on a long-sequence dataset (a MIDI text event representation like the one in MuseNet) from scratch. Since I can't split the sequences into "sentences", I am using XLNet (or Transformer-XL). I am modelling the task as a sequence-to-sequence task (with a max input sequence length of around 40k tokens and an output length of 4k tokens), so I want to use an Encoder-Decoder framework.

Is it possible to use XLNet as both the encoder and the decoder, or just as the encoder, and use e.g. GPT-2 to do the decoding (because of the smaller output sequence length)?

Thank you 🤗

gulnazaki commented 3 years ago

@patrickvonplaten any thoughts on this? I found your work on Bert2Bert very informative :)

patrickvonplaten commented 3 years ago

Hey @gulnazaki - you can use XLNet as an encoder, but not as a decoder because it'll be very difficult to add cross-attention functionality to XLNet for the decoder...
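
For readers landing here, the `EncoderDecoderModel` wrapper can pair such an encoder with a decoder that does support cross-attention, such as GPT-2. Below is a minimal sketch under the assumption that XLNet is accepted as a plain encoder by the wrapper; the checkpoint names are the stock ones, not something confirmed in this thread.

```python
from transformers import EncoderDecoderModel, AutoTokenizer

# Sketch only: pair an XLNet encoder with a GPT-2 decoder via the
# EncoderDecoderModel wrapper. GPT-2 supports the cross-attention the
# decoder side needs; treating XLNet as a plain encoder here is an
# assumption, not something the thread confirms works out of the box.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "xlnet-base-cased", "gpt2"
)

enc_tok = AutoTokenizer.from_pretrained("xlnet-base-cased")
dec_tok = AutoTokenizer.from_pretrained("gpt2")

# The wrapper needs these set before training/generation.
model.config.decoder_start_token_id = dec_tok.bos_token_id
model.config.pad_token_id = dec_tok.eos_token_id

inputs = enc_tok("encoder-side event tokens ...", return_tensors="pt")
labels = dec_tok("decoder-side event tokens ...", return_tensors="pt").input_ids

loss = model(input_ids=inputs.input_ids, labels=labels).loss
```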

gulnazaki commented 3 years ago

Thanks @patrickvonplaten, I thought so. Also, XLNet's permutation-based objective is kind of the opposite of uni-directional decoding.

I will try to increase the sequence length of GPT2 for the output sequence.
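
For what it's worth, extending GPT-2's context window when training from scratch amounts to setting a larger `n_positions` in the config; the pretrained checkpoints only ship 1024 learned position embeddings, so a longer window means training them anew. A minimal sketch, using the 4k output length from the issue:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Sketch: a from-scratch GPT-2 with a 4k context window. The pretrained
# gpt2 checkpoints have 1024 learned position embeddings, so this model
# must be trained from scratch rather than loaded from a checkpoint.
config = GPT2Config(n_positions=4096, n_ctx=4096)
model = GPT2LMHeadModel(config)
print(model.config.n_positions)  # 4096
```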

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale and closed because it has not had recent activity. Thank you for your contributions.

If you think this still needs to be addressed, please comment on this thread.