Closed — gulnazaki closed this issue 3 years ago
@patrickvonplaten any thoughts on this? I'm asking since I found your work on Bert2Bert very informative :)
Hey @gulnazaki - you can use XLNet as an encoder, but not as a decoder because it'll be very difficult to add cross-attention functionality to XLNet for the decoder...
Thanks @patrickvonplaten, I thought so. Also, the concept of XLNet is kind of the opposite of uni-directional decoding.
I will try to increase the sequence length of GPT2 for the output sequence.
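For anyone landing here, this is a minimal sketch of the setup discussed above: an XLNet encoder paired with a GPT-2 decoder via `EncoderDecoderModel`, with `add_cross_attention=True` so GPT-2 can attend to the encoder. The tiny config values (vocabulary size, hidden sizes, sequence lengths) are placeholder assumptions so the example runs without downloading checkpoints; for real training you would load pretrained weights and raise `n_positions` on the decoder (e.g. toward 4096) to cover the long output sequences.

```python
# Hedged sketch, not from the original thread: XLNet encoder + GPT-2 decoder
# wired together with Hugging Face's EncoderDecoderModel.
import torch
from transformers import (
    EncoderDecoderModel,
    GPT2Config,
    GPT2LMHeadModel,
    XLNetConfig,
    XLNetModel,
)

VOCAB = 100  # toy vocabulary size (placeholder assumption)

# Randomly initialised tiny models so the sketch runs offline.
encoder = XLNetModel(
    XLNetConfig(vocab_size=VOCAB, d_model=64, n_layer=2, n_head=2, d_inner=128)
)
decoder = GPT2LMHeadModel(
    GPT2Config(
        vocab_size=VOCAB,
        n_embd=64,
        n_layer=2,
        n_head=2,
        n_positions=128,           # raise this for long output sequences
        add_cross_attention=True,  # lets GPT-2 attend to the XLNet encoder
        is_decoder=True,
    )
)
model = EncoderDecoderModel(encoder=encoder, decoder=decoder)

src = torch.randint(0, VOCAB, (2, 16))  # stand-in for MIDI-event input ids
tgt = torch.randint(0, VOCAB, (2, 8))   # stand-in for target ids
out = model(input_ids=src, decoder_input_ids=tgt, labels=tgt)
print(out.logits.shape)
```

The hidden sizes of the two models must match (64 here) so the decoder's cross-attention can consume the encoder's hidden states; with pretrained checkpoints of different widths you would need a projection in between.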
This issue has been automatically marked as stale and been closed because it has not had recent activity. Thank you for your contributions.
If you think this still needs to be addressed please comment on this thread.
I want to train on a long-sequence dataset (a MIDI text event representation like the one in MuseNet) from scratch. Since I can't split the sequences into "sentences", I am using XLNet (or Transformer-XL). I am modelling the task as a sequence-to-sequence task (with a max input sequence length of around 40k tokens and an output length of 4k tokens), so I want to use an Encoder Decoder framework.
Is it possible to use XLNet as both the encoder and the decoder, or should I use it only as the encoder and use, for example, GPT-2 to do the decoding (because of the smaller output sequence length)?
Thank you 🤗