lucidrains / routing-transformer

Fully featured implementation of Routing Transformer
MIT License

Long dependencies #11

Open · matthew-jurewicz opened this issue 4 years ago

matthew-jurewicz commented 4 years ago

I have a long piece of text where the end depends on the start. Should I pass the whole text as a single sequence during training? And does the same consideration apply when generating text of that length?

tomweingarten commented 4 years ago

Can you give more details on what you're trying to do? What is the shape of your training data? How long is long? The readme suggests turning on random truncation when you're going to be generating long text, since the transformer doesn't generalize well to sequence lengths it hasn't seen during training. I'd highly recommend that.
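For reference, a minimal sketch of what that random-truncation setup could look like, assuming the `randomly_truncate_sequence` flag shown in the readme; the hyperparameter values below are illustrative only, not a recommendation:

```python
import torch
from routing_transformer import RoutingTransformerLM, AutoregressiveWrapper

# illustrative hyperparameters; tune for your own task
model = RoutingTransformerLM(
    num_tokens = 20000,
    dim = 512,
    depth = 6,
    heads = 8,
    max_seq_len = 8192,
    window_size = 128,
    causal = True
)
model = AutoregressiveWrapper(model)

x = torch.randint(0, 20000, (1, 8192)).long()

# randomly_truncate_sequence trains on randomly shortened
# prefixes, so the model sees a variety of sequence lengths
loss = model(x, return_loss = True, randomly_truncate_sequence = True)
loss.backward()
```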

matthew-jurewicz commented 4 years ago

I'd like to generate something coherent at lengths of up to 1 million tokens, and Google claims the Reformer can ingest sequences that long. Also, are you saying I can't just pad everything to the maximum sequence length?
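On the padding question: the underlying `RoutingTransformerLM` does accept an `input_mask` (per the readme), so a shorter document can in principle be padded out to `max_seq_len` with the pad positions masked. A minimal sketch, with a made-up pad token id of 0 and made-up lengths:

```python
import torch
from routing_transformer import RoutingTransformerLM

model = RoutingTransformerLM(
    num_tokens = 20000,
    dim = 512,
    depth = 6,
    max_seq_len = 8192,
    causal = True
)

# hypothetical: a 5000-token document padded to max_seq_len with token id 0
seq = torch.randint(1, 20000, (1, 5000)).long()
pad = torch.zeros(1, 8192 - 5000).long()
x = torch.cat((seq, pad), dim = 1)

# mask is True for real tokens, False for padding
input_mask = x != 0

y, aux_loss = model(x, input_mask = input_mask)  # logits: (1, 8192, 20000)
# per the readme, add aux_loss to the main loss before backprop
```

Note that masking only keeps padding out of the attention; it doesn't by itself teach the model to stay coherent at lengths it never saw during training, which is what the random-truncation advice above is aimed at.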